To cluster, or not to cluster: An analysis of clusterability methods

Type: Article, Preprint
Date: 01 Apr 2019
Embargo end date: 01 Jan 2018
Language: English
Publisher: Elsevier BV
Journal: Pattern Recognition, volume 88, pages 13-26 (ISSN: 0031-3203)

Authors: Andreas Adolfsson; Margareta Ackerman; Naomi C. Brownstein

doi: 10.1016/j.patcog.2018.10.026, 10.48550/arxiv.1808.08317

arXiv: 1808.08317

Abstract

Clustering is an essential data mining tool that aims to discover inherent cluster structure in data. For most applications, applying clustering is only appropriate when cluster structure is present. As such, the study of clusterability, which evaluates whether data possesses such structure, is an integral part of cluster analysis. However, methods for evaluating clusterability vary radically, making it challenging to select a suitable measure. In this paper, we perform an extensive comparison of measures of clusterability and provide guidelines that clustering users can reference to select suitable measures for their applications.

30 pages, 3 figures, 10 tables

Related Organizations

Florida State University, United States
Santa Clara University, United States

Keywords

FOS: Computer and information sciences, Computer Science - Machine Learning, Statistics - Machine Learning, Machine Learning (stat.ML), Machine Learning (cs.LG)

Impact by BIP!

- selected citations: 204 (derived from selected sources)
- popularity: Top 1% (the "current" impact or attention of the article in the research community at large, based on the underlying citation network)
- influence: Top 1% (the overall impact of the article in the research community at large, based on the underlying citation network, diachronically)
- impulse: Top 0.1% (the initial momentum of the article directly after its publication, based on the underlying citation network)

Access routes: Green, bronze
