A comprehensive validity index for clustering

Publication (Article), 17 Dec 2008, Singapore
Publisher: SAGE Publications
Journal: Intelligent Data Analysis, volume 12, pages 529-548 (issn: 1088-467X, eissn: 1571-4128)

Authors: Saitta, S.; Raphael, B.; Smith, I.F.C.

doi: 10.3233/ida-2008-12602

Abstract

Cluster validity indices are used both for estimating the quality of a clustering algorithm and for determining the correct number of clusters in data. Even though several indices exist in the literature, most of them are only relevant for data sets that contain at least two clusters. This paper introduces a new bounded index for cluster validity called the score function (SF), a double exponential expression that is based on a ratio of standard cluster parameters. Several artificial and real-life data sets are used to evaluate the performance of the score function. These data sets contain a range of features and patterns such as unbalanced, overlapped and noisy clusters. In addition, cases involving sub-clusters and perfect clusters are tested. The score function is tested against six previously proposed validity indices. In the case of hyper-spheroidal clusters, the index proposed in this paper is found to be always as good as or better than these indices. In addition, it is shown to work well on multidimensional and noisy data sets. One of its advantages is its ability to handle the single-cluster case and sub-cluster hierarchies.

Country

Singapore

Related Organizations

École Polytechnique Fédérale de Lausanne (EPFL), Switzerland
National University of Singapore, Singapore
National University of Singapore Libraries, Singapore

Keywords

Artificial Intelligence, Validity index, Computer Vision and Pattern Recognition, K-means, Clustering, Number of clusters, Theoretical Computer Science

Impact by BIP!

Selected citations: 44. These citations are derived from selected sources. This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
Popularity: Top 10%. This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network.
Influence: Top 10%. This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
Impulse: Average. This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network.

bronze

Fields of Science

engineering and technology

electrical engineering, electronic engineering, information engineering
