Bayesian cluster validation

Publication type: Other literature type, Thesis
Date: 01 Jan 2008
Countries: United States, Mexico, Canada
Language: English
Publisher: University of British Columbia

Author: Koepke, Hoyt Adam

doi: 10.14288/1.0051357

handle: 2429/1496

Abstract

We propose a novel framework based on Bayesian principles for validating clusterings and present efficient algorithms for use with centroid- or exemplar-based clustering solutions. Our framework treats the data as fixed and introduces perturbations into the clustering procedure. In our algorithms, we scale the distances between points by a random variable whose distribution is tuned against a baseline null dataset. The random variable is integrated out, yielding a soft assignment matrix that gives the behavior under perturbation of the points relative to each of the clusters. From this soft assignment matrix, we are able to visualize inter-cluster behavior, rank clusters, and give a scalar index of the clustering stability. In a large test on synthetic data, our method matches or outperforms other leading methods at predicting the correct number of clusters. We also present a theoretical analysis of our approach, which suggests that it is useful for high-dimensional data.
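
The perturb-then-integrate idea in the abstract can be illustrated with a minimal sketch. It is not the thesis's algorithm: it assumes, purely for illustration, that each point-to-centroid distance is scaled by an independent Exponential(1) variable, which is not tuned against any null dataset. Under that assumption the scaling variable integrates out in closed form, because the smallest of several independent exponentials is the k-th one with probability proportional to its rate. The function names `soft_assignment` and `stability_index` are hypothetical, and the index shown is one simple summary of the matrix rather than the index defined in the thesis.

```python
import math


def soft_assignment(points, centroids):
    """Return phi[i][k]: the probability that point i is assigned to cluster k
    when each distance is scaled by an independent Exp(1) draw (assumption)."""
    phi = []
    for p in points:
        d = [math.dist(p, c) for c in centroids]
        if min(d) == 0.0:
            # The point sits on a centroid: a scaled zero distance always wins.
            zeros = [1.0 if x == 0.0 else 0.0 for x in d]
            total = sum(zeros)
            phi.append([z / total for z in zeros])
            continue
        # lambda_k * d_k is exponential with rate 1 / d_k, and
        # P(argmin = k) = rate_k / sum of all rates.
        rates = [1.0 / x for x in d]
        total = sum(rates)
        phi.append([r / total for r in rates])
    return phi


def stability_index(phi):
    """Hypothetical scalar summary: the mean probability that a point stays
    in its most likely cluster under perturbation."""
    return sum(max(row) for row in phi) / len(phi)


# Toy data: two tight groups on a line plus one point midway between them.
points = [(0.0, 0.0), (1.0, 0.0), (9.0, 0.0), (10.0, 0.0), (5.0, 0.0)]
centroids = [(0.5, 0.0), (9.5, 0.0)]

phi = soft_assignment(points, centroids)
score = stability_index(phi)

for point, row in zip(points, phi):
    print(point, [round(v, 3) for v in row])
print("stability:", round(score, 3))
```

In this toy setup the midway point splits its probability evenly between the two clusters, while points near a centroid keep most of their mass there. Rows of the matrix can be inspected per cluster to compare how stable each one is.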

Countries

United States, Mexico, Canada

Related Organizations

- University of British Columbia (Canada)
- Natural Sciences and Engineering Research Council of Canada (Canada)
- National Science Foundation (United States)
- Mexico's National Council for Science and Technology (Mexico)

Keywords

Cluster validation, 006, Unsupervised learning, Clustering

Impact by BIP!

- Selected citations: 0. These citations are derived from selected sources. This is an alternative to the "Influence" indicator.
- Popularity: Average. This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network.
- Influence: Average. This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
- Impulse: Average. This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network.
