k-Means Has Polynomial Smoothed Complexity

Publication: Article, Preprint, Conference object. Date: 01 Oct 2009. Embargo end date: 01 Jan 2009. Country: Netherlands. Publisher: IEEE. Journal: 2009 50th Annual IEEE Symposium on Foundations of Computer Science

Authors: David Arthur; Bodo Manthey; Heiko Röglin

doi: 10.1109/focs.2009.14, 10.48550/arxiv.0904.1113

arXiv: 0904.1113

Abstract

The k-means method is one of the most widely used clustering algorithms, drawing its popularity from its speed in practice. Recently, however, it was shown to have exponential worst-case running time. In order to close the gap between practical performance and theoretical analysis, the k-means method has been studied in the model of smoothed analysis. But even the smoothed analyses so far are unsatisfactory as the bounds are still super-polynomial in the number n of data points. In this paper, we settle the smoothed running time of the k-means method. We show that the smoothed number of iterations is bounded by a polynomial in n and 1/σ, where σ is the standard deviation of the Gaussian perturbations. This means that if an arbitrary input data set is randomly perturbed, then the k-means method will run in expected polynomial time on that input set.
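The setting in the abstract can be illustrated with a minimal sketch: perturb each coordinate of an input with Gaussian noise of standard deviation σ, run the k-means method, and count its iterations. This is not code from the paper. The function names (`perturb`, `kmeans_iterations`), the example points, the choice of initial centers, and the convention of counting only iterations that change the assignment are all assumptions made for illustration.

```python
import random


def perturb(points, sigma, rng):
    """Add independent Gaussian noise N(0, sigma^2) to every coordinate."""
    return [tuple(x + rng.gauss(0.0, sigma) for x in p) for p in points]


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_iterations(points, centers, max_iter=10_000):
    """Run the k-means method from the given initial centers.

    Returns (centers, iterations), where iterations counts the rounds
    in which the assignment of points to centers changed.
    """
    centers = list(centers)
    assignment = None
    for iteration in range(1, max_iter + 1):
        # Assignment step: each point goes to its nearest center.
        new_assignment = [
            min(range(len(centers)), key=lambda j: sq_dist(p, centers[j]))
            for p in points
        ]
        if new_assignment == assignment:
            return centers, iteration - 1  # nothing changed: converged
        assignment = new_assignment
        # Update step: move each center to the mean of its cluster.
        for j in range(len(centers)):
            members = [p for p, a in zip(points, assignment) if a == j]
            if members:  # an empty cluster keeps its old center (assumption)
                centers[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return centers, max_iter


# Hypothetical input: four points, perturbed with sigma = 0.1.
rng = random.Random(0)
base = [(0.0, 0.0), (0.0, 1.0), (5.0, 0.0), (5.0, 1.0)]
noisy = perturb(base, sigma=0.1, rng=rng)
centers, iters = kmeans_iterations(noisy, centers=noisy[:2])
print(iters)
```

Averaging `iters` over many independent perturbations of a fixed input gives an empirical estimate of the expected iteration count for that input; the smoothed bound in the abstract concerns this expectation for an arbitrary input, as a function of n and 1/σ.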

Full version of FOCS 2009 paper. The argument has been improved and the restriction to at least three dimensions could be dropped.

Country

Netherlands

Related Organizations

Maastricht University (Netherlands)
University of Twente (Netherlands)
Stanford University Dept. of Computer Sciences (United States)
Stanford University (United States)

Keywords

Probabilistic analysis, k-Means, Computational Geometry (cs.CG), FOS: Computer and information sciences, Smoothed analysis, I.5.3, k-Means clustering, Computational Complexity (cs.CC), Clustering, Computer Science - Computational Complexity, H.3.3, 2023 OA procedure, Computer Science - Data Structures and Algorithms, k-Means method, Computer Science - Computational Geometry, Data Structures and Algorithms (cs.DS), F.2.2

Impact by BIP!

selected citations: 53. These citations are derived from selected sources. This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
popularity: Top 10%. This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network.
influence: Top 10%. This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
impulse: Top 10%. This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network.