The Effectiveness of Lloyd-Type Methods for the k-Means Problem

Publication: Article, Conference object, Part of book or chapter of book
Date: 01 Jan 2006
Country: United States
Publisher: IEEE
Journal: 2006 47th Annual IEEE Symposium on Foundations of Computer Science (FOCS'06)
Funded by: NSERC | unidentified

Authors: Rafail Ostrovsky; Yuval Rabani; Leonard J. Schulman; Chaitanya Swamy

doi: 10.1109/focs.2006.75, 10.1145/2395116.2395117

Abstract

We investigate variants of Lloyd's heuristic for clustering high-dimensional data in an attempt to explain its popularity (a half century after its introduction) among practitioners, and in order to suggest improvements in its application. We propose and justify a clusterability criterion for data sets. We present variants of Lloyd's heuristic that quickly lead to provably near-optimal clustering solutions when applied to well-clusterable instances. This is the first performance guarantee for a variant of Lloyd's heuristic. The provision of a guarantee on output quality does not come at the expense of speed: some of our algorithms are candidates for being faster in practice than currently used variants of Lloyd's method. In addition, our other algorithms are faster on well-clusterable instances than recently proposed approximation algorithms, while maintaining similar guarantees on clustering quality. Our main algorithmic contribution is a novel probabilistic seeding process for the starting configuration of a Lloyd-type iteration.
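To make the idea of "probabilistic seeding followed by a Lloyd-type iteration" concrete, here is a minimal Python sketch. The abstract does not spell out the seeding rule, so the sketch assumes a distance-squared-weighted sampling scheme (each new center is drawn with probability proportional to its squared distance from the nearest center already chosen); this is an illustrative assumption, not necessarily the authors' procedure. The function names `d2_seed`, `lloyd`, and `cost` are hypothetical.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def cost(points, centers):
    """k-means cost: sum of squared distances to the nearest center."""
    return sum(min(sq_dist(p, c) for c in centers) for p in points)


def d2_seed(points, k, rng):
    """Assumed seeding rule (not taken from the abstract): first center
    uniformly at random, each later center with probability proportional
    to its squared distance from the nearest center chosen so far."""
    centers = [rng.choice(points)]
    d2 = [sq_dist(p, centers[0]) for p in points]
    while len(centers) < k:
        if sum(d2) == 0:
            # Every point coincides with a chosen center; fall back to uniform.
            centers.append(rng.choice(points))
        else:
            idx = rng.choices(range(len(points)), weights=d2, k=1)[0]
            centers.append(points[idx])
        d2 = [min(old, sq_dist(p, centers[-1])) for old, p in zip(d2, points)]
    return centers


def lloyd(points, centers, max_iter=100):
    """Standard Lloyd iteration: assign each point to its nearest center,
    then move each center to the mean of its assigned points."""
    centers = list(centers)
    for _ in range(max_iter):
        clusters = [[] for _ in centers]
        for p in points:
            j = min(range(len(centers)), key=lambda i: sq_dist(p, centers[i]))
            clusters[j].append(p)
        new_centers = [
            tuple(sum(coord) / len(members) for coord in zip(*members))
            if members else centers[j]  # keep the old center if a cluster is empty
            for j, members in enumerate(clusters)
        ]
        if new_centers == centers:
            break  # assignments are stable, so the iteration has converged
        centers = new_centers
    return centers
```

Calling `lloyd(points, d2_seed(points, k, random.Random(0)))` on a list of coordinate tuples returns `k` centers. The Lloyd step never increases the k-means cost, but it can stall at a poor local optimum when the starting centers are badly placed, which is why the choice of seeding matters.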

Country

United States

Related Organizations

California Institute of Technology
United States
University of Waterloo
Canada
University of California, Los Angeles
United States
Hebrew University of Jerusalem
Israel
Technion – Israel Institute of Technology
Israel

Keywords

000, Randomized algorithms, approximation algorithms, 004

Impact by BIP!

selected citations: 198. These citations are derived from selected sources. This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
popularity: Top 1%. This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network.
influence: Top 1%. This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
impulse: Top 10%. This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network.