Fast approximate k-means via cluster closures

Publication type: Article, Part of book or chapter of book, Preprint, Conference object

Published: 01 Jun 2012

Embargo end date: 01 Jan 2013

Publisher: IEEE

Journal: 2012 IEEE Conference on Computer Vision and Pattern Recognition

Authors: Jing Wang; Jingdong Wang; Qifa Ke; Gang Zeng; Shipeng Li

doi: 10.1109/cvpr.2012.6248034, 10.1007/978-3-319-14998-1_17, 10.48550/arxiv.1312.3061

arXiv: 1312.3061

Abstract

$K$-means, a simple and effective clustering algorithm, is one of the most widely used algorithms in the multimedia and computer vision communities. Traditional $k$-means is an iterative algorithm: in each iteration, new cluster centers are computed and each data point is reassigned to its nearest center. The reassignment step becomes prohibitively expensive when the numbers of data points and cluster centers are large. In this paper, we propose a novel approximate $k$-means algorithm that greatly reduces the computational complexity of the assignment step. Our approach is motivated by the observation that most active points (points that change their cluster assignment in an iteration) are located on or near cluster boundaries. The idea is to identify those active points efficiently by pre-assembling the data into groups of neighboring points using multiple random spatial partition trees, and to use this neighborhood information to construct a closure for each cluster, so that only a small number of candidate clusters need to be considered when assigning a data point to its nearest cluster. Using complexity analysis, image data clustering, and applications to image retrieval, we show that our approach outperforms state-of-the-art approximate $k$-means algorithms in terms of clustering quality and efficiency.
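The candidate-restriction idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the paper builds its neighborhood groups with multiple random spatial partition trees, whereas the hypothetical `make_groups` helper below simply chunks the points along a few random projection directions. All function names and parameters (`make_groups`, `assign_with_candidates`, `leaf_size`, and so on) are assumptions made for illustration. Within each group, a point is compared only against the clusters currently represented in that group, and a group whose points all share one cluster is skipped entirely.

```python
import numpy as np

def make_groups(X, n_partitions=3, leaf_size=20, seed=0):
    """Hypothetical stand-in for random partition trees: sort the points
    along random directions and cut each ordering into small chunks."""
    rng = np.random.default_rng(seed)
    groups = []
    for _ in range(n_partitions):
        direction = rng.normal(size=X.shape[1])
        order = np.argsort(X @ direction)
        groups += [order[i:i + leaf_size] for i in range(0, len(X), leaf_size)]
    return groups

def assign_full(X, centers):
    """Exact assignment: every point is compared against every center."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)

def assign_with_candidates(X, centers, labels, groups):
    """Approximate assignment: a point only considers clusters that
    currently own at least one point in one of its groups."""
    best = ((X - centers[labels]) ** 2).sum(axis=1)  # distance to current center
    new_labels = labels.copy()
    for g in groups:
        cand = np.unique(labels[g])
        if len(cand) == 1:
            continue  # interior group: no boundary passes through it
        d2 = ((X[g][:, None, :] - centers[cand][None, :, :]) ** 2).sum(axis=2)
        j = d2.argmin(axis=1)
        dmin = d2[np.arange(len(g)), j]
        better = dmin < best[g]          # switch only on strict improvement
        idx = g[better]
        new_labels[idx] = cand[j[better]]
        best[idx] = dmin[better]
    return new_labels

def update_centers(X, labels, centers):
    """Recompute each center as the mean of its members (empty clusters keep
    their previous center)."""
    new = centers.copy()
    for c in range(len(centers)):
        members = X[labels == c]
        if len(members):
            new[c] = members.mean(axis=0)
    return new

def sse(X, centers, labels):
    """Sum of squared distances from each point to its assigned center."""
    return ((X - centers[labels]) ** 2).sum()

def approx_kmeans(X, k, n_iter=10, seed=0):
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    groups = make_groups(X, seed=seed)   # built once, reused in every iteration
    labels = assign_full(X, centers)     # one exact pass to initialise
    for _ in range(n_iter):
        centers = update_centers(X, labels, centers)
        labels = assign_with_candidates(X, centers, labels, groups)
    return centers, labels
```

Because a point changes cluster only when the new center is strictly closer, the sum of squared distances in this sketch never increases from one iteration to the next. How closely it tracks exact $k$-means depends on how well the groups capture true neighborhoods, which is the role the partition trees play in the paper.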

Related Organizations

Peking University
China (People's Republic of)
Microsoft (United States)
United States
Microsoft Research Asia (China)
China (People's Republic of)

Keywords

FOS: Computer and information sciences, Computer Science - Computer Vision and Pattern Recognition (cs.CV)

Impact (by BIP!)

- selected citations: 57 (citations derived from selected sources)
- popularity: Top 10% (the "current" impact or attention of the article in the research community at large, based on the underlying citation network)
- influence: Top 10% (the overall impact of the article in the research community at large, based on the underlying citation network, diachronically)
- impulse: Top 10% (the initial momentum of the article directly after its publication, based on the underlying citation network)


Fields of Science

engineering and technology

electrical engineering, electronic engineering, information engineering
