A Fast and Efficient Grid-Based K-means++ Clustering Algorithm for Large-Scale Datasets

descriptionPublicationkeyboard_double_arrow_right Part of book or chapter of book , Article 25 Dec 2018 English Publisher:Springer International Publishing

Authors: Yang Yang; Zhixiang Zhu;

doi: 10.1007/978-3-030-03766-6_57

A Fast and Efficient Grid-Based K-means++ Clustering Algorithm for Large-Scale Datasets

- Summary
- Metrics

Abstract

In the k-means clustering algorithm, the selection of the initial clustering center affects the clustering efficiency. Currently widely used k-means++ can effectively improve the speed and accuracy of k-means. But k-means cluster algorithm does not scale well to massive datasets, as it needs to traverse the data set multiple times. In this paper, based on k-means++ clustering algorithm and grid clustering algorithm, a fast and efficient grid-based k-means++ clustering algorithm was proposed, which can efficiently process large-scale data. First, the N-dimensional space is granulated into disjoint rectangular grid cells. Then, the dense grid cell is marked by statistical gird cell information. Finally, the modified k-means++ clustering algorithm is applied to the meshed datasets. The experimental results on the simulation dataset show that compared with the original k-means++ clustering algorithm, the proposed algorithm can quickly obtain the clustering center and can effectively deal with the clustering problem of large-scale datasets.

Related Organizations

Xi’an University of Posts and Telecommunications
China (People's Republic of)

Impact byBIP!

	selected citations These citations are derived from selected sources. This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).	4
	popularity This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network.	Average
	influence This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).	Average
	impulse This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network.	Average

Found an issue? Give us feedback

4

Average

Upload OA version

Are you the author of this publication? Upload your Open Access version to Zenodo!

It’s fast and easy, just two clicks!

uploadUpload now