Improved Optimization of Canopy-Kmeans Clustering Algorithm Based on Hadoop Platform

Name: Improved Optimization of Canopy-Kmeans Clustering Algorithm Based on Hadoop Platform
Creator: Gongjian Zhou
Keywords: 0202 electrical engineering, electronic engineering, information engineering, 02 engineering and technology

Article . 2018 . Peer-reviewed

License: https://www.acm.org/publications/policies/copyright_policy#Background

Data sources: Crossref

Data sources: Microsoft Academic Graph

Article . 07 Dec 2018 . Publisher: ACM . Journal: Proceedings of the International Conference on Information Technology and Electrical Engineering 2018

Authors: Gongjian Zhou;

doi: 10.1145/3148453.3306258

Abstract

How to apply clustering algorithms to cluster large-scale data effectively is an important research topic in data mining. Based on an in-depth analysis of the Hadoop platform architecture and the Canopy-Kmeans clustering algorithm, this work optimizes and parallelizes the Canopy-Kmeans algorithm. The data packets are grouped and sampled using statistical methods before clustering, which facilitates parallelization and reduces time complexity. Canopy initial center point selection is optimized using the minimum-maximum principle, a data outlier average sampling method is used to ensure that samples are extracted uniformly from the original data, and the k-means iterative calculation process is optimized. The improved algorithm is designed and implemented in parallel on the MapReduce framework of the Hadoop platform. Experiments show that the improved parallel Canopy-Kmeans algorithm is effective and converges when clustering massive amounts of numerical data, and that it improves clustering accuracy and time efficiency to some degree.
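The abstract does not give the algorithm's details, so the following is only a minimal single-process sketch of the general idea, not the author's implementation. It assumes that the "minimum-maximum principle" means farthest-point seeding (each new center is the point whose distance to its nearest already-chosen center is largest) and that the seeds are then refined by ordinary k-means iterations. The function names `max_min_centers` and `kmeans` are hypothetical, and the "map" and "reduce" comments only indicate where a MapReduce job would split the work; no Hadoop API is used, and the sampling and outlier handling mentioned in the abstract are omitted.

```python
import math


def dist(a, b):
    """Euclidean distance between two equal-length numeric tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def max_min_centers(points, k):
    """Assumed reading of the min-max principle: farthest-point seeding.

    Starts from the first point, then repeatedly adds the point whose
    distance to its nearest already-chosen center is largest.
    """
    centers = [points[0]]
    # nearest[i] is the distance from points[i] to its closest chosen center.
    nearest = [dist(p, centers[0]) for p in points]
    while len(centers) < k:
        idx = max(range(len(points)), key=nearest.__getitem__)
        centers.append(points[idx])
        nearest = [min(d, dist(p, points[idx])) for d, p in zip(nearest, points)]
    return centers


def kmeans(points, centers, max_iter=100, tol=1e-9):
    """Standard k-means iterations starting from the given centers."""
    centers = list(centers)
    groups = []
    for _ in range(max_iter):
        # "Map"-like step: assign every point to its nearest center.
        groups = [[] for _ in centers]
        for p in points:
            j = min(range(len(centers)), key=lambda c: dist(p, centers[c]))
            groups[j].append(p)
        # "Reduce"-like step: each center becomes the mean of its group
        # (an empty group keeps its previous center).
        new_centers = [
            tuple(sum(col) / len(g) for col in zip(*g)) if g else centers[j]
            for j, g in enumerate(groups)
        ]
        shift = max(dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift <= tol:
            break
    return centers, groups


# Illustrative data: two well-separated groups of three points each.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
seeds = max_min_centers(points, 2)
centers, groups = kmeans(points, seeds)
```

Seeding with well-separated points tends to place one initial center in each natural group, which is one plausible reason such a scheme could reduce the number of k-means iterations; whether this matches the paper's measured gains cannot be confirmed from the abstract alone.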

Related Organizations

Xiamen University
China (People's Republic of)

Impact by BIP!

Selected citations: 1
Popularity: Average
Influence: Average
Impulse: Average

Fields of Science

engineering and technology

electrical engineering, electronic engineering, information engineering
