Improvement of the Fast Clustering Algorithm Improved by K-Means in the Big Data

Publication: Article, Other literature type. 01 Jan 2020. English. Publisher: Walter de Gruyter GmbH. Journal: Applied Mathematics and Nonlinear Sciences, volume 5, pages 1-10 (eISSN: 2444-8656)

Authors: Xie, Ting; Liu, Ruihua; Wei, Zhengyuan;

doi: 10.2478/amns.2020.1.00001

Abstract

Clustering, a fundamental unsupervised learning task, is an important method of data analysis, and K-means is demonstrably the most popular clustering algorithm. In this paper, we consider clustering in feature space to address the low efficiency of K-means on Big Data. Unlike traditional methods, the proposed algorithm guarantees consistent clustering accuracy before and after dimensionality reduction, accelerates K-means when the cluster centers and distance function satisfy certain conditions, matches the preprocessing step to the clustering step, and improves both efficiency and accuracy. Experimental results demonstrate the effectiveness of the proposed algorithm.
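
The abstract does not spell out the reduction step or the conditions on the centers and distance function, so the following is only a minimal sketch of the general idea (reduce the dimension first, then run K-means in the reduced space and check that the clustering is unchanged). It is not the authors' algorithm: a Gaussian random projection stands in for the paper's feature-space mapping, plain Lloyd's iterations stand in for the accelerated K-means, and every name here (`random_projection`, `farthest_first_centers`, `kmeans`, `make_blobs`, `agreement`) and the toy data are assumptions made for illustration.

```python
import math
import random


def random_projection(points, target_dim, rng):
    """Project points with a Gaussian random matrix (assumed stand-in reduction)."""
    source_dim = len(points[0])
    scale = 1.0 / math.sqrt(target_dim)
    matrix = [[rng.gauss(0.0, 1.0) * scale for _ in range(source_dim)]
              for _ in range(target_dim)]
    return [[sum(w * x for w, x in zip(row, p)) for row in matrix] for p in points]


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def farthest_first_centers(points, k):
    """Deterministic initialization: start at points[0], then greedily add the farthest point."""
    centers = [list(points[0])]
    while len(centers) < k:
        nxt = max(points, key=lambda p: min(squared_distance(p, c) for c in centers))
        centers.append(list(nxt))
    return centers


def kmeans(points, k, max_iter=100):
    """Plain Lloyd's K-means; returns (centers, labels)."""
    centers = farthest_first_centers(points, k)
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda j: squared_distance(p, centers[j]))
                      for p in points]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old center if a cluster is empty
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return centers, labels


def make_blobs(n_per_cluster, dim, rng):
    """Two well-separated synthetic Gaussian blobs (toy data, not from the paper)."""
    data, truth = [], []
    for label, offset in enumerate((0.0, 10.0)):
        for _ in range(n_per_cluster):
            data.append([offset + rng.gauss(0.0, 1.0) for _ in range(dim)])
            truth.append(label)
    return data, truth


def agreement(labels_a, labels_b):
    """Fraction of matching labels, allowing the two cluster ids to be swapped."""
    same = sum(a == b for a, b in zip(labels_a, labels_b)) / len(labels_a)
    return max(same, 1.0 - same)


rng = random.Random(0)
data, truth = make_blobs(n_per_cluster=50, dim=50, rng=rng)
reduced = random_projection(data, target_dim=5, rng=rng)
_, labels_full = kmeans(data, k=2)
_, labels_reduced = kmeans(reduced, k=2)
print("agreement, full vs reduced:", agreement(labels_full, labels_reduced))
```

Each distance evaluation in the reduced space touches 5 coordinates instead of 50, which is where a saving of this kind would come from. A random projection only preserves distances approximately, so the agreement score is an empirical check on this toy data, not the guarantee the abstract claims for the proposed method.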

Related Organizations

Chongqing University of Technology
China (People's Republic of)

Keywords

Statistical aspects of big data and data science, Classification and discrimination; cluster analysis (statistical aspects), big data, \(K\)-means, feature space, clustering

Impact by BIP!

selected citations: 54. These citations are derived from selected sources. This is an alternative to the "influence" indicator.
popularity: Top 1%. This indicator reflects the "current" impact or attention (the "hype") of an article in the research community at large, based on the underlying citation network.
influence: Top 10%. This indicator reflects the overall impact of an article in the research community at large, based on the underlying citation network (diachronically).
impulse: Top 1%. This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network.