
handle: 20.500.14243/461867, 20.500.11770/354142
This paper proposes a variation of the K-Means clustering algorithm, named Population-Based K-Means (PB-K-MEANS), whose behaviour rests on careful seeding. The new algorithm relies on a greedy version of the K-Means++ seeding procedure (g_kmeans++), which proves effective in the search for an accurate clustering solution. PB-K-MEANS first builds a population of candidate solutions through independent runs of K-Means with g_kmeans++. The population then serves as a reservoir: its stored solutions are recombined by Repeated K-Means toward a final solution that minimizes the distortion index. PB-K-MEANS is currently implemented in Java using parallel streams and lambda expressions. The paper first recalls basic concepts of clustering and of K-Means, together with the role of the seeding procedure, and then describes the main design and implementation issues of PB-K-MEANS. Finally, it reports simulation experiments carried out on both synthetic and real-world datasets, which confirm good execution performance and accurate clustering.
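The abstract outlines the two phases only at a high level. The Java sketch below illustrates them under explicit assumptions and is not the authors' code. The class `PbKMeans`, its methods `greedySeed`, `lloyd` and `run`, the number of greedy candidates per seeding step (2 + ⌊ln K⌋, a commonly used default), and the recombination scheme are all hypothetical. The assumed recombination clusters the pooled J·K centroids of the population with greedy seeding, refines the result on the full dataset, and keeps the solution with the lowest distortion, computed here as the sum of squared distances of points to their nearest centroid (SSE).

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.Random;
import java.util.stream.IntStream;

// Hypothetical sketch of a population-based K-Means; not the authors' implementation.
public final class PbKMeans {

    // Squared Euclidean distance.
    static double dist2(double[] a, double[] b) {
        double s = 0;
        for (int i = 0; i < a.length; i++) { double d = a[i] - b[i]; s += d * d; }
        return s;
    }

    // Index of the nearest centroid (ties go to the lower index).
    static int nearest(double[] p, double[][] c) {
        int best = 0;
        for (int j = 1; j < c.length; j++) if (dist2(p, c[j]) < dist2(p, c[best])) best = j;
        return best;
    }

    // Distortion index (SSE): sum of squared distances of points to their nearest centroid.
    public static double sse(double[][] x, double[][] c) {
        double s = 0;
        for (double[] p : x) s += dist2(p, c[nearest(p, c)]);
        return s;
    }

    // Greedy K-Means++: at each step draw several D^2-weighted candidates and keep the one that
    // yields the lowest potential. trials = 2 + floor(ln K) is an assumed, commonly used default.
    static double[][] greedySeed(double[][] x, int k, Random rnd) {
        int n = x.length, trials = 2 + (int) Math.log(k);
        double[][] c = new double[k][];
        c[0] = x[rnd.nextInt(n)].clone();
        double[] d2 = new double[n];
        for (int i = 0; i < n; i++) d2[i] = dist2(x[i], c[0]);
        for (int j = 1; j < k; j++) {
            double total = Arrays.stream(d2).sum(), bestPot = Double.POSITIVE_INFINITY;
            int bestIdx = 0;
            for (int t = 0; t < trials; t++) {
                double r = rnd.nextDouble() * total, acc = 0, pot = 0;
                int cand = n - 1; // roulette-wheel pick proportional to d2
                for (int i = 0; i < n; i++) { acc += d2[i]; if (acc > r) { cand = i; break; } }
                for (int i = 0; i < n; i++) pot += Math.min(d2[i], dist2(x[i], x[cand]));
                if (pot < bestPot) { bestPot = pot; bestIdx = cand; }
            }
            c[j] = x[bestIdx].clone();
            for (int i = 0; i < n; i++) d2[i] = Math.min(d2[i], dist2(x[i], c[j]));
        }
        return c;
    }

    // Lloyd's K-Means from the given centroids until no assignment changes (or maxIter).
    static double[][] lloyd(double[][] x, double[][] init, int maxIter) {
        int k = init.length, dim = x[0].length;
        double[][] c = Arrays.stream(init).map(double[]::clone).toArray(double[][]::new);
        int[] lab = new int[x.length];
        Arrays.fill(lab, -1);
        for (int it = 0; it < maxIter; it++) {
            boolean changed = false;
            double[][] sum = new double[k][dim];
            int[] cnt = new int[k];
            for (int i = 0; i < x.length; i++) {
                int j = nearest(x[i], c);
                if (j != lab[i]) { lab[i] = j; changed = true; }
                cnt[j]++;
                for (int d = 0; d < dim; d++) sum[j][d] += x[i][d];
            }
            if (!changed) break;
            for (int j = 0; j < k; j++) // an empty cluster keeps its previous centroid
                if (cnt[j] > 0) for (int d = 0; d < dim; d++) c[j][d] = sum[j][d] / cnt[j];
        }
        return c;
    }

    // Step 1: J independent greedy-seeded K-Means runs, executed through a parallel stream.
    // Step 2 (assumed recombination): repeatedly cluster the pooled J*K centroids with greedy
    // seeding, refine the result on the full data, and keep the lowest-SSE solution.
    public static double[][] run(double[][] x, int k, int popSize, int rounds, long seed) {
        double[][][] pop = IntStream.range(0, popSize).parallel()
                .mapToObj(s -> lloyd(x, greedySeed(x, k, new Random(seed + s)), 100))
                .toArray(double[][][]::new);
        double[][] best = Arrays.stream(pop).min(Comparator.comparingDouble(c -> sse(x, c))).orElseThrow();
        double[][] pooled = Arrays.stream(pop).flatMap(Arrays::stream).toArray(double[][]::new);
        Random rnd = new Random(~seed);
        for (int r = 0; r < rounds; r++) {
            double[][] cand = lloyd(x, lloyd(pooled, greedySeed(pooled, k, rnd), 100), 100);
            if (sse(x, cand) < sse(x, best)) best = cand;
        }
        return best;
    }

    public static void main(String[] args) {
        double[][] data = {{0, 0}, {0, 1}, {1, 0}, {1, 1}, {10, 10}, {10, 11}, {11, 10}, {11, 11}};
        System.out.println("SSE = " + sse(data, run(data, 2, 4, 3, 42L)));
    }
}
```

In this sketch only the population-building phase runs in parallel: each run owns its `Random` and its working arrays and only reads the shared dataset, so the parallel stream needs no synchronization. The recombination rounds are kept sequential for simplicity.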
K-Means clustering, Seeding procedure, Greedy K-Means++, Clustering accuracy indexes, Java parallel streams, Benchmark and real-world datasets, Execution performance
