
handle: 20.500.14243/559747, 20.500.11770/341397
K-means is a widespread clustering algorithm characterized by its simplicity and efficiency. However, its behavior strongly depends on the initialization of the cluster centers (centroids), and it tends to get stuck in a suboptimal local solution. Many techniques have been devised to overcome these problems, e.g., a global strategy that reduces the locality of centroid adjustments, or the use of density peaks to initialize the centroids. This paper proposes DPKM, an improved version of K-means based on the concept of density peaks. Density peaks have proven to be key to solving clustering problems that involve non-spherical regions with complex point distributions. In DPKM, the centroids are selected from the density peaks using a technique borrowed from the DK-means++ initialization method, which ensures that the centroids are not only high-density points but also far away from each other. DPKM is implemented in Java using parallel streams and lambda expressions, which can deliver good execution times on large datasets on multicore machines with shared memory. The efficiency and reliability of DPKM are demonstrated by applying it to challenging synthetic datasets often used as benchmarks for clustering methods.
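The abstract does not give DPKM's exact formulas, so the following is only a minimal sketch under stated assumptions of how the ideas it names can fit together in Java. Local density follows the density-peaks style cutoff kernel (count of neighbors within a hypothetical radius `dc`). Seeding is a deterministic DK-means++-like rule: the first centroid is the densest point, and each further centroid maximizes (density + 1) times the squared distance to its nearest already chosen centroid. This product rule is our assumption, not the published DK-means++ or DPKM formula. Lloyd iterations then run their assignment and update steps on parallel streams with lambda expressions. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.stream.IntStream;

// Hypothetical sketch, NOT the DPKM implementation. Assumptions: cutoff-kernel density
// with radius dc, and seeding that maximizes (density + 1) * distance^2 to nearest seed.
class DensityPeaksKMeansSketch {

    static double dist2(double[] a, double[] b) {
        double s = 0;
        for (int d = 0; d < a.length; d++) {
            double diff = a[d] - b[d];
            s += diff * diff;
        }
        return s;
    }

    // Local density: number of other points closer than dc, computed in parallel over points.
    static int[] localDensity(double[][] pts, double dc) {
        double dc2 = dc * dc;
        return IntStream.range(0, pts.length).parallel()
                .map(i -> (int) IntStream.range(0, pts.length)
                        .filter(j -> j != i && dist2(pts[i], pts[j]) < dc2)
                        .count())
                .toArray();
    }

    static double nearestDist2(double[] p, List<double[]> seeds) {
        double best = Double.MAX_VALUE;
        for (double[] c : seeds) best = Math.min(best, dist2(p, c));
        return best;
    }

    // Assumed seeding rule: seeds are both dense and far from each other.
    // The "+ 1" keeps already chosen points (distance 0) from winning when all densities are 0.
    static double[][] seedCentroids(double[][] pts, int k, double dc) {
        int[] rho = localDensity(pts, dc);
        List<double[]> chosen = new ArrayList<>();
        int first = IntStream.range(0, pts.length)
                .reduce((a, b) -> rho[b] > rho[a] ? b : a).getAsInt();
        chosen.add(pts[first]);
        while (chosen.size() < k) {
            int next = IntStream.range(0, pts.length).parallel().boxed()
                    .max(Comparator.comparingDouble(i -> (rho[i] + 1.0) * nearestDist2(pts[i], chosen)))
                    .get();
            chosen.add(pts[next]);
        }
        return chosen.toArray(new double[0][]);
    }

    static int nearest(double[] p, double[][] cs) {
        int best = 0;
        for (int c = 1; c < cs.length; c++)
            if (dist2(p, cs[c]) < dist2(p, cs[best])) best = c;
        return best;
    }

    // Lloyd iterations (requires maxIter >= 1); stops when assignments no longer change.
    static int[] cluster(double[][] pts, int k, double dc, int maxIter) {
        double[][] cs = seedCentroids(pts, k, dc);
        final int dim = pts[0].length;
        int[] labels = null;
        for (int it = 0; it < maxIter; it++) {
            final double[][] cur = cs;
            // Assignment step: each point goes to its nearest centroid, in parallel.
            final int[] lab = IntStream.range(0, pts.length).parallel()
                    .map(i -> nearest(pts[i], cur)).toArray();
            if (Arrays.equals(lab, labels)) break;
            labels = lab;
            // Update step: each centroid becomes the mean of its points (kept if the cluster is empty).
            cs = IntStream.range(0, k).parallel().mapToObj(c -> {
                double[] mean = new double[dim];
                int n = 0;
                for (int i = 0; i < pts.length; i++) {
                    if (lab[i] != c) continue;
                    n++;
                    for (int d = 0; d < dim; d++) mean[d] += pts[i][d];
                }
                if (n == 0) return cur[c].clone();
                for (int d = 0; d < dim; d++) mean[d] /= n;
                return mean;
            }).toArray(double[][]::new);
        }
        return labels;
    }
}
```

For example, `DensityPeaksKMeansSketch.cluster(points, 2, 1.5, 10)` on two well-separated groups of three points should place each group in its own cluster. In this sketch the O(n²) density computation dominates the cost. Because it is independent per point, it parallelizes naturally with `IntStream.parallel()`, which is the kind of workload where a multicore, shared-memory machine helps.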
Clustering, K-means, Centroids initialization, Density peaks, DK-means++, Java, Parallel streams, Lambda expressions, Multicore machines
