
Abstract

New clustering algorithms are expected to handle complex data, with clusters of various shapes and densities, while remaining user-friendly. This work addresses that challenge by proposing KdMutual, a new clustering algorithm driven by the number of clusters. The algorithm rests on the assumption that working with cluster cores, rather than with cluster frontiers, makes the clustering process easier. KdMutual proceeds in three steps. The first step identifies potential core clusters; it relies on mutual neighborhood and includes specific mechanisms to identify and preserve these cores. The second step is a constrained hierarchical process that deals with noise. In the last step, the potential clusters are selected using a specific ranking criterion and the final partition is built. KdMutual combines the best characteristics of density-peak and connectivity-based approaches, and it can detect the absence of natural clusters. The proposal was compared with 14 other clustering algorithms. On 2-dimensional benchmark datasets of various shapes and densities, KdMutual was highly effective at matching the ground-truth partition. It also proved efficient in high dimensions when clusters are well separated. Moreover, in spaces of moderate dimension, it can identify clusters of various densities even when they partially overlap and the data contain a large amount of noise.
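The first step rests on mutual neighborhood: two points are mutual neighbors when each belongs to the other's k nearest neighbors. The sketch below is a hypothetical illustration in Python/NumPy, not the authors' KdMutual implementation. It builds a mutual k-NN graph and keeps connected components with at least `min_size` points as candidate cores, labeling everything else -1. The function names and the parameters `k` and `min_size` are assumptions; the paper's core-preservation mechanisms, constrained hierarchical step, and ranking criterion are not reproduced.

```python
import numpy as np


def knn_indices(X: np.ndarray, k: int) -> np.ndarray:
    """Indices of the k nearest neighbours of each point (self excluded)."""
    # Brute-force pairwise squared Euclidean distances: O(n^2), fine for a sketch.
    diff = X[:, None, :] - X[None, :, :]
    d2 = np.einsum("ijk,ijk->ij", diff, diff)
    np.fill_diagonal(d2, np.inf)  # a point is not its own neighbour
    return np.argsort(d2, axis=1)[:, :k]


def mutual_knn_adjacency(X: np.ndarray, k: int) -> np.ndarray:
    """A[i, j] is True iff i and j are in each other's k-NN lists."""
    n = X.shape[0]
    nn = knn_indices(X, k)
    A = np.zeros((n, n), dtype=bool)
    A[np.repeat(np.arange(n), k), nn.ravel()] = True  # directed k-NN relation
    return A & A.T  # keep only reciprocal (mutual) links


def candidate_cores(X: np.ndarray, k: int = 5, min_size: int = 5) -> np.ndarray:
    """Hypothetical core detection: components of the mutual k-NN graph.

    Components with fewer than min_size points are labelled -1 (not a core).
    """
    A = mutual_knn_adjacency(X, k)
    n = X.shape[0]
    labels = np.full(n, -1)  # -1 also means "not yet visited" during the loop
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        # Iterative depth-first search over mutual-neighbour links.
        stack, members = [start], []
        labels[start] = current
        while stack:
            i = stack.pop()
            members.append(i)
            for j in np.flatnonzero(A[i]):
                if labels[j] == -1:
                    labels[j] = current
                    stack.append(j)
        if len(members) < min_size:
            labels[members] = -2  # visited but too small: temporary marker
        else:
            current += 1
    labels[labels == -2] = -1  # small components become "no core"
    return labels
```

For example, two 5×5 unit grids placed far apart plus one isolated point, processed with `k=4` and `min_size=5`, give label 0 to the first grid, label 1 to the second, and -1 to the isolated point, since that point has no reciprocal neighbor. Mutual links are used here because they tend not to bridge separate dense regions, which is one plausible reason to prefer them over plain k-NN links when looking for cores.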
Mutual neighbors, Dissimilarity, Agglomerative, Density, [INFO] Computer Science [cs], Clustering, 004
