
Clustering algorithms are well-established and widely used for solving data-mining tasks. Every clustering algorithm is composed of several solutions for specific sub-problems in the clustering process. These solutions are linked together in a clustering algorithm, and they define the process and the structure of the algorithm. Frequently, many of these solutions occur in more than one clustering algorithm. Mostly, new clustering algorithms include frequently occurring solutions to typical sub-problems from clustering, as well as from other machine-learning algorithms. The problem is that these solutions are usually integrated in their algorithms, and that original algorithms are not designed to share solutions to sub-problems outside the original algorithm easily. We propose a way of designing cluster algorithms and to improve existing ones, based on reusable components. Reusable components are well-documented, frequently occurring solutions to specific sub-problems in a specific area. Thus we identify reusable components, first, as solutions to characteristic sub-problems in partitioning cluster algorithms, and, further, identify a generic structure for the design of partitioning cluster algorithms. We analyze some partitioning algorithms (K-means, X-means, MPCK-means, and Kohonen SOM), and identify reusable components in them. We give examples of how new cluster algorithms can be designed based on them.
Reusable component, Cluster algorithm, MPCK-means, Kohonen SOM, K-means, Generic, Partitioning clustering, X-means
Reusable component, Cluster algorithm, MPCK-means, Kohonen SOM, K-means, Generic, Partitioning clustering, X-means
| selected citations These citations are derived from selected sources. This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically). | 14 | |
| popularity This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network. | Average | |
| influence This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically). | Top 10% | |
| impulse This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network. | Top 10% |
