
Data clustering consists of grouping similar objects according to some characteristic. Among the many clustering algorithms in the literature, Fuzzy C-Means (FCM) stands out as one of the most widely discussed and is used in many different applications. Although it is a simple and easy-to-use clustering method, FCM requires the number of clusters as an initial parameter. This information is usually not known beforehand, which becomes a relevant problem in the cluster analysis process. In this context, this work proposes a new methodology for determining the number of clusters for partitional algorithms, using subsets of the original data rather than the full data set. The methodology is intended to reduce the side effects of the cluster definition phase, possibly shortening the processing time and lowering the computational cost. To evaluate the proposed methodology, different cluster validation indices are used to assess the quality of the clusters obtained by the FCM algorithm and some of its variants when applied to different databases. The empirical analysis indicates that the results obtained in this article are promising from both an experimental and a statistical point of view.
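The general idea of choosing the number of clusters from a subset of the data can be sketched as follows. This is only an illustrative sketch under stated assumptions, not the procedure used in the article: the abstract does not specify how subsets are drawn or which validation indices are used, so uniform random sampling, the partition coefficient as the validation index, and all function names and parameters (`fuzzy_c_means`, `partition_coefficient`, `estimate_n_clusters`, `fraction`, `candidates`) are hypothetical choices made for illustration.

```python
import numpy as np


def fuzzy_c_means(X, c, m=2.0, n_iter=100, tol=1e-5, seed=0):
    """Plain FCM: returns (centers, U), where U[i, k] is the membership of point i in cluster k."""
    rng = np.random.default_rng(seed)
    # Random initial memberships, normalised so each row sums to 1.
    U = rng.random((X.shape[0], c))
    U /= U.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        W = U ** m
        # Cluster centers are membership-weighted means of the points.
        centers = (W.T @ X) / W.sum(axis=0)[:, None]
        # Euclidean distance from every point to every center.
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        d = np.fmax(d, 1e-12)  # avoid division by zero
        inv = d ** (-2.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=1, keepdims=True)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return centers, U


def partition_coefficient(U):
    """Partition coefficient: lies in [1/c, 1]; higher means crisper memberships."""
    return float((U ** 2).sum() / U.shape[0])


def estimate_n_clusters(X, candidates=range(2, 7), fraction=0.3, seed=0):
    """Assumed scheme: score each candidate c on one random subset of X and keep the best."""
    rng = np.random.default_rng(seed)
    # Subset size: a fraction of the data, but never fewer points than clusters.
    n_sub = min(len(X), max(int(fraction * len(X)), max(candidates) + 1))
    idx = rng.choice(len(X), size=n_sub, replace=False)
    subset = X[idx]
    scores = {}
    for c in candidates:
        _, U = fuzzy_c_means(subset, c, seed=seed)
        scores[c] = partition_coefficient(U)
    best = max(scores, key=scores.get)
    return best, scores
```

Calling `estimate_n_clusters(X)` on an `(n_samples, n_features)` array returns the candidate with the highest score together with the per-candidate scores. Because FCM runs only on the subset, each candidate is cheaper to evaluate than on the full data. The partition coefficient is known to favour small numbers of clusters, so in practice several validation indices would be compared, as the abstract suggests.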
T57-57.97, Applied mathematics. Quantitative methods, partitional clustering algorithms, clustering fuzzy, number of cluster
