
handle: 20.500.12587/24338
Clustering extracts structural information from unlabeled data. Studies that obtain the structural information of a dataset through unsupervised approaches such as clustering and then transfer it to supervised learning are noteworthy. In this study, we propose a new preprocessing method that, before a supervised learning strategy is applied, extracts structural information expected to represent the most meaningful summary of the training dataset. The CURE clustering method was used to obtain this summary. By combining the proposed preprocessing method with SVM, we developed a new classification method named representative points based SVM (RP-SVM). The method was tested experimentally on various real datasets and compared with standard SVM, KMSVM, KNN and CART. Compared with standard SVM, RP-SVM significantly reduced the training set size and produced fewer support vectors while achieving similar accuracy. Compared with KNN and CART, RP-SVM achieved better accuracy with less training data. RP-SVM reduces the data less than KMSVM does, but it is more stable and performs well on all datasets used. The results show that the proposed method can extract structural information that yields high-quality input for classification.
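A minimal sketch of the preprocessing idea, assuming that CURE is run separately on each class and that the shrunk representative points of every resulting cluster form the reduced training set handed to the SVM. The names `cure`, `representatives` and `reduce_training_set` and the parameters `k_per_class`, `c` (representatives per cluster) and `alpha` (shrink factor) are hypothetical. This plain agglomerative version omits CURE's random sampling, partitioning and heap-based bookkeeping, so it is only suitable for small inputs and is not the authors' implementation.

```python
# Hypothetical sketch: the per-class CURE reduction and all names are assumptions,
# not the RP-SVM authors' code.
import math
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, ...]


def _centroid(points: Sequence[Point]) -> Point:
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def representatives(points: Sequence[Point], c: int, alpha: float) -> List[Point]:
    """CURE-style representatives: pick up to c well-scattered points,
    then shrink each one toward the cluster centroid by the fraction alpha."""
    mean = _centroid(points)
    candidates = list(points)
    scattered: List[Point] = []
    while candidates and len(scattered) < c:
        if not scattered:
            # Start with the point farthest from the centroid.
            best = max(candidates, key=lambda p: math.dist(p, mean))
        else:
            # Then add the point farthest from all points chosen so far.
            best = max(candidates, key=lambda p: min(math.dist(p, q) for q in scattered))
        scattered.append(best)
        candidates.remove(best)
    return [tuple(q_i + alpha * (m_i - q_i) for q_i, m_i in zip(q, mean)) for q in scattered]


def cure(points: Sequence[Point], k: int, c: int = 4, alpha: float = 0.3) -> List[Dict]:
    """Plain agglomerative CURE (no sampling or partitioning): start from
    singleton clusters and repeatedly merge the pair whose representative
    points are closest, until k clusters remain (k must be at least 1)."""
    clusters = [{"points": [p], "reps": [p]} for p in points]
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = min(math.dist(a, b)
                        for a in clusters[i]["reps"] for b in clusters[j]["reps"])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        merged = clusters[i]["points"] + clusters[j]["points"]
        rest = [cl for idx, cl in enumerate(clusters) if idx not in (i, j)]
        clusters = rest + [{"points": merged, "reps": representatives(merged, c, alpha)}]
    return clusters


def reduce_training_set(X, y, k_per_class: int = 2, c: int = 4, alpha: float = 0.3):
    """Hypothetical RP-style preprocessing: cluster each class with CURE and
    keep only the representative points, labelled with their class."""
    reduced_X: List[Point] = []
    reduced_y: list = []
    for label in sorted(set(y)):
        class_points = [tuple(x) for x, lab in zip(X, y) if lab == label]
        k = max(1, min(k_per_class, len(class_points)))
        for cluster in cure(class_points, k, c, alpha):
            reduced_X.extend(cluster["reps"])
            reduced_y.extend([label] * len(cluster["reps"]))
    return reduced_X, reduced_y


if __name__ == "__main__":
    blob = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5)]
    X = blob + [(a + 10, b + 10) for a, b in blob] + [(a + 5, b) for a, b in blob]
    y = [0] * 10 + [1] * 5
    rx, ry = reduce_training_set(X, y, k_per_class=2)
    print(len(X), "->", len(rx))  # prints: 15 -> 13
```

The reduced `(rx, ry)` pair could then be passed to any SVM implementation, for example scikit-learn's `SVC().fit(rx, ry)`. Keeping several shrunk representatives per cluster, rather than a single centroid as in k-means-based reduction, is what lets CURE preserve non-spherical cluster shapes near the class boundary.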
CURE clustering algorithm; Natural structures; Representative points; Structural information; Support vector machines
