EPLSC: A New Semi-Supervised Ensemble Spectral Clustering Algorithm Based on The Graph P-Laplacian for Genetic Data

Name: EPLSC: A New Semi-Supervised Ensemble Spectral Clustering Algorithm Based on The Graph P-Laplacian for Genetic Data
Keywords: high-dimensional data, ensemble learning, random subspace, semi-supervised, pairwise constraints, clustering
Subjects: Computer software (QA76.75-76.765); Mining engineering. Metallurgy (TN1-997)

Publication type: Article
Date: 01 Mar 2025
Language: English
Publisher: Bilijipub publisher
Journal: Advances in Engineering and Intelligence Systems (ISSN: 2821-0263)

Authors: Garcia, Valeria; Sanchez, Agustina

doi: 10.22034/aeis.2025.506411.1287

Abstract

As data volumes grow and analyses become more detailed, clustering, which reveals hidden patterns in data, remains an important problem. Genetic data are often high-dimensional, which limits the effectiveness of traditional clustering methods. This work introduces EPLSC, a new semi-supervised ensemble spectral clustering algorithm based on the graph p-Laplacian for genetic data. The proposed approach first propagates the pairwise must-link and cannot-link constraints to all data points. The feature space is then randomly split into several subspaces of unequal size, and semi-supervised spectral clustering is performed independently in each subspace using the updated pairwise constraints. The per-subspace results are then combined through ensemble learning into an adjacency matrix. Next, several search operators are applied in environments composed of different subspaces to obtain the best set of subspaces. Experimental validation on 15 high-dimensional genetic datasets demonstrates that EPLSC outperforms existing methods, achieving improvements of up to 18% in Normalized Mutual Information (NMI) and 15% in Adjusted Rand Index (ARI) compared to traditional semi-supervised techniques. These results indicate that EPLSC improves clustering quality and addresses the challenges posed by high-dimensional genetic data.
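The pipeline described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation and makes several simplifying assumptions: the standard normalized Laplacian (p = 2) stands in for the graph p-Laplacian, the pairwise constraints are written directly into the affinity matrix instead of being propagated, only the two-cluster case is handled (split by the sign of the second eigenvector), and the search over subspace sets is omitted. All function names (`split_features`, `constrained_affinity`, `spectral_bipartition`, `co_association`) and the toy data are hypothetical.

```python
import numpy as np


def split_features(n_features, n_subspaces, rng):
    """Randomly partition feature indices into non-empty subspaces of random size."""
    perm = rng.permutation(n_features)
    # Distinct random cut points give non-empty chunks of generally unequal size.
    cuts = np.sort(rng.choice(np.arange(1, n_features), size=n_subspaces - 1, replace=False))
    return np.split(perm, cuts)


def constrained_affinity(X, must_link, cannot_link):
    """RBF affinity with pairwise constraints written directly into it (assumption)."""
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    W = np.exp(-sq / np.median(sq[sq > 0]))  # median heuristic for the kernel width
    for i, j in must_link:
        W[i, j] = W[j, i] = 1.0
    for i, j in cannot_link:
        W[i, j] = W[j, i] = 0.0
    return W


def spectral_bipartition(W, eps=1e-3):
    """Two-way cut from the second eigenvector of the normalized (p = 2) Laplacian."""
    W = W + eps  # tiny uniform weight keeps the graph connected
    np.fill_diagonal(W, 0.0)
    d_inv_sqrt = 1.0 / np.sqrt(W.sum(axis=1))
    L = np.eye(len(W)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(L)  # eigenvalues are returned in ascending order
    return (vecs[:, 1] > 0).astype(int)


def co_association(label_sets):
    """Fraction of runs in which each pair of samples shares a cluster."""
    labels = np.asarray(label_sets)  # shape (n_runs, n_samples)
    return (labels[:, :, None] == labels[:, None, :]).mean(axis=0)


# Toy data: two well-separated groups of 10 samples with 12 features each.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.3, (10, 12)), rng.normal(4.0, 0.3, (10, 12))])
must_link = [(0, 1), (10, 11)]
cannot_link = [(0, 10)]

# Cluster each random subspace independently, then combine the results.
subspaces = split_features(X.shape[1], 3, rng)
runs = [spectral_bipartition(constrained_affinity(X[:, idx], must_link, cannot_link))
        for idx in subspaces]
C = co_association(runs)         # ensemble adjacency matrix
final = spectral_bipartition(C)  # consensus partition
print(final)
```

The co-association matrix depends only on whether two samples share a cluster in a given run, so label permutations between subspace runs do not affect the consensus step.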

Impact by BIP!

- selected citations: 0
- popularity: Average
- influence: Average
- impulse: Average
