MSC-CSMC: A multi-objective semi-supervised clustering algorithm based on constraints selection and multi-source constraints for gene expression data

Publication: Article, Other literature type
Published: 27 Feb 2023
Publisher: Frontiers Media SA
Journal: Frontiers in Genetics, volume 14 (eISSN: 1664-8021)

Authors: Zeyuan Wang; Hong Gu; Minghui Zhao; Dan Li; Jia Wang

doi: 10.3389/fgene.2023.1135260

pmid: 36923794

pmc: PMC10008853

Abstract

Many clustering techniques have been proposed to group genes based on gene expression data. Among these methods, semi-supervised clustering techniques aim to improve clustering performance by incorporating supervisory information in the form of pairwise constraints. However, a constraint set obtained from a practical unlabeled dataset inevitably contains noisy constraints, which degrade the performance of semi-supervised clustering. Moreover, existing methods do not integrate multiple information sources into multi-source constraints to improve clustering quality. To address these issues, this study proposes a new multi-objective semi-supervised clustering algorithm based on constraints selection and multi-source constraints (MSC-CSMC) for unlabeled gene expression data. The proposed method first uses the gene expression data and the Gene Ontology (GO), which describes gene annotation information, to form multi-source constraints. The multi-source constraints are then applied to the clustering by improving the constraint violation penalty weight in the semi-supervised clustering objective function. Furthermore, constraint selection and cluster prototypes are placed in a multi-objective evolutionary framework through a mixed chromosome encoding strategy, which selects pairwise constraints suitable for the clustering task through synergistic optimization and thereby reduces the negative influence of noisy constraints. The proposed MSC-CSMC algorithm is tested on five benchmark gene expression datasets, and the results show that it achieves superior performance.
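The abstract does not give the objective functions or the encoding in detail, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' implementation. It assumes a chromosome that concatenates real-valued cluster prototypes with one 0/1 flag per pairwise constraint, a per-constraint penalty weight (hypothetical values here; in the paper the weights are informed by multiple sources), and two terms returned separately so that a multi-objective optimizer could trade them off. All function and variable names are illustrative.

```python
from math import dist

def decode(chromosome, n_clusters, dim):
    """Split a mixed chromosome into prototypes (real part) and selection flags (binary part)."""
    cut = n_clusters * dim
    prototypes = [tuple(chromosome[k * dim:(k + 1) * dim]) for k in range(n_clusters)]
    flags = [int(b) for b in chromosome[cut:]]
    return prototypes, flags

def assign(points, prototypes):
    """Label each point with the index of its nearest prototype."""
    return [min(range(len(prototypes)), key=lambda k: dist(p, prototypes[k]))
            for p in points]

def evaluate(points, prototypes, constraints, weights, flags):
    """Return (compactness, penalty).

    constraints: (i, j, kind) with kind "ML" (must-link) or "CL" (cannot-link)
    weights:     assumed per-constraint violation penalty weights
    flags:       1 keeps a constraint, 0 drops it (e.g. if it looks noisy)
    """
    labels = assign(points, prototypes)
    compactness = sum(dist(p, prototypes[l]) ** 2 for p, l in zip(points, labels))
    penalty = 0.0
    for (i, j, kind), w, keep in zip(constraints, weights, flags):
        same = labels[i] == labels[j]
        violated = (kind == "ML" and not same) or (kind == "CL" and same)
        if keep and violated:
            penalty += w
    return compactness, penalty

# Toy data: two well-separated groups and three constraints, two of them wrong.
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0)]
constraints = [(0, 1, "ML"), (0, 2, "ML"), (2, 3, "CL")]
weights = [1.0, 0.5, 2.0]

# Two 2-D prototypes followed by three selection bits (second constraint dropped).
chromosome = [0.0, 0.5, 5.0, 5.5, 1, 0, 1]
prototypes, flags = decode(chromosome, n_clusters=2, dim=2)
compactness, penalty = evaluate(points, prototypes, constraints, weights, flags)
```

In this toy case the dropped must-link constraint between points 0 and 2 contributes nothing to the penalty, while the selected cannot-link constraint between points 2 and 3 is violated and adds its weight. An evolutionary search over such chromosomes could therefore switch off constraints that conflict with the cluster structure.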

Related Organizations

Dalian University of Technology
China (People's Republic of)
JILIN UNIVERSITY
China (People's Republic of)
Xi'an University of Finance and Economics
China (People's Republic of)
Second Affiliated Hospital of Dalian Medical University
China (People's Republic of)

Keywords

constraint selection, multi-source constraints, gene expression data, multi-objective optimization, semi-supervised clustering, Genetics, QH426-470

Impact by BIP!

selected citations: 1
popularity: Average
influence: Average
impulse: Average

Access routes: Green, gold

Fields of Science

engineering and technology

electrical engineering, electronic engineering, information engineering

Related to Research communities

UArctic