Statistical Analysis of Microarray Data Clustering using NMF, Spectral Clustering, Kmeans, and GMM

Creator: Andri Mirzal
Keywords: Normal Distribution, Cluster Analysis, 0101 mathematics, 01 natural sciences, Algorithms

IEEE/ACM Transactions on Computational Biology and Bioinformatics

Article, 2022, peer-reviewed

License: IEEE Copyright

Article, 01 Mar 2022. Publisher: Institute of Electrical and Electronics Engineers (IEEE). Journal: IEEE/ACM Transactions on Computational Biology and Bioinformatics, volume 19, pages 1173-1192 (ISSN: 1545-5963, eISSN: 2374-0043).

Author: Andri Mirzal

DOI: 10.1109/tcbb.2020.3025486

PMID: 32956065

Abstract

In the unsupervised learning literature, clustering of microarray gene expression datasets has been studied extensively, and nonnegative matrix factorization (NMF), spectral clustering, kmeans, and the Gaussian mixture model (GMM) are among the most widely used methods. However, only a limited number of works use statistical analysis to measure the significance of the performance differences between these methods. This paper presents a statistical analysis of the performance differences among ten NMF algorithms, six spectral clustering algorithms, four GMM algorithms, and the standard kmeans algorithm in clustering eleven publicly available microarray gene expression datasets, with the number of clusters ranging from two to ten. The experimental results show that the NMF methods and kmeans have statistically similar performance and both outperform spectral clustering. Because spectral clustering can uncover hidden manifold structures, the underperformance of the spectral methods raises the question of whether the datasets have manifold structures at all. Visual inspection using multidimensional scaling plots indicates that such structures do not exist. Moreover, because the plots indicate that clusters in some datasets have elliptical boundaries, GMM methods are also applied. The experimental results show that the GMM methods outperform the other methods to some degree, which implies that the datasets follow Gaussian distributions.
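The abstract does not state which significance test the paper applies. As a minimal sketch, assuming each algorithm yields one score per dataset (for example, one clustering accuracy value on each of the eleven datasets), an exact two-sided Wilcoxon signed-rank test is one common way to check whether two algorithms' paired scores differ significantly. The helper names `rank_abs` and `wilcoxon_exact` are hypothetical, and the choice of test is an assumption for illustration, not the paper's procedure.

```python
from itertools import product


def rank_abs(values):
    """Return 1-based ranks of |values|; tied magnitudes share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: abs(values[i]))
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across a run of equal absolute values (one tie group).
        while j + 1 < len(order) and abs(values[order[j + 1]]) == abs(values[order[i]]):
            j += 1
        mean_rank = (i + j) / 2 + 1  # average of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def wilcoxon_exact(scores_a, scores_b):
    """Exact two-sided Wilcoxon signed-rank test on paired per-dataset scores.

    Assumption: this test is chosen for illustration only; the abstract does
    not say which test the paper uses. Zero differences are dropped, and the
    null distribution is built by enumerating every sign assignment, so keep
    n small (about 20 or fewer). Returns (w_plus, p_value).
    """
    diffs = [a - b for a, b in zip(scores_a, scores_b) if a != b]
    n = len(diffs)
    if n == 0:
        return 0.0, 1.0  # no nonzero differences: no evidence of a difference
    ranks = rank_abs(diffs)
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    center = sum(ranks) / 2  # expected W+ under the null hypothesis
    observed = abs(w_plus - center)
    extreme = 0
    # Under the null, each rank is positive or negative with probability 1/2.
    for signs in product((False, True), repeat=n):
        w = sum(r for r, s in zip(ranks, signs) if s)
        if abs(w - center) >= observed - 1e-9:  # tolerance for float ranks
            extreme += 1
    return w_plus, extreme / 2 ** n
```

Called with two hypothetical lists of eleven per-dataset scores, `wilcoxon_exact` returns the signed-rank statistic and its exact p-value. Comparing many algorithm pairs at once would also call for a multiple-comparison correction, which this sketch does not apply.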

Related Organizations

King Fahd University of Petroleum and Minerals
Saudi Arabia