Fuzzy C-means method for clustering microarray data

Publication: Article, 22 May 2003, France, English

Publisher: Oxford University Press (OUP)

Journal: Bioinformatics, volume 19, pages 973-980 (ISSN: 1367-4803, eISSN: 1367-4811)

Authors: Dembélé, Doulaye; Kastner, Philippe

doi: 10.1093/bioinformatics/btg119

pmid: 12761060


Abstract

Motivation: Clustering analysis of data from DNA microarray hybridization studies is essential for identifying biologically relevant groups of genes. Partitional clustering methods such as K-means or self-organizing maps assign each gene to a single cluster. However, these methods do not provide information about the influence of a given gene on the overall shape of clusters. Here we apply a fuzzy partitioning method, Fuzzy C-means (FCM), to attribute cluster membership values to genes.

Results: A major problem in applying the FCM method to clustering microarray data is the choice of the fuzziness parameter m. We show that the commonly used value m = 2 is not appropriate for some data sets, and that optimal values for m vary widely from one data set to another. We propose an empirical method, based on the distribution of distances between genes in a given data set, to determine an adequate value for m. By setting threshold levels for the membership values, genes that are tightly associated with a given cluster can be selected. Using a yeast cell cycle data set as an example, we show that this selection increases the overall biological significance of the genes within the cluster.

Availability: Supplementary text and Matlab functions are available at http://www-igbmc.u-strasbg.fr/fcm/

Contact: doulaye@titus.u-strasbg.fr
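The abstract describes FCM and membership thresholding only in outline. The following is a minimal sketch, assuming the standard Bezdek formulation of FCM with Euclidean distance and random initial memberships; it is not the authors' Matlab code, and it does not implement their empirical rule for choosing m (m is simply passed in). The function names `fuzzy_c_means` and `core_members`, the toy data, and the threshold 0.7 are all hypothetical, chosen for illustration.

```python
import math
import random
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def _dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two expression profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def fuzzy_c_means(data: Matrix, c: int, m: float = 2.0, tol: float = 1e-6,
                  max_iter: int = 300, seed: int = 0) -> Tuple[Matrix, Matrix]:
    """Hypothetical standard FCM sketch. Returns (centroids, u), u[i][j] = membership of gene i in cluster j."""
    if m <= 1.0:
        raise ValueError("fuzziness parameter m must be > 1")
    rng = random.Random(seed)
    n, dim = len(data), len(data[0])
    # Random initial memberships; each gene's row is normalized to sum to 1.
    u: Matrix = []
    for _ in range(n):
        row = [rng.random() + 1e-9 for _ in range(c)]
        s = sum(row)
        u.append([v / s for v in row])
    centroids: Matrix = []
    exponent = 2.0 / (m - 1.0)
    for _ in range(max_iter):
        # Centroid update: v_j = sum_i u_ij^m * x_i / sum_i u_ij^m
        centroids = []
        for j in range(c):
            w = [u[i][j] ** m for i in range(n)]
            total = sum(w)
            centroids.append([sum(w[i] * data[i][k] for i in range(n)) / total
                              for k in range(dim)])
        # Membership update: u_ij = 1 / sum_k (d_ij / d_ik)^(2 / (m - 1))
        new_u: Matrix = []
        for i in range(n):
            d = [_dist(data[i], centroids[j]) for j in range(c)]
            zeros = [j for j in range(c) if d[j] == 0.0]
            if zeros:
                # Gene lies exactly on a centroid: split membership among those clusters.
                row = [1.0 / len(zeros) if j in zeros else 0.0 for j in range(c)]
            else:
                row = [1.0 / sum((d[j] / d[k]) ** exponent for k in range(c))
                       for j in range(c)]
            new_u.append(row)
        shift = max(abs(new_u[i][j] - u[i][j]) for i in range(n) for j in range(c))
        u = new_u
        if shift < tol:
            break
    return centroids, u


def core_members(u: Matrix, cluster: int, threshold: float) -> List[int]:
    """Indices of genes whose membership in `cluster` is at least `threshold`."""
    return [i for i, row in enumerate(u) if row[cluster] >= threshold]


# Toy data (hypothetical): two tight groups of "genes" plus one gene in between.
genes = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.2],
         [5.0, 5.1], [5.2, 4.9], [4.9, 5.0],
         [2.5, 2.5]]
centroids, u = fuzzy_c_means(genes, c=2, m=2.0)
for j in range(2):
    # The in-between gene has memberships near 0.5 in both clusters,
    # so a 0.7 threshold keeps it out of both cores.
    print(j, [round(v, 2) for v in centroids[j]], core_members(u, j, 0.7))
```

Raising m tends to flatten each row of `u` toward 1/c, so the same threshold selects fewer genes; this is why the choice of m governs which genes count as tightly associated with a cluster.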

Country

France

Related Organizations

Institut National de la Santé et de la Recherche Médicale
France
Institute of Genetics and Molecular and Cellular Biology
France
Centre national de la recherche scientifique
France
Université Marie et Louis Pasteur
France
Inserm
France


Keywords

Quality Control, Gene Expression Profiling, Sequence Analysis, DNA, Fuzzy Logic, Gene Expression Regulation, Neoplasms, Yeasts, Databases, Genetic, [SDV.BBM] Life Sciences [q-bio]/Biochemistry, Molecular Biology, Tumor Cells, Cultured, Cluster Analysis, Humans, Algorithms, Oligonucleotide Array Sequence Analysis

Impact (by BIP!)

- Selected citations: 396. These citations are derived from selected sources. This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
- Popularity: Top 1%. This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network.
- Influence: Top 0.1%. This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
- Impulse: Top 1%. This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network.