
pmid: 12761060
Abstract

Motivation: Clustering analysis of data from DNA microarray hybridization studies is essential for identifying biologically relevant groups of genes. Partitional clustering methods such as K-means or self-organizing maps assign each gene to a single cluster. However, these methods do not provide information about the influence of a given gene on the overall shape of clusters. Here we apply a fuzzy partitioning method, Fuzzy C-means (FCM), to attribute cluster membership values to genes.

Results: A major problem in applying the FCM method for clustering microarray data is the choice of the fuzziness parameter m. We show that the commonly used value m = 2 is not appropriate for some data sets, and that optimal values for m vary widely from one data set to another. We propose an empirical method, based on the distribution of distances between genes in a given data set, to determine an adequate value for m. By setting threshold levels for the membership values, genes which are tightly associated with a given cluster can be selected. Using a yeast cell cycle data set as an example, we show that this selection increases the overall biological significance of the genes within the cluster.

Availability: Supplementary text and Matlab functions are available at http://www-igbmc.u-strasbg.fr/fcm/

Contact: doulaye@titus.u-strasbg.fr
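To make the roles of the fuzziness parameter m and the membership threshold concrete, the following is a minimal sketch of the textbook Fuzzy C-means updates in plain Python. It is not the authors' Matlab code and does not implement their empirical method for choosing m; the function names (`fcm_memberships`, `fcm_centers`, `fuzzy_c_means`, `select_core`), the Euclidean distance, the random initialisation, and the default values are all assumptions made for illustration.

```python
import math
import random


def fcm_memberships(points, centers, m):
    """Standard FCM membership update; result[k][i] is point k's membership in cluster i."""
    exponent = 2.0 / (m - 1.0)  # requires m > 1
    memberships = []
    for x in points:
        dists = [math.dist(x, c) for c in centers]
        if any(d == 0.0 for d in dists):
            # The point coincides with one or more centres: split full membership among them.
            hits = [1.0 if d == 0.0 else 0.0 for d in dists]
            memberships.append([h / sum(hits) for h in hits])
            continue
        memberships.append(
            [1.0 / sum((d_i / d_j) ** exponent for d_j in dists) for d_i in dists]
        )
    return memberships


def fcm_centers(points, memberships, m, n_clusters):
    """Standard FCM centre update: mean of the points weighted by membership ** m."""
    dim = len(points[0])
    centers = []
    for i in range(n_clusters):
        weights = [u[i] ** m for u in memberships]
        total = sum(weights)
        centers.append(
            [sum(w * x[d] for w, x in zip(weights, points)) / total for d in range(dim)]
        )
    return centers


def fuzzy_c_means(points, n_clusters, m=2.0, n_iter=100, tol=1e-6, seed=0):
    """Alternate the two updates until the centres move by less than tol."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, n_clusters)]  # assumed initialisation
    for _ in range(n_iter):
        u = fcm_memberships(points, centers, m)
        new_centers = fcm_centers(points, u, m, n_clusters)
        shift = max(math.dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            break
    return centers, fcm_memberships(points, centers, m)


def select_core(memberships, cluster, threshold):
    """Indices of points whose membership in `cluster` is at least `threshold`."""
    return [k for k, u in enumerate(memberships) if u[cluster] >= threshold]


if __name__ == "__main__":
    # Toy data (not from the paper): two tight groups plus one point in between.
    points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (5, 5)]
    centers, u = fuzzy_c_means(points, n_clusters=2, m=2.0)
    for cluster in range(2):
        print(cluster, select_core(u, cluster, threshold=0.9))
```

In this sketch, values of m close to 1 push memberships towards 0 or 1, while larger values flatten them towards 1/c for c clusters, which is why a single fixed m can behave very differently across data sets. Thresholding then keeps only the points that are strongly tied to one cluster and leaves out ambiguous ones such as the intermediate point in the toy data.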
Quality Control, Gene Expression Profiling, Sequence Analysis, DNA, Fuzzy Logic, Gene Expression Regulation, Neoplasms, Yeasts, Databases, Genetic, [SDV.BBM] Life Sciences [q-bio]/Biochemistry, Molecular Biology, Tumor Cells, Cultured, Cluster Analysis, Humans, Algorithms, Oligonucleotide Array Sequence Analysis
| selected citations | 396 |
| popularity | Top 1% |
| influence | Top 0.1% |
| impulse | Top 1% |
