
pmid: 15587980
An attractive application of expression technologies is to predict drug efficacy or safety using expression data of biomarkers. To evaluate the performance of various classification methods for building predictive models, we applied them to six expression datasets. These datasets were from studies using microarray technologies and had two or more classes. From each of the original datasets, two subsets were generated to simulate two scenarios in biomarker applications. First, a 50-gene subset was used to simulate a candidate gene approach, for cases where it might not be practical to measure a large number of genes/biomarkers. Next, a 2000-gene subset was used to simulate a whole genome approach. We evaluated the relative performance of several classification methods by using leave-one-out cross-validation and bootstrap cross-validation. Although all methods perform well in both subsets for a relatively easy dataset with two classes, differences in performance do exist among methods for the other datasets. Overall, partial least squares discriminant analysis (PLS-DA) and support vector machines (SVM) outperform all other methods. We suggest a practical approach to take advantage of multiple methods in biomarker applications.
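The two evaluation schemes named in the abstract, leave-one-out cross-validation and bootstrap cross-validation, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the nearest-centroid classifier is a simple stand-in (not PLS-DA, SVM, or any method from the study), the toy data are synthetic, the function names are hypothetical, and the bootstrap variant shown (train on a resample, test on the out-of-bag samples) is one common formulation that the abstract does not confirm.

```python
import random

def fit_centroids(X, y):
    """Stand-in classifier: compute one mean vector (centroid) per class label."""
    sums, counts = {}, {}
    for row, label in zip(X, y):
        if label not in sums:
            sums[label] = [0.0] * len(row)
            counts[label] = 0
        sums[label] = [s + v for s, v in zip(sums[label], row)]
        counts[label] += 1
    return {lab: [s / counts[lab] for s in sums[lab]] for lab in sums}

def predict(centroids, row):
    """Assign the label whose centroid is closest in squared Euclidean distance."""
    def sqdist(center):
        return sum((a - b) ** 2 for a, b in zip(row, center))
    return min(centroids, key=lambda lab: sqdist(centroids[lab]))

def loocv_error(X, y):
    """Leave-one-out CV: hold out each sample once, train on all the others."""
    errors = 0
    for i in range(len(X)):
        model = fit_centroids(X[:i] + X[i + 1:], y[:i] + y[i + 1:])
        if predict(model, X[i]) != y[i]:
            errors += 1
    return errors / len(X)

def bootstrap_cv_error(X, y, n_rounds=50, seed=0):
    """Bootstrap CV (assumed out-of-bag variant): train on a resample drawn
    with replacement, test on the samples that were not drawn."""
    rng = random.Random(seed)
    n = len(X)
    wrong = total = 0
    for _ in range(n_rounds):
        idx = [rng.randrange(n) for _ in range(n)]
        in_bag = set(idx)
        out_of_bag = [i for i in range(n) if i not in in_bag]
        if not out_of_bag:
            continue  # nothing left to test on in this round
        model = fit_centroids([X[i] for i in idx], [y[i] for i in idx])
        for i in out_of_bag:
            total += 1
            if predict(model, X[i]) != y[i]:
                wrong += 1
    return wrong / total if total else float("nan")

def make_toy_data(n_per_class=10, n_genes=5, seed=1):
    """Synthetic two-class 'expression' data with well-separated class means."""
    rng = random.Random(seed)
    X, y = [], []
    for label, mean in (("A", 0.0), ("B", 5.0)):
        for _ in range(n_per_class):
            X.append([rng.gauss(mean, 1.0) for _ in range(n_genes)])
            y.append(label)
    return X, y

if __name__ == "__main__":
    X, y = make_toy_data()
    print("LOOCV error:", loocv_error(X, y))
    print("Bootstrap CV error:", bootstrap_cv_error(X, y))
```

Running several classifiers through the same two functions and comparing their error estimates would mirror the kind of comparison the abstract describes, although the study's actual protocol may differ in its details.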
Genetic Markers, Principal Component Analysis, Models, Genetic, Discriminant Analysis, Gene Expression, Reproducibility of Results, Statistics, Nonparametric, Artificial Intelligence, Predictive Value of Tests, Data Interpretation, Statistical, Neural Networks, Computer, Least-Squares Analysis, Algorithms, Oligonucleotide Array Sequence Analysis
| selected citations | 40 |
| popularity | Top 10% |
| influence | Top 10% |
| impulse | Top 10% |
