Evaluating Methods for Classifying Expression Data

Publication: Article, 31 Dec 2004, English
Publisher: Informa UK Limited
Journal: Journal of Biopharmaceutical Statistics, volume 14, pages 1065-1084 (ISSN: 1054-3406, eISSN: 1520-5711)

Authors: Michael Z. Man; Greg Dyson; Kjell Johnson; Birong Liao

doi: 10.1081/bip-200035491

pmid: 15587980

Abstract

An attractive application of expression technologies is to predict drug efficacy or safety using expression data of biomarkers. To evaluate the performance of various classification methods for building predictive models, we applied these methods to six expression datasets. These datasets came from studies using microarray technologies and had two or more classes. From each of the original datasets, two subsets were generated to simulate two scenarios in biomarker applications. First, a 50-gene subset was used to simulate a candidate gene approach, for cases where it might not be practical to measure a large number of genes/biomarkers. Next, a 2000-gene subset was used to simulate a whole genome approach. We evaluated the relative performance of several classification methods by using leave-one-out cross-validation and bootstrap cross-validation. Although all methods perform well on both subsets for a relatively easy dataset with two classes, differences in performance do exist among methods for the other datasets. Overall, partial least squares discriminant analysis (PLS-DA) and support vector machines (SVM) outperform all other methods. We suggest a practical approach to take advantage of multiple methods in biomarker applications.
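
The two evaluation schemes named in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the data are synthetic, the classifier is a simple nearest-centroid rule standing in for the methods the paper compares (it is not PLS-DA or SVM), and the bootstrap variant shown (train on a resample drawn with replacement, test on the samples left out of it) is one common reading of "bootstrap cross-validation". The abstract does not specify the authors' exact procedure, and all function names here are hypothetical.

```python
import random
from statistics import mean


def nearest_centroid_fit(X, y):
    """Return {label: centroid} computed from training rows X with labels y."""
    centroids = {}
    for label in set(y):
        rows = [x for x, lab in zip(X, y) if lab == label]
        centroids[label] = [mean(col) for col in zip(*rows)]
    return centroids


def nearest_centroid_predict(centroids, x):
    """Predict the label whose centroid is closest in squared Euclidean distance."""
    def sqdist(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return min(sorted(centroids), key=lambda label: sqdist(centroids[label]))


def loocv_error(X, y):
    """Leave-one-out CV: hold out each sample once and train on the rest."""
    errors = 0
    for i in range(len(X)):
        model = nearest_centroid_fit(X[:i] + X[i + 1:], y[:i] + y[i + 1:])
        if nearest_centroid_predict(model, X[i]) != y[i]:
            errors += 1
    return errors / len(X)


def bootstrap_cv_error(X, y, n_rounds=50, seed=0):
    """Assumed variant: train on a bootstrap resample, test on left-out samples."""
    rng = random.Random(seed)
    n = len(X)
    round_errors = []
    for _ in range(n_rounds):
        picked = [rng.randrange(n) for _ in range(n)]
        picked_set = set(picked)
        held_out = [i for i in range(n) if i not in picked_set]
        # Skip degenerate resamples: nothing left to test on, or a class missing.
        if not held_out or len({y[i] for i in picked}) < len(set(y)):
            continue
        model = nearest_centroid_fit([X[i] for i in picked], [y[i] for i in picked])
        wrong = sum(nearest_centroid_predict(model, X[i]) != y[i] for i in held_out)
        round_errors.append(wrong / len(held_out))
    if not round_errors:
        raise ValueError("no usable bootstrap rounds")
    return mean(round_errors)


def make_toy_data(n_per_class=10, n_genes=5, shift=3.0, seed=1):
    """Synthetic two-class 'expression' matrix; class B is shifted by `shift`."""
    rng = random.Random(seed)
    X, y = [], []
    for label, offset in (("A", 0.0), ("B", shift)):
        for _ in range(n_per_class):
            X.append([rng.gauss(offset, 1.0) for _ in range(n_genes)])
            y.append(label)
    return X, y


X, y = make_toy_data()
print("LOOCV error:", loocv_error(X, y))
print("Bootstrap CV error:", bootstrap_cv_error(X, y))
```

Running both estimators on the same data for each candidate classifier gives comparable error rates, which is the kind of side-by-side comparison the abstract describes. Applying it to real data would mean replacing `make_toy_data` with the 50-gene or 2000-gene expression subset.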

Related Organizations

Pfizer (United States)
Allergan (United States)

Keywords

Genetic Markers; Principal Component Analysis; Models, Genetic; Discriminant Analysis; Gene Expression; Reproducibility of Results; Statistics, Nonparametric; Artificial Intelligence; Predictive Value of Tests; Data Interpretation, Statistical; Neural Networks, Computer; Least-Squares Analysis; Algorithms; Oligonucleotide Array Sequence Analysis

Impact (by BIP!)

Selected citations: 40. Derived from selected sources; an alternative to the influence indicator.
Popularity: Top 10%. Reflects the current impact or attention of the article in the research community at large, based on the underlying citation network.
Influence: Top 10%. Reflects the overall impact of the article in the research community at large, based on the underlying citation network (diachronically).
Impulse: Top 10%. Reflects the initial momentum of the article directly after its publication, based on the underlying citation network.