Enriched random forests

Publication: Article, 22 Jul 2008, English

Publisher: Oxford University Press (OUP)

Journal: Bioinformatics, volume 24, pages 2010-2014 (ISSN: 1367-4803, eISSN: 1367-4811)

Authors: Dhammika Amaratunga; Javier Cabrera; Yung-Seop Lee

doi: 10.1093/bioinformatics/btn356

pmid: 18650208

Abstract

Although the random forest classification procedure works well in datasets with many features, when the number of features is huge and the percentage of truly informative features is small, such as with DNA microarray data, its performance tends to decline significantly. In such instances, the procedure can be improved by reducing the contribution of trees whose nodes are populated by non-informative features. To some extent, this can be achieved by prefiltering, but we propose a novel, yet simple, adjustment that has demonstrably superior performance: choose the eligible subsets at each node by weighted random sampling instead of simple random sampling, with the weights tilted in favor of the informative features. This results in an ‘enriched random forest’. We illustrate the superior performance of this procedure in several actual microarray datasets.

Contact: damaratu@prdus.jnj.com
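The node-level adjustment described in the abstract can be illustrated with a minimal sketch. The abstract does not say how the weights are computed, so the weighting below (a normalized absolute two-sample t-type statistic) is an assumption made purely for illustration, and the function names `feature_weights` and `sample_candidate_features`, the simulated data, and the subset size `m` are all hypothetical. The sketch only contrasts weighted and simple random sampling of the eligible feature subset; it does not grow a forest.

```python
import numpy as np


def feature_weights(X, y, eps=1e-12):
    """Hypothetical weighting (an assumption, not taken from the abstract):
    absolute two-sample t-type statistic per feature, normalized to sum to 1."""
    a, b = X[y == 0], X[y == 1]
    se = np.sqrt(a.var(axis=0, ddof=1) / len(a) + b.var(axis=0, ddof=1) / len(b))
    t = np.abs(a.mean(axis=0) - b.mean(axis=0)) / (se + eps)
    return t / t.sum()


def sample_candidate_features(weights, m, rng):
    """Draw m distinct feature indices with probability proportional to weights."""
    return rng.choice(len(weights), size=m, replace=False, p=weights)


# Simulated "microarray-like" data: many features, very few informative ones.
rng = np.random.default_rng(0)
n, p, informative = 40, 1000, 10
y = np.repeat([0, 1], n // 2)
X = rng.normal(size=(n, p))
X[y == 1, :informative] += 2.0  # only the first 10 features carry signal

w = feature_weights(X, y)
m = int(np.sqrt(p))  # a common choice for the eligible subset size

# Eligible subset at one node: weighted sampling vs. simple random sampling.
weighted = sample_candidate_features(w, m, rng)
uniform = rng.choice(p, size=m, replace=False)

print("informative features in weighted subset:", int(np.sum(weighted < informative)))
print("informative features in uniform subset: ", int(np.sum(uniform < informative)))
```

Under these assumptions, the informative features receive a larger share of the sampling probability, so they tend to appear among the eligible features at a node more often than under simple random sampling, which is the effect the abstract attributes to enrichment.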

Related Organizations

Johnson & Johnson (United States)
United States
Rutgers, The State University of New Jersey
United States
Dongguk University
Korea (Republic of)

Keywords

Data Interpretation, Statistical, Gene Expression Profiling, Computer Simulation, Algorithms, Oligonucleotide Array Sequence Analysis

Impact by BIP!

- Selected citations (derived from selected sources; an alternative to the "Influence" indicator): 176
- Popularity (the "current" impact/attention of the article in the research community at large, based on the underlying citation network): Top 1%
- Influence (the overall/total impact of the article in the research community at large, based on the underlying citation network, diachronically): Top 1%
- Impulse (the initial momentum of the article directly after its publication, based on the underlying citation network): Top 10%

Open Access route: gold

Fields of Science

medical and health sciences

basic medicine
