Evolving feature selection

Publication: Article, 01 Nov 2005
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Journal: IEEE Intelligent Systems, volume 20, pages 64-76 (ISSN: 1541-1672, eISSN: 1941-1294)

Authors: Huan Liu; Edward R. Dougherty; Jennifer G. Dy; Kari Torkkola; Eugene Tuv; Hanchuan Peng; Chris H. Q. Ding; +6 Authors

doi: 10.1109/mis.2005.105

Abstract

Data preprocessing is an indispensable step in effective data analysis. It prepares data for data mining and machine learning, which aim to turn data into business intelligence or knowledge. Feature selection is a preprocessing technique commonly used on high-dimensional data. Feature selection studies how to select a subset or list of attributes or variables that are used to construct models describing data. Its purposes include reducing dimensionality, removing irrelevant and redundant features, reducing the amount of data needed for learning, improving algorithms' predictive accuracy, and increasing the constructed models' comprehensibility. This article considers feature-selection overfitting with small-sample classifier design; feature selection for unlabeled data; variable selection using ensemble methods; minimum redundancy-maximum relevance feature selection; and biological relevance in feature selection for microarray data.

Related Organizations

Arizona State University (United States)
Hewlett-Packard (United States)
Texas A&M University (United States)
Northwestern University (United States)
Lawrence Berkeley National Laboratory (United States)

Impact by BIP!

- Selected citations: 149. These citations are derived from selected sources. This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
- Popularity: Top 1%. This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network.
- Influence: Top 1%. This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
- Impulse: Top 10%. This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network.