Data Mining in Genomics

Publication: Article, 01 Mar 2008, English

Publisher: Elsevier BV

Journal: Clinics in Laboratory Medicine, volume 28, pages 145-166 (issn: 0272-2712)

Authors: Jae K. Lee; Paul D. Williams; Sooyoung Cheon

doi: 10.1016/j.cll.2007.10.010

pmid: 18194724

pmc: PMC2253491

Abstract

This article reviews emerging statistical concepts, data mining techniques, and applications that have recently been developed for genomic data analysis. First, general background and some critical issues in genomic data mining are summarized. A novel concept of statistical significance is then described: the "false discovery rate," the rate of false positives among all positive findings, which has been proposed for controlling the error rate when large-scale biological screening data produce numerous false positives. Two recent statistical testing methods are then introduced: significance analysis of microarrays and local pooled error tests. Statistical modeling in genomic data analysis is then presented, including analysis of variance and heterogeneous error modeling approaches that have been suggested for analyzing microarray data obtained from multiple experimental or biological conditions. Two sections then describe data exploration and discovery tools broadly termed unsupervised learning and supervised learning. The former include several multivariate statistical methods for investigating coexpression patterns of multiple genes, and the latter are classification methods for discovering genomic biomarker signatures that predict important subclasses of human diseases. The last section briefly summarizes various genomic data mining approaches in biomedical pathway analysis and in the prediction of patient outcome or chemotherapeutic response.
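
As a rough illustration of the false discovery rate idea mentioned in the abstract, the sketch below computes the proportion of false positives among all positive findings and applies the Benjamini-Hochberg step-up procedure, one common way to control that rate. The abstract does not say which procedure the article uses, so the choice of Benjamini-Hochberg, the function names, and the example p-values are all assumptions made here for illustration.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Return sorted indices of hypotheses rejected at FDR level q.

    Benjamini-Hochberg step-up procedure (an assumed choice; the
    abstract does not name a specific FDR-controlling method).
    """
    m = len(p_values)
    # Order hypothesis indices by ascending p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest 1-based rank k with p_(k) <= k * q / m.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * q / m:
            k_max = rank
    # Reject the k_max hypotheses with the smallest p-values.
    return sorted(order[:k_max])


def false_discovery_proportion(rejected, true_nulls):
    """Fraction of positive findings that are false positives.

    `true_nulls` is the set of indices with no real effect; it is only
    known in simulations, never in real screening data.
    """
    if not rejected:
        return 0.0
    false_positives = sum(1 for i in rejected if i in true_nulls)
    return false_positives / len(rejected)


# Hypothetical p-values for ten genes (illustrative values only).
example_p = [0.001, 0.008, 0.039, 0.041, 0.042,
             0.06, 0.074, 0.205, 0.212, 0.216]
significant = benjamini_hochberg(example_p, q=0.05)
```

With these made-up values only the two smallest p-values pass the step-up thresholds, whereas an unadjusted 0.05 cutoff would flag five genes. That gap is the kind of difference that matters when thousands of genes are screened at once.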

Related Organizations

University of Virginia
United States
Department of Public Health
United States

Keywords

Models, Genetic; Artificial Intelligence; Data Interpretation, Statistical; Databases, Genetic; Computational Biology; Genomics; Software; Pattern Recognition, Automated

Impact by BIP!

selected citations: 23 (citations derived from selected sources; an alternative to the "Influence" indicator)
popularity: Top 10% (the "current" impact or attention of an article in the research community at large, based on the underlying citation network)
influence: Top 10% (the overall impact of an article in the research community at large, based on the underlying citation network, diachronically)
impulse: Top 10% (the initial momentum of an article directly after its publication, based on the underlying citation network)