publication . Article . 2009

Predictive modeling in case-control single-nucleotide polymorphism studies in the presence of population stratification: a case study using Genetic Analysis Workshop 16 Problem 1 dataset.

Arshadi, Niloofar; Chang, Billy; Kustra, Rafal;
Open Access
  • Published: 01 Dec 2009
  • Journal: BMC Proceedings, volume 3, page S60 (ISSN: 1753-6561)
  • Publisher: Springer Nature
Abstract
In this paper, we apply the gradient-boosting machine predictive model to the rheumatoid arthritis data to predict case-control status. The QQ-plot suggests severe population stratification. In univariate genome-wide association studies, a correction factor for ethnicity confounding can be derived. Here we propose a novel strategy to deal with population stratification in the context of multivariate predictive modeling. We address the problem by clustering the subjects on the axes of genetic variation and building a predictive model separately in each cluster. This allows us to control for ethnicity without explicitly including it in the model, which could mar...
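The strategy described in the abstract (cluster subjects on axes of genetic variation, then fit a separate predictive model per cluster) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the toy simulated genotypes, the use of PCA to obtain the axes, k-means as the clustering method, the choice of two axes and two clusters, and scikit-learn's `GradientBoostingClassifier` as a stand-in for the gradient-boosting machine. The function names are hypothetical.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.ensemble import GradientBoostingClassifier


def simulate_genotypes(n_per_pop=150, n_snps=60, seed=0):
    """Toy data (assumption): two subpopulations that differ in allele
    frequencies and in baseline case rate, which induces stratification."""
    rng = np.random.default_rng(seed)
    X_parts, y_parts = [], []
    for base_rate in (0.2, 0.5):
        freqs = rng.uniform(0.1, 0.9, size=n_snps)  # population-specific allele frequencies
        X = rng.binomial(2, freqs, size=(n_per_pop, n_snps)).astype(float)  # 0/1/2 allele counts
        p_case = base_rate + 0.1 * X[:, 0]  # SNP 0 carries a small true effect
        X_parts.append(X)
        y_parts.append(rng.binomial(1, p_case))
    return np.vstack(X_parts), np.concatenate(y_parts)


def fit_per_cluster(X, y, n_axes=2, n_clusters=2, seed=0):
    """Cluster subjects on the leading principal axes, then fit one
    gradient-boosting classifier inside each cluster."""
    pca = PCA(n_components=n_axes).fit(X)
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed).fit(pca.transform(X))
    models = {}
    for c in range(n_clusters):
        mask = km.labels_ == c
        # Each cluster must contain both cases and controls for this fit to succeed.
        models[c] = GradientBoostingClassifier(
            n_estimators=50, max_depth=2, random_state=seed
        ).fit(X[mask], y[mask])
    return pca, km, models


def predict_case_probability(X_new, pca, km, models):
    """Assign each new subject to a cluster, then score it with that cluster's model."""
    labels = km.predict(pca.transform(X_new))
    proba = np.empty(len(X_new))
    for c, model in models.items():
        mask = labels == c
        if mask.any():
            proba[mask] = model.predict_proba(X_new[mask])[:, 1]
    return proba, labels


if __name__ == "__main__":
    X, y = simulate_genotypes()
    pca, km, models = fit_per_cluster(X, y)
    proba, labels = predict_case_probability(X, pca, km, models)
    print("cluster sizes:", np.bincount(labels))
    print("mean predicted case probability per cluster:",
          [round(float(proba[labels == c].mean()), 3) for c in sorted(models)])
```

Because each model only sees subjects from one cluster, ancestry is approximately held fixed within a model and never enters as an explicit predictor. In practice the number of axes and clusters would need to be chosen from the data rather than fixed as in this sketch.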
Subjects
free text keywords: General Biochemistry, Genetics and Molecular Biology, General Medicine, Medicine, Population stratification, Statistics, Genetic variation, Cluster analysis, Genetic association, Univariate, Multivariate statistics, Bioinformatics, Single-nucleotide polymorphism, Confounding, Proceedings
Funded by
  • Funder: Wellcome Trust (WT)
  • Funder: National Institutes of Health (NIH)
  • Project: Genetic Analysis of Common Diseases: An Evaluation
  • Project Code: 5R01GM031575-22
  • Funding stream: National Institute of General Medical Sciences

