Search for Risk Haplotype Segments with GWAS Data by Use of Finite Mixture Models

Article English OPEN
Ali, Fadhaa ; Zhang, Jian (2015)

Region-based association analysis has been proposed to capture the collective behavior of sets of variants by testing the association of each set, rather than of individual variants, with the disease. Such an analysis typically involves a list of unphased multiple-locus genotypes with potentially sparse frequencies in cases and controls. To tackle the problem of this sparse distribution, a two-stage approach has been proposed in the literature. In the first stage, haplotypes are computationally inferred from genotypes, followed by a haplotype co-classification. In the second stage, the association analysis is performed on the inferred haplotype groups. If a haplotype is unevenly distributed between the case and control samples, it is labeled as a risk haplotype. Unfortunately, the in-silico reconstruction of haplotypes may produce a proportion of false haplotypes, which hampers the detection of rare but true haplotypes. To address this issue, we propose an alternative approach. In Stage 1, we cluster genotypes instead of inferred haplotypes and estimate the risk genotypes based on a finite mixture model. In Stage 2, we infer risk haplotypes from the risk genotypes identified in Stage 1. To estimate the finite mixture model, we propose an EM algorithm with a novel data partition-based initialization. The performance of the proposed procedure is assessed by simulation studies and a real data analysis. Compared to the existing multiple Z-test procedure, we find that the power of genome-wide association studies can be increased by using the proposed procedure.
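The abstract does not specify the mixture model or the initialization, so the following is only a minimal sketch of how an EM fit to unphased genotypes might look, not the authors' procedure. It assumes genotypes coded 0/1/2, loci treated as independent within each mixture component, a smoothing constant `alpha`, and a stand-in initialization that sorts samples by minor-allele count and splits them into contiguous blocks. The function name `em_genotype_mixture` and all parameters are hypothetical.

```python
import math


def em_genotype_mixture(genotypes, n_components=2, n_iter=50, alpha=0.5):
    """EM for a mixture of independent-locus categorical genotype models.

    genotypes: list of equal-length lists with entries in {0, 1, 2}.
    Returns (weights, theta, resp), where theta[k][l][g] is the estimated
    probability of genotype g at locus l in component k, and resp[i][k]
    is the posterior probability that sample i belongs to component k.
    """
    n, n_loci = len(genotypes), len(genotypes[0])
    ks = range(n_components)

    # Assumed stand-in for a partition-based start (not the paper's scheme):
    # sort samples by minor-allele count, cut into equal contiguous blocks,
    # and hard-assign each block to one component.
    order = sorted(range(n), key=lambda i: sum(genotypes[i]))
    resp = [[0.0] * n_components for _ in range(n)]
    for rank, i in enumerate(order):
        resp[i][rank * n_components // n] = 1.0

    for _ in range(n_iter):
        # M-step: smoothed mixing weights and per-locus genotype frequencies.
        totals = [sum(resp[i][k] for i in range(n)) for k in ks]
        weights = [(t + alpha) / (n + n_components * alpha) for t in totals]
        theta = [[[alpha] * 3 for _ in range(n_loci)] for _ in ks]
        for i, g in enumerate(genotypes):
            for k in ks:
                for l, value in enumerate(g):
                    theta[k][l][value] += resp[i][k]
        for k in ks:
            denom = totals[k] + 3 * alpha
            for l in range(n_loci):
                theta[k][l] = [c / denom for c in theta[k][l]]

        # E-step: posterior membership, computed in log space for stability.
        for i, g in enumerate(genotypes):
            logp = [
                math.log(weights[k])
                + sum(math.log(theta[k][l][v]) for l, v in enumerate(g))
                for k in ks
            ]
            top = max(logp)
            unnorm = [math.exp(v - top) for v in logp]
            norm = sum(unnorm)
            resp[i] = [u / norm for u in unnorm]

    return weights, theta, resp


# Toy data (made up for illustration): 30 background genotypes, 10 carriers.
data = [[0, 0, 0, 0] for _ in range(30)] + [[2, 2, 1, 0] for _ in range(10)]
weights, theta, resp = em_genotype_mixture(data)
```

In a case-control setting, one could then compare how strongly each fitted component is represented among cases versus controls to flag candidate risk genotypes. That comparison, and the step from risk genotypes to risk haplotypes, is left out here because the abstract gives no details on either.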