Tools for cluster analysis of data from genome-wide association studies

Master thesis English OPEN
Horn, Johanne Håøy
(2016)
  • Subject: GWAS | SNP | HyperBrowser | LD | clustering | cluster analysis | genome-wide association studies | linkage disequilibrium

In the past couple of decades, genome-wide association studies (GWAS) have become a widely used approach for investigating the underlying genetic architecture of complex human diseases. Each particular GWAS will highlight multiple loci across the genome, in which genoty...

    1 Introduction
      1.1 Aims for thesis
      1.2 Overview of chapters

    3 Methods
      3.1 Clustering
      3.2 Binary data representations
        3.2.1 Binary Taxonomic Units
        3.2.2 Definitions of binary features
        3.2.3 Properties of the different binary representations
        3.2.4 Similarity measures
        3.2.5 Standardized measures of distance
      3.3 Continuous data representations
        3.3.1 Definitions of continuous features
        3.3.2 Correlation coefficients
        3.3.3 Standardized measures of distance

    5 Results
      5.1 A suite of tools for comparison of diseases
        5.1.1 Main purpose
        5.1.2 Tools for clustering of binary representations
        5.1.3 Tools for clustering of continuous vectors
        5.1.4 Tools for empirical exploration
        5.1.5 Tools for data modification and creation
      5.2 Use case
        5.2.1 GSuite creation
        5.2.2 Exploration of data properties
        5.2.3 Comparison of significant SNPs
        5.2.4 Comparison of GWAS summary statistics
