A statistical framework for SNP calling, mutation discovery, association mapping and population genetical parameter estimation from sequencing data

Type: Article, Preprint

Date: 08 Sep 2011

Embargo end date: 01 Jan 2012

Language: English

Publisher: Oxford University Press (OUP)

Journal: Bioinformatics, volume 27, pages 2987-2993 (ISSN: 1367-4803, eISSN: 1367-4811)

Funded by: NIH | Joint SNP and CNV calling...

Author: Heng Li

DOI: 10.1093/bioinformatics/btr509, 10.48550/arXiv.1203.6372

pmid: 21903627

pmc: PMC3198575

arXiv: 1203.6372


Abstract

Motivation: Most existing methods for DNA sequence analysis rely on accurate sequences or genotypes. However, in applications of next-generation sequencing (NGS), accurate genotypes may not be easily obtained (e.g. multi-sample low-coverage sequencing or somatic mutation discovery). These applications press for the development of new methods for analyzing sequence data with uncertainty.

Results: We present a statistical framework for calling SNPs, discovering somatic mutations, inferring population genetical parameters and performing association tests directly based on sequencing data without explicit genotyping or linkage-based imputation. On real data, we demonstrate that our method achieves accuracy comparable to alternative methods for estimating site allele count, for inferring the allele frequency spectrum and for association mapping. We also highlight the necessity of using symmetric datasets for finding somatic mutations and confirm that, for discovering rare events, mismapping is frequently the leading source of errors.

Availability: http://samtools.sourceforge.net

Contact: hengli@broadinstitute.org
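The core idea in the abstract, working from per-sample genotype likelihoods instead of hard genotype calls, can be illustrated with a small sketch that estimates the site allele count. It is a minimal illustration under stated assumptions, not the samtools/bcftools implementation: it assumes a biallelic site, one error rate `eps` shared by every read, and a flat prior over the number k of non-reference alleles among all chromosomes. The function names are hypothetical. A dynamic program folds samples in one at a time to obtain P(data | k) without enumerating every combination of genotypes.

```python
from math import comb, prod


def genotype_likelihoods(alleles, eps, ploidy=2):
    """Return [P(reads | g) for g in 0..ploidy], g = number of non-reference alleles.

    alleles: one entry per read at this site, 0 = reference base, 1 = non-reference base.
    eps: per-base error rate, assumed identical for every read (a simplification).
    """
    liks = []
    for g in range(ploidy + 1):
        # Chance that a single read shows the non-reference base given g copies.
        p_alt = (g * (1 - eps) + (ploidy - g) * eps) / ploidy
        liks.append(prod(p_alt if a else 1 - p_alt for a in alleles))
    return liks


def allele_count_likelihoods(sample_liks):
    """Combine per-sample likelihoods into z[k] = P(all data | k non-ref alleles).

    Assumes the k non-reference alleles are spread uniformly over all
    M = sum(ploidies) chromosomes, so P(g_1..g_n | k) = prod C(m_i, g_i) / C(M, k).
    """
    z = [1.0]  # before any sample: M = 0 chromosomes, only k = 0 is possible
    M = 0
    for liks in sample_liks:
        m = len(liks) - 1  # ploidy of this sample
        new_M = M + m
        new_z = [0.0] * (new_M + 1)
        for k in range(new_M + 1):
            # g alleles come from this sample, k - g from the samples already processed.
            total = 0.0
            for g in range(max(0, k - M), min(m, k) + 1):
                total += comb(M, k - g) * comb(m, g) * z[k - g] * liks[g]
            new_z[k] = total / comb(new_M, k)
        z, M = new_z, new_M
    return z


def allele_count_posterior(z, prior=None):
    """Posterior P(k | data); a flat prior over k is used unless one is given."""
    if prior is None:
        prior = [1.0] * len(z)
    weights = [p * lk for p, lk in zip(prior, z)]
    total = sum(weights)
    return [w / total for w in weights]


if __name__ == "__main__":
    # Three diploid samples; only the second shows non-reference bases.
    reads = [[0, 0, 0, 0], [0, 1, 0, 1, 1], [0, 0, 0]]
    z = allele_count_likelihoods([genotype_likelihoods(r, 0.01) for r in reads])
    post = allele_count_posterior(z)
    print(max(range(len(post)), key=post.__getitem__))  # → 1
    print(1 - post[0])  # posterior probability that the site is variable
```

No individual genotype is ever called: the uncertainty in each sample's reads is carried through `z[k]`. Estimating the allele frequency spectrum across many sites, as the abstract describes, would build on per-site values like these; that extension is not sketched here.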

Related Organizations

Broad Institute, Inc. (Broad Institute), United States

Keywords

Genomics (q-bio.GN); Genotype; Sequence Analysis, DNA; Polymorphism, Single Nucleotide; Genetics, Population; Gene Frequency; Data Interpretation, Statistical; FOS: Biological sciences; Mutation; Humans; Quantitative Biology - Genomics; Alleles; Genetic Association Studies

Impact by BIP!

- Selected citations (derived from selected sources): 6K
- Popularity (current impact/attention, based on the citation network): Top 0.01%
- Influence (overall/total impact over time, based on the citation network): Top 0.01%
- Impulse (initial momentum directly after publication): Top 0.1%