publication . Article . Preprint . 2017

Integrating long-range connectivity information into de Bruijn graphs.

Isaac Turner; Kiran V Garimella; Zamin Iqbal; Gil McVean
Open Access
  • Published: 08 Jun 2017
  • Journal: Bioinformatics, volume 34, pages 2556–2565 (ISSN: 1367-4803, eISSN: 1460-2059)
  • Publisher: Oxford University Press (OUP)
  • Country: United Kingdom
Abstract
Motivation: The de Bruijn graph is a simple and efficient data structure that is used in many areas of sequence analysis including genome assembly, read error correction and variant calling. The data structure has a single parameter k, is straightforward to implement and is tractable for large genomes with high sequencing depth. It also enables representation of multiple samples simultaneously to facilitate comparison. However, unlike the string graph, a de Bruijn graph does not retain long range information that is inherent in the read data. For this reason, app...
Subjects
free text keywords: Statistics and Probability, Computational Theory and Mathematics, Biochemistry, Molecular Biology, Computational Mathematics, Computer Science Applications, Original Papers, Sequence Analysis
Funded by
WT | Genomic medicine and statistics
  • Funder: Wellcome Trust (WT)
  • Project Code: 097310
  • Funding stream: Cellular and Molecular Neuroscience
WT | The Genetic Analysis of Populations
  • Funder: Wellcome Trust (WT)
  • Project Code: 100956
  • Funding stream: Genetics, Genomics and Population Research
WT
  • Funder: Wellcome Trust (WT)
WT | Understanding the genetic basis of common human diseases: core funding for the Wellcome Trust Centre for Human Genetics
  • Funder: Wellcome Trust (WT)
  • Project Code: 090532
  • Funding stream: Cellular and Molecular Neuroscience
WT | Statistical methods for analyzing complex genomic variation in human pathogens
  • Funder: Wellcome Trust (WT)
  • Project Code: 102541
  • Funding stream: Genetics, Genomics and Population Research