publication . Article . Preprint . Conference object . 2017

Integrating long-range connectivity information into de Bruijn graphs.

Kiran V. Garimella; Isaac Turner; Zamin Iqbal; Gil McVean
Open Access English
  • Published: 08 Jun 2017 Journal: Bioinformatics, volume 34, issue 15, pages 2556-2565 (issn: 1367-4803, eissn: 1367-4811)
  • Publisher: Oxford University Press
  • Country: United Kingdom
Abstract
Motivation: The de Bruijn graph is a simple and efficient data structure that is used in many areas of sequence analysis, including genome assembly, read error correction and variant calling. The data structure has a single parameter k, is straightforward to implement and is tractable for large genomes with high sequencing depth. It also enables representation of multiple samples simultaneously to facilitate comparison. However, unlike the string graph, a de Bruijn graph does not retain long-range information that is inherent in the read data. For this reason, app...
Subjects
free text keywords: Original Papers, Sequence Analysis, Statistics and Probability, Computational Theory and Mathematics, Biochemistry, Molecular Biology, Computational Mathematics, Computer Science Applications, Data structure, De Bruijn sequence, String graph, De Bruijn graph, Error detection and correction, Theoretical computer science, Graph (abstract data type), Computer science, Sequence assembly, MIT License, Complement graph, Clique-width
Funded by
WT| Understanding the genetic basis of common human diseases: core funding for the Wellcome Trust Centre for Human Genetics.
Project
  • Funder: Wellcome Trust (WT)
  • Project Code: 090532
  • Funding stream: Cellular and Molecular Neuroscience

WT| Genomic medicine and statistics
Project
  • Funder: Wellcome Trust (WT)
  • Project Code: 097310
  • Funding stream: Cellular and Molecular Neuroscience

WT| The Genetic Analysis of Populations.
Project
  • Funder: Wellcome Trust (WT)
  • Project Code: 100956
  • Funding stream: Genetics, Genomics and Population Research

WT| Statistical methods for analyzing complex genomic variation in human pathogens.
Project
  • Funder: Wellcome Trust (WT)
  • Project Code: 102541
  • Funding stream: Genetics, Genomics and Population Research

WT
Project
  • Funder: Wellcome Trust (WT)