publication . Article . 2015

Host Subtraction, Filtering and Assembly Validations for Novel Viral Discovery Using Next Generation Sequencing Data

Jonathan Luke Heeney; Ricardo Humberto Ramirez Gonzalez; Maxim Wilkinson; Richard M. Leggett
Open Access English
  • Published: 01 Jan 2015
  • Publisher: PLOS
  • Country: United Kingdom
Abstract
Using next generation sequencing (NGS) to identify novel viral sequences in eukaryotic tissue samples is challenging. Issues can include the low proportion and copy number of viral reads and the high number of post-assembly contigs, which make subsequent viral analysis difficult. A comparison of assembly algorithms, combined with pre-assembly host-mapping subtraction using a short-read mapping tool, a k-mer frequency based filter and a low complexity filter, has been validated for viral discovery with Illumina data derived from naturally infected liver tissue and with simulated data. Assembled contig numbers were significantly reduced (up to 99.97%) by the application of th...
Subjects
Medical Subject Headings: food and beverages
free text keywords: Medicine, R, Science, Q, Research Article, General Biochemistry, Genetics and Molecular Biology, General Agricultural and Biological Sciences, General Medicine