publication . Other literature type . Article . Preprint . 2016

SNP-sites: rapid efficient extraction of SNPs from multi-FASTA alignments

Andrew Page
  • Published: 29 Jan 2016
  • Publisher: Microbiology Society
Abstract
Rapidly decreasing genome sequencing costs have led to a proportionate increase in the number of samples used in prokaryotic population studies. Extracting single nucleotide polymorphisms (SNPs) from a large whole-genome alignment is now a routine task, but existing tools have failed to scale efficiently with the increased size of studies. These tools are slow, memory-inefficient, and installed through non-standard procedures. We present SNP-sites, which can rapidly extract SNPs from a multi-FASTA alignment using modest resources and can output results in multiple formats for downstream analysis. SNPs can be extracted from an 8.3 GB alignment file (1,84...
Subjects
free text keywords: Methods Paper, Systems Microbiology: Large-scale comparative genomics, software, SNP calling, high throughput, DNA sequencing, SNP, Genome alignment, Multi-core processor, Single-nucleotide polymorphism, Bioinformatics, Population, Computer science, Data mining, Sequence alignment, Genome, Computational biology, Sequence analysis
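
Illustrative sketch
The task named in the abstract, extracting SNP positions from a multi-FASTA alignment, can be pictured with a small Python sketch. This is not the SNP-sites implementation and makes no attempt at its speed or memory behaviour. The function names `parse_fasta` and `variable_sites` are hypothetical, and the choice to ignore gap (`-`) and `N` characters when deciding whether a column varies is an assumption of this sketch.

```python
from typing import Dict, List, Tuple


def parse_fasta(text: str) -> Dict[str, str]:
    """Parse multi-FASTA text into {name: sequence}, joining wrapped lines."""
    records: Dict[str, List[str]] = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].strip()
            records[name] = []
        elif name is not None:
            records[name].append(line.upper())
    return {n: "".join(parts) for n, parts in records.items()}


def variable_sites(alignment: Dict[str, str], ignore: str = "-N") -> List[Tuple[int, str]]:
    """Return (1-based position, column) for every column with more than one base.

    Characters listed in `ignore` do not count as alleles (assumption of this sketch).
    """
    seqs = list(alignment.values())
    if not seqs:
        return []
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("all sequences in an alignment must have equal length")
    sites = []
    for pos in range(length):
        column = "".join(s[pos] for s in seqs)
        if len(set(column) - set(ignore)) > 1:
            sites.append((pos + 1, column))
    return sites


if __name__ == "__main__":
    demo = ">s1\nACGTAC\n>s2\nACGAAC\n>s3\nAC-TAT\n"
    print(variable_sites(parse_fasta(demo)))  # [(4, 'TAT'), (6, 'CCT')]
```

Reading whole sequences into memory, as done here, is exactly what would not scale to alignments of many gigabytes; the sketch only shows what counts as a variable site.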