Article, 2017

Alignment-free Sequence Comparison: Benefits, Applications, and Tools

Andrzej Zielezinski; Wojciech Karlowski
Open Access; English
  • Published: 01 Oct 2017
  • Journal: Genome Biology, volume 18 (ISSN: 1474-7596, eISSN: 1474-760X)
  • Publisher: BioMed Central
Abstract
Alignment-free sequence analyses have been applied to problems ranging from whole-genome phylogeny to the classification of protein families, identification of horizontally transferred genes, and detection of recombined sequences. The strength of these methods makes them particularly useful for next-generation sequencing data processing and analysis. However, many researchers are unclear about how these methods work, how they compare to alignment-based methods, and what potential they hold for their own research. We address these questions and provide a guide to the currently available alignment-free sequence analysis tools. Electronic supplementary material ...
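As a minimal illustration of what "alignment-free" means, the sketch below compares two sequences through their k-mer (word) counts instead of through an alignment. It assumes one common family of such methods, word-frequency comparison, and shows two simple measures on those counts: a Euclidean distance and the D2 statistic (the number of matching k-mer pairs). The function names `kmer_counts`, `euclidean_kmer_distance`, and `d2_score` are hypothetical and are not taken from the article or from any tool it reviews.

```python
import math
from collections import Counter


def kmer_counts(seq: str, k: int) -> Counter:
    """Count every overlapping substring of length k (a k-mer) in seq."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def euclidean_kmer_distance(a: str, b: str, k: int = 2) -> float:
    """Euclidean distance between the k-mer count vectors of two sequences.

    No alignment is computed: each sequence is reduced to a vector of
    word counts, and the two vectors are compared directly.
    """
    ca, cb = kmer_counts(a, k), kmer_counts(b, k)
    words = set(ca) | set(cb)  # union of k-mers seen in either sequence
    # Counter returns 0 for k-mers absent from one of the sequences.
    return math.sqrt(sum((ca[w] - cb[w]) ** 2 for w in words))


def d2_score(a: str, b: str, k: int = 2) -> int:
    """D2 statistic: number of k-mer pairs shared between a and b."""
    ca, cb = kmer_counts(a, k), kmer_counts(b, k)
    return sum(ca[w] * cb[w] for w in ca.keys() & cb.keys())


if __name__ == "__main__":
    print(euclidean_kmer_distance("ACGT", "ACGT"))
    print(euclidean_kmer_distance("ACGT", "TGCA"))
    print(d2_score("ACGT", "ACGT"))
```

For example, with k = 2 the sequences "ACGT" and "TGCA" share no 2-mers, so their Euclidean distance is √6 and their D2 score is 0, while identical sequences give a distance of 0. Practical tools use longer words, normalized frequencies, and corrections for background composition; this sketch only shows the core idea of comparing word profiles.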
Subjects
Keywords: Review; Biology (General), QH301-705.5; Genetics, QH426-470
Funded by
Project UID/EMS/50022/2013: Associate Laboratory of Energy, Transports and Aeronautics
  • Funder: Fundação para a Ciência e a Tecnologia, I.P. (FCT)
  • Project Code: 147353
  • Funding stream: 5876