Multiple alignment by aligning alignments

Publication type: Article, Conference object

Date: 01 Jul 2007

Language: English

Publisher: Oxford University Press (OUP)

Journal: Bioinformatics, volume 23, pages i559-i568 (ISSN: 1367-4803, eISSN: 1367-4811)

Authors: Travis J. Wheeler; John D. Kececioglu

doi: 10.1093/bioinformatics/btm226

pmid: 17646343

Abstract

Motivation: Multiple sequence alignment is a fundamental task in bioinformatics. Current tools typically form an initial alignment by merging subalignments, and then polish this alignment by repeated splitting and merging of subalignments to obtain an improved final alignment. In general this form-and-polish strategy consists of several stages, and a profusion of methods have been tried at every stage. We carefully investigate: (1) how to utilize a new algorithm for aligning alignments that optimally solves the common subproblem of merging subalignments, and (2) what is the best choice of method for each stage to obtain the highest quality alignment.

Results: We study six stages in the form-and-polish strategy for multiple alignment: parameter choice, distance estimation, merge-tree construction, sequence-pair weighting, alignment merging, and polishing. For each stage, we consider novel approaches as well as standard ones. Interestingly, the greatest gains in alignment quality come from (i) estimating distances by a new approach using normalized alignment costs, and (ii) polishing by a new approach using 3-cuts. Experiments with a parameter-value oracle suggest large gains in quality may be possible through an input-dependent choice of alignment parameters, and we present a promising approach for building such an oracle. Combining the best approaches to each stage yields a new tool we call Opal that on benchmark alignments matches the quality of the top tools, without employing alignment consistency or hydrophobic gap penalties.

Availability: Opal, a multiple alignment tool that implements the best methods in our study, is freely available at http://opal.cs.arizona.edu

Contact: twheeler@cs.arizona.edu

Related Organizations

University of Arizona
United States

Keywords

Sequence Analysis, Protein, Molecular Sequence Data, Proteins, Amino Acid Sequence, Sensitivity and Specificity, Sequence Alignment, Algorithms

Impact by BIP!

- Selected citations: 208. These citations are derived from selected sources. This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
- Popularity: Top 1%. This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network.
- Influence: Top 1%. This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
- Impulse: Top 10%. This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network.

Fields of Science

engineering and technology

medical engineering
