Sequence Alignment as Hypothesis Testing

Publication: Article, 01 May 2011, English

Publisher: Mary Ann Liebert Inc

Journal: Journal of Computational Biology, volume 18, pages 677-691 (ISSN: 1066-5277, eISSN: 1557-8666)

Authors: Lu Meng; Fengzhu Sun; Xuegong Zhang; Michael S. Waterman

doi: 10.1089/cmb.2010.0328

pmid: 21554016

pmc: PMC3122928

Abstract

Sequence alignment depends on the scoring function that defines similarity between pairs of letters. For local alignment, the computational algorithm searches for the most similar segments in the sequences according to the scoring function. The choice of this scoring function is important for correctly detecting segments of interest. We formulate sequence alignment as a hypothesis testing problem, and conduct extensive simulation experiments to study the relationship between the scoring function and the distribution of aligned pairs within the aligned segment under this framework. We cut through the many ways to construct scoring functions and showed that any scoring function with negative expectation used in local alignment corresponds to a hypothesis test between the background distribution of sequence letters and a statistical distribution of letter pairs determined by the scoring function. The results indicate that the log-likelihood ratio scoring function is statistically most powerful and has the highest accuracy for detecting the segments of interest that are defined by the statistical distribution of aligned letter pairs.
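
The abstract does not spell out how a scoring function determines a distribution of letter pairs. A minimal sketch, assuming the correspondence is the standard Karlin-Altschul relation q_ij = p_i p_j exp(lambda s_ij), where lambda is the unique positive solution of sum_ij p_i p_j exp(lambda s_ij) = 1, is given below. The function names `implied_target_distribution` and `log_likelihood_ratio_scores` and the DNA example values are illustrative assumptions, not taken from the paper.

```python
import math


def implied_target_distribution(p, s, iterations=200):
    """Solve sum_ij p[i]*p[j]*exp(lam*s[i][j]) = 1 for lam > 0 by bisection.

    p: background letter frequencies; s: square score matrix.
    Returns (lam, q) with q[i][j] = p[i]*p[j]*exp(lam*s[i][j]).
    Assumption: this is the standard Karlin-Altschul relation.
    """
    n = len(p)
    cells = [(p[i] * p[j], s[i][j]) for i in range(n) for j in range(n)]
    expected = sum(w * x for w, x in cells)
    has_positive = any(w > 0 and x > 0 for w, x in cells)
    if expected >= 0 or not has_positive:
        raise ValueError("need negative expected score and a positive score")

    def f(lam):
        return sum(w * math.exp(lam * x) for w, x in cells) - 1.0

    # f(0) = 0, f'(0) = expected < 0, and f is convex and unbounded,
    # so f has exactly one root at some lam > 0.
    hi = 1.0
    while f(hi) <= 0:   # grow until we are past the root
        hi *= 2.0
    lo = hi
    while f(lo) >= 0:   # shrink until we are strictly before the root
        lo /= 2.0
    for _ in range(iterations):
        mid = (lo + hi) / 2.0
        if f(mid) > 0:
            hi = mid
        else:
            lo = mid
    lam = (lo + hi) / 2.0
    q = [[p[i] * p[j] * math.exp(lam * s[i][j]) for j in range(n)]
         for i in range(n)]
    return lam, q


def log_likelihood_ratio_scores(p, q):
    """Log-likelihood ratio scores log(q_ij / (p_i * p_j))."""
    n = len(p)
    return [[math.log(q[i][j] / (p[i] * p[j])) for j in range(n)]
            for i in range(n)]


# Illustrative example: uniform DNA background, match +1, mismatch -1.
p = [0.25] * 4
s = [[1 if i == j else -1 for j in range(4)] for i in range(4)]
lam, q = implied_target_distribution(p, s)
llr = log_likelihood_ratio_scores(p, q)
print(lam, sum(q[i][i] for i in range(4)))
```

For this example the equation reduces to 0.25 e^lambda + 0.75 e^(-lambda) = 1, so lambda = ln 3 and the implied pair distribution puts probability 0.75 on matching letters. The log-likelihood ratio scores recovered from q are lambda times the original scores, which is why, under this assumption, a negative-expectation scoring function behaves like a test of the background distribution against q.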

Related Organizations

Tsinghua University, China (People's Republic of)
University of California System, United States

Keywords

Likelihood Functions; Models, Theoretical; Sequence Alignment; Algorithms; Mathematics

Impact by BIP!

	selected citations	7
	popularity	Average
	influence	Average
	impulse	Average

Fields of Science

engineering and technology

medical engineering
