Pairwise alignment of nucleotide sequences using maximal exact matches

Publication: Article, Other literature type
Date: 21 May 2019
Language: English
Publisher: Springer Science and Business Media LLC
Journal: BMC Bioinformatics, volume 20 (eissn: 1471-2105)

Authors: Bayat, A; Gaëta, B; Ignjatovic, A; Parameswaran, S

doi: 10.1186/s12859-019-2827-0

pmid: 31113356

pmc: PMC6528274

handle: 1959.4/unsworks_64714

Abstract

Pairwise alignment of short DNA sequences with affine-gap scoring is a common processing step in a range of bioinformatics analyses. Dynamic programming (i.e., the Smith-Waterman algorithm) is widely used for this purpose. Even with data-level parallelisation, pairwise alignment remains time-consuming. Faster alignment algorithms exist, but they are less accurate.

In this paper, we present MEM-Align, a fast semi-global alignment algorithm for short DNA sequences that allows for affine-gap scoring and exploits sequence similarity. In contrast to traditional alignment methods (such as Smith-Waterman), where individual symbols are aligned, MEM-Align extracts Maximal Exact Matches (MEMs) using a bit-level parallel method and then looks for a subset of MEMs that forms the alignment using a novel dynamic programming method. MEM-Align tries to mimic the alignment produced by Smith-Waterman. As a result, for 99.9% of input sequence pairs, the computed alignment score is identical to the alignment score computed by Smith-Waterman. Yet MEM-Align is up to 14.5 times faster than the Smith-Waterman algorithm. The fast run-time is achieved by: (a) using a bit-level parallel method to extract MEMs; (b) processing MEMs rather than individual symbols; and (c) applying heuristics.

MEM-Align is a potential candidate to replace other pairwise alignment algorithms used in processes such as DNA read-mapping and variant calling.
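The MEM-extraction idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`encode`, `match_mask`, `find_mems`), the 2-bit base encoding, and the `min_len` parameter are all assumptions made for illustration. The sketch packs each sequence into an integer, compares the two sequences along every diagonal with a single XOR, and reports each unbroken run of matching bases as a maximal exact match. It assumes uppercase input containing only A, C, G, and T.

```python
from typing import List, Tuple

# Hypothetical 2-bit encoding; any injective mapping would work.
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}


def encode(seq: str) -> int:
    """Pack a DNA string into an int: base i occupies bits 2i and 2i+1."""
    value = 0
    for i, base in enumerate(seq):
        value |= CODE[base] << (2 * i)
    return value


def match_mask(a_bits: int, b_bits: int, n: int) -> int:
    """Return a mask whose bit 2i is set iff base i matches, for i < n."""
    low = int("01" * n, 2) if n else 0       # 0b0101...01, one bit per base
    diff = a_bits ^ b_bits                    # non-zero bit pair = mismatch
    mismatch = (diff | (diff >> 1)) & low     # fold each bit pair onto its low bit
    return mismatch ^ low                     # invert within the n bases


def find_mems(a: str, b: str, min_len: int = 1) -> List[Tuple[int, int, int]]:
    """List (start_in_a, start_in_b, length) for every maximal exact match."""
    a_bits, b_bits = encode(a), encode(b)
    mems = []
    # Each diagonal d pairs a[i] with b[i + d].
    for d in range(-(len(a) - 1), len(b)):
        a_start, b_start = (0, d) if d >= 0 else (-d, 0)
        n = min(len(a) - a_start, len(b) - b_start)
        mask = match_mask(a_bits >> (2 * a_start), b_bits >> (2 * b_start), n)
        k = 0
        while k < n:
            if (mask >> (2 * k)) & 1:
                run_start = k
                while k < n and (mask >> (2 * k)) & 1:
                    k += 1
                # The run is bounded by a mismatch or a sequence end on both
                # sides, so it cannot be extended: it is maximal.
                if k - run_start >= min_len:
                    mems.append((a_start + run_start, b_start + run_start, k - run_start))
            else:
                k += 1
    return mems


if __name__ == "__main__":
    print(find_mems("ACGTAC", "ACGAAC", min_len=2))
```

In this sketch only the comparison step is bit-parallel; the run scan is still a per-base loop. Choosing which subset of the reported MEMs forms the alignment, which the abstract attributes to a separate dynamic programming step, is not shown here.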

Related Organizations

UNSW Sydney
Australia
Commonwealth Scientific and Industrial Research Organisation
Australia
CSIRO
Australia

Keywords

570, QH301-705.5, Computer applications to medicine. Medical informatics, anzsrc-for: 46 Information and Computing Sciences, R858-859.7, anzsrc-for: 49 Mathematical sciences, Dynamic programming, 3102 Bioinformatics and Computational Biology, 46 Information and Computing Sciences, Sequence alignment, anzsrc-for: 31 Biological Sciences, Biology (General), Nucleotides, Methodology Article, anzsrc-for: 01 Mathematical Sciences, DNA, Sequence Analysis, DNA, anzsrc-for: 06 Biological Sciences, Generic health relevance, anzsrc-for: 3102 Bioinformatics and Computational Biology, anzsrc-for: 08 Information and Computing Sciences, Affine-gap penalty, Sequence Analysis, Sequence Alignment, Algorithms, 31 Biological Sciences

Pairwise alignment of nucleotide sequences using maximal exact matches (2019, IsSupplementedBy)
Additional file 1 of Pairwise alignment of nucleotide sequences using maximal exact matches (2019, IsSupplementedBy)

Impact by BIP!

selected citations: 8 (derived from selected sources; an alternative to the "Influence" indicator)
popularity: Top 10% (the "current" impact/attention of the article in the research community at large, based on the underlying citation network)
influence: Average (the overall/total impact of the article in the research community at large, based on the underlying citation network, diachronically)
impulse: Average (the initial momentum of the article directly after its publication, based on the underlying citation network)