A novel fast multiple nucleotide sequence alignment method based on FM-index

Publication: Article, 10 Dec 2021, English

Publisher: Oxford University Press (OUP)

Journal: Briefings in Bioinformatics, volume 23 (ISSN: 1467-5463, eISSN: 1477-4054)

Authors: Huan Liu; Quan Zou; Yun Xu

doi: 10.1093/bib/bbab519

pmid: 34893794


Abstract

Multiple sequence alignment (MSA) is fundamental to many biological applications. However, most classical MSA algorithms struggle to handle large-scale sequence sets, especially those containing long sequences. Some recent aligners therefore adopt an efficient divide-and-conquer strategy that divides long sequences into several short sub-sequences. Selecting the common segments (i.e. anchors) at which the sequences are divided is critical, as it directly affects both accuracy and running time. We therefore propose a novel algorithm, FMAlign, to improve the performance of multiple nucleotide sequence alignment. FMAlign uses an FM-index to extract long common segments at low cost, rather than a space-consuming hash table. Moreover, once the longer optimal common segments have been found, the sequences are divided at those segments. FMAlign has been tested on virus and bacterial genome datasets and on human mitochondrial genome datasets, and compared with existing MSA methods such as MAFFT, HAlign and FAME. The experiments show that our method outperforms the existing methods in terms of running time and achieves high accuracy on long sequence sets. The results demonstrate that our method is applicable to large-scale nucleotide sequence sets, in terms of both sequence length and number of sequences. The source code and related data are available at https://github.com/iliuh/FMAlign.
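To make the two ideas in the abstract concrete, the following is a minimal Python sketch of an FM-index lookup followed by division of sequences at a shared anchor. It is not FMAlign's implementation: the function names (`build_fm_index`, `locate`, `split_at_anchor`) are hypothetical, the suffix array is built naively for readability, the full suffix array is kept rather than a sampled one, and the anchor is supplied by the caller rather than searched for.

```python
def build_fm_index(text):
    """Build a toy FM-index (suffix array, C table, Occ table) for `text`.

    Assumption: `text` contains only characters that sort after the
    sentinel "$" (true for A, C, G, T).
    """
    text += "$"
    # Naive suffix array: fine for a demo, far too slow for real genomes.
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    # Burrows-Wheeler transform: the character preceding each sorted suffix
    # (index -1 wraps around to the sentinel).
    bwt = "".join(text[i - 1] for i in sa)
    alphabet = sorted(set(text))
    # c_table[ch]: number of characters in text strictly smaller than ch.
    c_table, total = {}, 0
    for ch in alphabet:
        c_table[ch] = total
        total += text.count(ch)
    # occ[ch][i]: number of occurrences of ch in bwt[:i].
    occ = {ch: [0] * (len(bwt) + 1) for ch in alphabet}
    for i, b in enumerate(bwt):
        for ch in alphabet:
            occ[ch][i + 1] = occ[ch][i] + (b == ch)
    return sa, c_table, occ


def locate(pattern, index):
    """Return sorted start positions of `pattern` using backward search."""
    sa, c_table, occ = index
    lo, hi = 0, len(sa)  # half-open range of suffix-array rows
    for ch in reversed(pattern):
        if ch not in c_table:
            return []
        lo = c_table[ch] + occ[ch][lo]
        hi = c_table[ch] + occ[ch][hi]
        if lo >= hi:
            return []
    return sorted(sa[lo:hi])


def split_at_anchor(sequences, anchor):
    """Split each sequence around the first occurrence of a shared anchor.

    Returns a list of (left, right) pairs, or None if the anchor is
    missing from any sequence.
    """
    pieces = []
    for seq in sequences:
        hits = locate(anchor, build_fm_index(seq))
        if not hits:
            return None
        pos = hits[0]
        pieces.append((seq[:pos], seq[pos + len(anchor):]))
    return pieces


index = build_fm_index("ACGTACGT")
print(locate("CGT", index))
print(split_at_anchor(["TTACGTGG", "CACGTA"], "ACGT"))
```

In a divide-and-conquer aligner of the kind the abstract describes, the left and right pieces would then be aligned separately as shorter sub-problems and the results joined at the anchor. The memory argument is that the index works from the transformed text and small count tables instead of storing every k-mer in a hash table.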

Related Organizations

University of Electronic Science and Technology of China
China (People's Republic of)
Harbin University of Science and Technology
China (People's Republic of)

Keywords

Base Sequence; Databases, Factual; Genome, Human; Sequence Analysis, DNA; Research Design; Humans; Sequence Alignment; Algorithms; Genome, Bacterial; Software

Impact by BIP!

- Selected citations: 14. These citations are derived from selected sources. This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
- Popularity: Top 10%. This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network.
- Influence: Average. This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
- Impulse: Top 10%. This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network.


gold

Fields of Science

engineering and technology

medical engineering
