FMtree: a fast locating algorithm of FM-indexes for genomic data

Publication type: Article, Preprint

Date: 18 Sep 2017

Embargo end date: 01 Jan 2017

Language: English

Publisher: Oxford University Press (OUP)

Journal: Bioinformatics, volume 34, pages 416-424 (ISSN: 1367-4803, eISSN: 1367-4811)

Authors: Haoyu Cheng; Ming Wu; Yun Xu

doi: 10.1093/bioinformatics/btx596, 10.48550/arxiv.1704.04615

pmid: 28968761

arXiv: 1704.04615


Abstract

Motivation: As a fundamental task in bioinformatics, searching for massive numbers of short patterns over a long text has been accelerated by various compressed full-text indexes. These indexes provide searching functionality similar to that of classical indexes, e.g. suffix trees and suffix arrays, while requiring less space. For genomic data, a well-known family of compressed full-text indexes, called FM-indexes, presents unmatched performance in practice. One major drawback of FM-indexes is that their locating operations, which report all occurrence positions of patterns in a given text, are not efficient, especially for patterns with many occurrences.

Results: In this paper, we introduce a novel locating algorithm, FMtree, to quickly retrieve all occurrence positions of any pattern via FM-indexes. When searching for a pattern over a given text, FMtree organizes the search space of the locating operation into a conceptual multiway tree. As a result, multiple occurrence positions of this pattern can be retrieved simultaneously by traversing the multiway tree. Compared with existing locating algorithms, our tree-based algorithm eliminates large numbers of redundant operations and exhibits better data locality. Experimental results show that FMtree is usually one order of magnitude faster than the state-of-the-art algorithms while remaining memory-efficient.

Availability and implementation: FMtree is freely available at https://github.com/chhylp123/FMtree.

Supplementary information: Supplementary data are available at Bioinformatics online.
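The contrast the abstract draws, between locating occurrences one at a time and retrieving several at once by traversing a multiway tree, can be illustrated with a toy sketch. The code below is not the authors' implementation (see the linked repository for that). It is a minimal Python illustration under stated assumptions: a naively built FM-index, suffix array values kept only for text positions that are multiples of a hypothetical `step` parameter, and tree nodes defined as the suffix array intervals obtained by prepending one character to the current string. All function names are made up for this example.

```python
def build_index(text, step):
    """Toy FM-index over text + '$' (naive construction, short inputs only)."""
    text += "$"
    n = len(text)
    sa = sorted(range(n), key=lambda i: text[i:])
    bwt = [text[i - 1] for i in sa]          # text[-1] is '$' when i == 0
    alphabet = sorted(set(text))
    C, total = {}, 0                         # C[c]: characters smaller than c
    for c in alphabet:
        C[c] = total
        total += text.count(c)
    occ = {c: [0] * (n + 1) for c in alphabet}   # occ[c][i]: count of c in bwt[:i]
    for i, b in enumerate(bwt):
        for c in alphabet:
            occ[c][i + 1] = occ[c][i] + (b == c)
    # Assumption: keep SA values only for text positions divisible by step.
    sampled = {row: pos for row, pos in enumerate(sa) if pos % step == 0}
    return {"C": C, "occ": occ, "bwt": bwt, "sampled": sampled,
            "step": step, "alphabet": alphabet}


def extend(idx, c, lo, hi):
    """Half-open SA interval of c + S, given the interval [lo, hi) of S."""
    return idx["C"][c] + idx["occ"][c][lo], idx["C"][c] + idx["occ"][c][hi]


def backward_search(idx, pattern):
    """Return the half-open SA interval of pattern, or (0, 0) if absent."""
    lo, hi = 0, len(idx["bwt"])
    for c in reversed(pattern):
        if c not in idx["C"]:
            return 0, 0
        lo, hi = extend(idx, c, lo, hi)
        if lo >= hi:
            return 0, 0
    return lo, hi


def locate_one_by_one(idx, lo, hi):
    """Baseline: walk each row backwards with LF-mapping until a sample is hit."""
    out = []
    for row in range(lo, hi):
        steps = 0
        while row not in idx["sampled"]:
            c = idx["bwt"][row]
            row = idx["C"][c] + idx["occ"][c][row]   # LF-mapping
            steps += 1
        out.append(idx["sampled"][row] + steps)
    return sorted(out)


def locate_tree(idx, lo, hi):
    """Tree-style locate: extend the whole interval once per character."""
    out = []
    stack = [(lo, hi, 0)]                    # (interval start, end, depth)
    while stack:
        lo, hi, depth = stack.pop()
        for row in range(lo, hi):
            if row in idx["sampled"]:
                # The sampled suffix starts depth characters before the pattern.
                out.append(idx["sampled"][row] + depth)
        if depth + 1 < idx["step"]:
            for c in idx["alphabet"]:
                if c == "$":
                    continue                 # never extend past the text start
                clo, chi = extend(idx, c, lo, hi)
                if clo < chi:
                    stack.append((clo, chi, depth + 1))
    return sorted(out)


index = build_index("ACGTACGTTACGACG", step=4)
interval = backward_search(index, "ACG")
positions_baseline = locate_one_by_one(index, *interval)
positions_tree = locate_tree(index, *interval)
```

In this sketch the baseline performs a separate LF-mapping walk for every occurrence, whereas the tree traversal performs one interval extension per character per node, so occurrences that share the same preceding characters share the work. The linear scan for sampled rows inside each node is a simplification chosen for readability and says nothing about how the actual tool stores or queries its samples.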

Related Organizations

University of Science and Technology of China
China (People's Republic of)
National University of Defense Technology
China (People's Republic of)

Keywords

FOS: Computer and information sciences, Genome, Genomics, Sequence Analysis, DNA, Mice, Computer Science - Data Structures and Algorithms, Animals, Humans, Data Structures and Algorithms (cs.DS), Algorithms, Software

Impact by BIP!

selected citations: 8. These citations are derived from selected sources. This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
popularity: Top 10%. This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network.
influence: Average. This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
impulse: Average. This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network.


Access routes: Green, gold

Fields of Science

engineering and technology

medical engineering
