The Colored Longest Common Prefix Array Computed via Sequential Scans

descriptionPublicationkeyboard_double_arrow_right Part of book or chapter of book , Article , Preprint , Conference object 01 Jan 2018Embargo end date: 01 Jan 2018 Italy Publisher:Springer International Publishing

Authors: Garofalo, Fabio; Rosone, Giovanna; Sciortino, Marinella; Verzotto, Davide;

doi: 10.1007/978-3-030-00479-8_13 , 10.48550/arxiv.1807.07596

arXiv: 1807.07596

handle: 10447/337234 , 11568/946091

The Colored Longest Common Prefix Array Computed via Sequential Scans

- Summary
- Subjects
- Related research
  (5)
- Metrics

Abstract

Due to the increased availability of large datasets of biological sequences, the tools for sequence comparison are now relying on efficient alignment-free approaches to a greater extent. Most of the alignment-free approaches require the computation of statistics of the sequences in the dataset. Such computations become impractical in internal memory when very large collections of long sequences are considered. In this paper, we present a new conceptual data structure, the colored longest common prefix array (cLCP), that allows to efficiently tackle several problems with an alignment-free approach. In fact, we show that such a data structure can be computed via sequential scans in semi-external memory. By using cLCP, we propose an efficient lightweight strategy to solve the multi-string Average Common Substring (ACS) problem, that consists in the pairwise comparison of a single string against a collection of $m$ strings simultaneously, in order to obtain $m$ ACS induced distances. Experimental results confirm the effectiveness of our approach.

Preliminary version of the paper that will be included in the SPIRE 2018 proceedings

Country

Italy

Related Organizations

Keywords

Alignment-free methods, Burrows-wheeler transform, FOS: Computer and information sciences, Average common substring, Matching statistics, Computer Science (all), Theoretical Computer Science, Alignment-free methods; Average common substring; Burrows-wheeler transform; Longest common prefix; Matching statistics; Theoretical Computer Science; Computer Science (all), Computer Science - Data Structures and Algorithms, Data Structures and Algorithms (cs.DS), Longest common prefix

5 Research products, page 1 of 1

egap software on GitHub
IsRelatedTo
BCR_LCP_GSA software on GitHub
IsRelatedTo
egsa software on GitHub
IsRelatedTo
cLCP-mACS software on GitHub
IsRelatedTo
BEETL software on GitHub
IsRelatedTo

Impact byBIP!

	selected citations These citations are derived from selected sources. This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).	3
	popularity This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network.	Average
	influence This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).	Average
	impulse This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network.	Average