Faster algorithms for longest common substring

Keywords: Theory of computation → Pattern matching

Charalampopoulos, Panagiotis; Kociumaka, Tomasz; Pissis, Solon P.; Radoszewski, Jakub; Mutzel, Petra; Pagh, Rasmus; Herman, Grzegorz


Leibniz International Proceedings in Informatics

Conference object . 2021

Data sources: OpenAIRE


Publication: Conference object, 01 Jan 2021, English

Publisher: Schloss Dagstuhl - Leibniz-Zentrum für Informatik

Journal: 29th Annual European Symposium on Algorithms (ESA 2021), volume 204, pages 1-17 (ISSN: 1868-8969)

Funded by: EC | PANGAIA, EC | ALPACA


Abstract

In the classic longest common substring (LCS) problem, we are given two strings $S$ and $T$, each of length at most $n$, over an alphabet of size $\sigma$, and we are asked to find a longest string occurring as a fragment of both $S$ and $T$. Weiner, in his seminal paper that introduced the suffix tree, presented an $\mathcal{O}(n \log \sigma)$-time algorithm for this problem [SWAT 1973]. For polynomially-bounded integer alphabets, the linear-time construction of suffix trees by Farach yielded an $\mathcal{O}(n)$-time algorithm for the LCS problem [FOCS 1997]. However, for small alphabets, this is not necessarily optimal for the LCS problem in the word RAM model of computation, in which the strings can be stored in $\mathcal{O}(n \log \sigma / \log n)$ space and read in $\mathcal{O}(n \log \sigma / \log n)$ time. We show that, in this model, we can compute an LCS in time $\mathcal{O}(n \log \sigma / \sqrt{\log n})$, which is sublinear in $n$ if $\sigma = 2^{o(\sqrt{\log n})}$ (in particular, if $\sigma = \mathcal{O}(1)$), using optimal space $\mathcal{O}(n \log \sigma / \log n)$. We then lift our ideas to the problem of computing a $k$-mismatch LCS, which has received considerable attention in recent years. In this problem, the aim is to compute a longest substring of $S$ that occurs in $T$ with at most $k$ mismatches. Flouri et al. showed how to compute a 1-mismatch LCS in $\mathcal{O}(n \log n)$ time [IPL 2015]. Thankachan et al. extended this result to computing a $k$-mismatch LCS in $\mathcal{O}(n \log^{k} n)$ time for $k = \mathcal{O}(1)$ [J. Comput. Biol. 2016]. We show an $\mathcal{O}(n \log^{k-1/2} n)$-time algorithm, for any constant integer $k > 0$ and irrespective of the alphabet size, using $\mathcal{O}(n)$ space, as do the previous approaches. We thus notably break through the well-known $n \log^{k} n$ barrier, which stems from a recursive heavy-path decomposition technique that was first introduced in the seminal paper of Cole et al. [STOC 2004] for string indexing with $k$ errors.
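To make the two problem statements concrete, here is a minimal Python sketch of naive quadratic-time baselines. These are not the algorithms of the paper, which rely on suffix trees, word-RAM packing, and heavy-path decomposition; they only illustrate what an LCS and a k-mismatch LCS are. The function names `longest_common_substring` and `k_mismatch_lcs` are hypothetical and chosen for this illustration.

```python
from typing import Tuple


def longest_common_substring(s: str, t: str) -> str:
    """Naive O(|s| * |t|) dynamic program; returns one longest common substring."""
    best_len, best_end = 0, 0
    # prev[j] = length of the longest common suffix of s[:i-1] and t[:j]
    prev = [0] * (len(t) + 1)
    for i in range(1, len(s) + 1):
        cur = [0] * (len(t) + 1)
        for j in range(1, len(t) + 1):
            if s[i - 1] == t[j - 1]:
                cur[j] = prev[j - 1] + 1
                if cur[j] > best_len:
                    best_len, best_end = cur[j], i
        prev = cur
    return s[best_end - best_len:best_end]


def k_mismatch_lcs(s: str, t: str, k: int) -> Tuple[int, int, int]:
    """Naive O(|s| * |t|) baseline for the k-mismatch LCS.

    Returns (length, start_in_s, start_in_t) of a longest pair of
    equal-length fragments of s and t that differ in at most k positions.
    """
    n, m = len(s), len(t)
    best = (0, 0, 0)
    # Each shift fixes one alignment (diagonal): s[i] is compared with t[i - shift].
    for shift in range(-(m - 1), n):
        i0 = max(0, shift)          # first aligned position in s
        j0 = i0 - shift             # first aligned position in t
        length = min(n - i0, m - j0)
        left = 0                    # left end of the sliding window
        mismatches = 0              # mismatches inside the current window
        for right in range(length):
            if s[i0 + right] != t[j0 + right]:
                mismatches += 1
            # Shrink the window until it holds at most k mismatches again.
            while mismatches > k:
                if s[i0 + left] != t[j0 + left]:
                    mismatches -= 1
                left += 1
            if right - left + 1 > best[0]:
                best = (right - left + 1, i0 + left, j0 + left)
    return best
```

For example, `longest_common_substring("xabcdy", "zzabcdw")` returns `"abcd"`, and `k_mismatch_lcs("abcde", "abxde", 1)` returns `(5, 0, 0)`, since the two strings differ in a single position. With `k = 0` the second function reports the length of an ordinary LCS.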

LIPIcs, Vol. 204, 29th Annual European Symposium on Algorithms (ESA 2021), pages 30:1-30:17





Related to Research communities

INRIA