Faster Approximate Elastic-Degenerate String Matching - Part A.

Publication: Conference object, 01 Jan 2025, Germany, Netherlands
Publisher: Schloss Dagstuhl - Leibniz-Zentrum für Informatik GmbH, Dagstuhl Publishing
Funded by: EC | PANGAIA, EC | ALPACA

Authors: Solon P. Pissis; Jakub Radoszewski; Wiktor Zuba;

handle: 1871.1/767021cf-cc67-43f7-b4b9-f3aab5ca4682

Abstract

An elastic-degenerate (ED) string T is a sequence T = T[1] ··· T[n] of n finite sets of strings. The cardinality m of T is the total number of strings in T[i], for all i ∈ [1 .. n]. The size N of T is the total length of all m strings of T. ED strings have been introduced to represent a set of closely-related DNA sequences. Let P = P[1 .. p] be a pattern of length p and k > 0 be an integer. We consider the problem of k-Approximate ED String Matching (EDSM): searching for k-approximate occurrences of P in the language of T. We call k-Approximate EDSM under the Hamming distance k-Mismatch EDSM, and we call k-Approximate EDSM under the edit distance k-Edit EDSM. Bernardini et al. (Theoretical Computer Science, 2020) showed a simple O(kmp + kN)-time algorithm for k-Mismatch EDSM and an O(k^2 mp + kN)-time algorithm for k-Edit EDSM. We improve the dependency on k in both results, obtaining an Õ(k^{2/3} mp + √k · N)-time algorithm for k-Mismatch EDSM and an Õ(kmp + kN)-time algorithm for k-Edit EDSM. Bernardini et al. (Theory of Computing Systems, 2024) presented several algorithms for 1-Approximate EDSM working in Õ(np^2 + N) time. They also left the generalization of these solutions to k > 1 as an open problem. We improve the runtime of their solution for 1-Mismatch and 1-Edit EDSM from Õ(np^2 + N) to O(np^2 + N). We further show algorithms for k-Approximate EDSM for the Hamming and edit distances working in Õ(np^2 + N) time, for any constant k > 0. Finally, we show how our techniques can be applied to improve upon the complexity of the k-Approximate ED String Intersection and k-Approximate Doubly EDSM problems that were introduced very recently by Gabory et al. (Information and Computation, 2025).
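
To make the definitions in the abstract concrete, here is a minimal Python sketch. It is not one of the algorithms from the paper: it is a naive brute-force baseline for k-Mismatch EDSM, written only to illustrate what n, m, N and a k-mismatch occurrence in the language of T mean. The function names `ed_stats` and `naive_k_mismatch_edsm`, the representation of T as a list of sets of strings, and the choice to report the 0-based indices of the segments in which an occurrence ends are all assumptions made for this illustration. N is computed exactly as the abstract states it (the sum of string lengths), so an empty string contributes 0.

```python
def ed_stats(T):
    """Return (n, m, N) for an ED string T given as a list of sets of strings."""
    n = len(T)                                  # number of segments
    m = sum(len(seg) for seg in T)              # cardinality: total number of strings
    N = sum(len(s) for seg in T for s in seg)   # size: total length of all strings
    return n, m, N


def naive_k_mismatch_edsm(T, P, k):
    """Naive illustration (not the paper's method).

    Returns the sorted 0-based indices i such that some occurrence of P with
    at most k mismatches ends inside a string of T[i]. Assumes len(P) >= 1.
    """
    p = len(P)
    ends = []
    # A state (j, e) means: the prefix P[:j] is aligned with the text read so
    # far along some path through T, using e mismatches.
    incoming = set()
    for i, segment in enumerate(T):
        outgoing = set()
        found = False
        for s in segment:
            states = set(incoming)      # an empty string passes states through
            for c in s:
                states.add((0, 0))      # an occurrence may start at any letter
                nxt = set()
                for j, e in states:
                    e2 = e + (P[j] != c)
                    if e2 > k:
                        continue        # too many mismatches, drop this state
                    if j + 1 == p:
                        found = True    # a full occurrence ends at this letter
                    else:
                        nxt.add((j + 1, e2))
                states = nxt
            outgoing |= states
        if found:
            ends.append(i)
        incoming = outgoing
    return ends


# Hypothetical example: the language of T is {"ACT", "ACGT", "AT"}.
T = [{"A"}, {"C", "CG", ""}, {"T"}]
print(ed_stats(T))                          # (3, 5, 5)
print(naive_k_mismatch_edsm(T, "ACT", 0))   # [2]
print(naive_k_mismatch_edsm(T, "AGT", 1))   # [2]
print(naive_k_mismatch_edsm(T, "AGT", 0))   # []
```

The sketch keeps up to p·(k + 1) states per string, so its running time grows roughly like k·p·N in the worst case; the bounds quoted in the abstract are much better and rely on techniques that this illustration does not attempt to reproduce.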

Countries

Germany, Netherlands

Related Organizations

University of Warsaw, Poland
Vrije Universiteit Amsterdam, Netherlands
Leibniz Association, Germany
Schloss Dagstuhl – Leibniz Center for Informatics, Germany

Keywords

approximate string matching, Hamming distance, edit distance, ED string, ddc:004

Impact by BIP!

selected citations: 0
popularity: Average
influence: Average
impulse: Average


Related to Research communities

Aurora Universities Network

Netherlands Research Portal