Sublinear-Time Algorithms for Computing & Embedding Gap Edit Distance

Publication: Article, Preprint, Conference object, Other literature type

Date: 01 Nov 2020

Embargo end date: 01 Jan 2020

Publisher: IEEE

Journal: 2020 IEEE 61st Annual Symposium on Foundations of Computer Science (FOCS)

Funded by: NSF | CIF: Small: New Direction..., EC | MPM, NSF | HDR TRIPODS: Institute fo...

Authors: Tomasz Kociumaka; Barna Saha

doi: 10.1109/focs46700.2020.00112 , 10.48550/arxiv.2007.12762

arXiv: 2007.12762

Abstract

In this paper, we design new sublinear-time algorithms for solving the gap edit distance problem and for embedding edit distance to Hamming distance. For the gap edit distance problem, we give an $\tilde{O}(\frac{n}{k}+k^2)$-time greedy algorithm that distinguishes between length-$n$ input strings with edit distance at most $k$ and those with edit distance exceeding $(3k+5)k$. This is an improvement and a simplification upon the result of Goldenberg, Krauthgamer, and Saha [FOCS 2019], where the $k$ vs $\Theta(k^2)$ gap edit distance problem is solved in $\tilde{O}(\frac{n}{k}+k^3)$ time. We further generalize our result to solve the $k$ vs $k'$ gap edit distance problem in time $\tilde{O}(\frac{nk}{k'}+k^2+ \frac{k^2}{k'}\sqrt{nk})$, strictly improving upon the previously known bound $\tilde{O}(\frac{nk}{k'}+k^3)$. Finally, we show that if the input strings do not have long highly periodic substrings, then already the $k$ vs $(1+\epsilon)k$ gap edit distance problem can be solved in sublinear time. Specifically, if the strings contain no substring of length $\ell$ with period at most $2k$, then the running time we achieve is $\tilde{O}(\frac{n}{\epsilon^2 k}+k^2\ell)$. We further give the first sublinear-time probabilistic embedding of edit distance to Hamming distance. For any parameter $p$, our $\tilde{O}(\frac{n}{p})$-time procedure yields an embedding with distortion $O(kp)$, where $k$ is the edit distance of the original strings. Specifically, the Hamming distance of the resultant strings is between $\frac{k-p+1}{p+1}$ and $O(k^2)$ with good probability. This generalizes the linear-time embedding of Chakraborty, Goldenberg, and Koucký [STOC 2016], where the resultant Hamming distance is between $\frac k2$ and $O(k^2)$. Our algorithm is based on a random walk over samples, which we believe will find other applications in sublinear-time algorithms.
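To make the problem statement in the abstract concrete, the following is a minimal sketch of what the $k$ vs $(3k+5)k$ gap edit distance problem asks for. It is not the sublinear-time algorithm of the paper: it is an exact baseline that uses a standard banded dynamic program, which reads the whole input and takes time roughly proportional to $n$ times the band width. The function names `bounded_edit_distance` and `gap_label`, and the labels `"close"`, `"far"`, and `"either"`, are hypothetical and chosen here only for illustration.

```python
from typing import Optional


def bounded_edit_distance(x: str, y: str, k: int) -> Optional[int]:
    """Return the edit distance of x and y if it is at most k, else None.

    Only cells (i, j) with |i - j| <= k are evaluated, since an alignment
    of cost at most k never leaves that band. Values are capped at k + 1.
    """
    n, m = len(x), len(y)
    if abs(n - m) > k:
        return None
    inf = k + 1
    width = 2 * k + 1  # diagonal offset d corresponds to column j = i + d - k
    prev = [inf] * width
    for d in range(k, width):  # row i = 0: ED("", y[:j]) = j
        j = d - k
        if j <= m:
            prev[d] = j
    for i in range(1, n + 1):
        cur = [inf] * width
        for d in range(width):
            j = i + d - k
            if j < 0 or j > m:
                continue
            if j == 0:
                best = i  # delete the first i characters of x
            else:
                # match or substitute x[i-1] with y[j-1]
                best = prev[d] + (x[i - 1] != y[j - 1])
                if d > 0:
                    best = min(best, cur[d - 1] + 1)  # insert y[j-1]
            if d + 1 < width:
                best = min(best, prev[d + 1] + 1)  # delete x[i-1]
            cur[d] = min(best, inf)
        prev = cur
    result = prev[m - n + k]
    return result if result <= k else None


def gap_label(x: str, y: str, k: int) -> str:
    """Classify a pair for the k vs (3k+5)k gap problem from the abstract.

    "close":  edit distance at most k
    "far":    edit distance exceeds (3k+5)k
    "either": in between, where a gap algorithm may answer either way
    """
    far_threshold = (3 * k + 5) * k
    d = bounded_edit_distance(x, y, far_threshold)
    if d is None:
        return "far"
    return "close" if d <= k else "either"
```

For example, `gap_label("kitten", "sitting", 3)` falls on the close side because the edit distance of the two strings is 3, whereas with `k = 1` the same pair lands in the gap between 1 and 8, where either answer is acceptable.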

Related Organizations

University of California, Berkeley
United States
University of California System
United States
University of Massachusetts Amherst
United States
University of California, San Francisco
United States
Bar-Ilan University
Israel

Keywords

FOS: Computer and information sciences, Computer Science - Data Structures and Algorithms, Data Structures and Algorithms (cs.DS)

Impact by BIP!

	selected citations These citations are derived from selected sources. This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).	6
	popularity This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network.	Top 10%
	influence This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).	Average
	impulse This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network.	Top 10%

Access route: Green

Fields of Science

natural sciences

computer and information sciences

Funded by

NSF | CIF: Small: New Directions in Clustering: Interactive Algorithms and Statistical Models, EC | MPM, NSF | HDR TRIPODS: Institute for Integrated Data Science: A Transdisciplinary Approach to Understanding Fundamental Trade-offs and Theoretical Foundations