Faster algorithms for string matching with k mismatches

descriptionPublicationkeyboard_double_arrow_right Article 01 Feb 2004 English Publisher:Elsevier BVJournal:Journal of Algorithms, volume 50, pages 257-275 (issn: 0196-6774,

Copyright policy )

Authors: Amir, Amihood; Lewenstein, Moshe; Porat, Ely;

doi: 10.1016/s0196-6774(03)00097-x

Faster algorithms for string matching with k mismatches

- Summary
- Subjects
- Metrics

Abstract

Summary: The string matching with mismatches problem is that of finding the number of mismatches between a pattern \(P\) of length \(m\) and every length \(m\) substring of the text \(T\). Currently, the fastest algorithms for this problem are the following. The Galil-Giancarlo algorithm finds all locations where the pattern has at most \(k\) errors (where \(k \) is part of the input) in time \(O(nk)\). The Abrahamson algorithm finds the number of mismatches at every location in time \(O(\sqrt{m\log m})\). We present an algorithm that is faster than both. Our algorithm finds all locations where the pattern has at most \(k\) errors in time \(O(\sqrt{k\log k})\). We also show an algorithm that solves the above problem in time \(O((n+(nk^3)/m)\log k)\).

Related Organizations

Georgia Institute of Technology
United States
Bar-Ilan University
Israel

Keywords

Combinatorics on words, Approximate string matching, Hamming distance, Combinatorial algorithms on words, Analysis of algorithms, Nonnumerical algorithms, Design and analysis of algorithms

Impact byBIP!

	selected citations These citations are derived from selected sources. This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).	93
	popularity This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network.	Top 10%
	influence This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).	Top 1%
	impulse This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network.	Top 10%