
doi: 10.1137/0222018
Summary: The Boyer-Moore idea used in exact string matching is generalized to approximate string matching. Two versions of the problem are considered. The \(k\) mismatches problem is to find all approximate occurrences of a pattern string (length \(m\)) in a text string (length \(n\)) with at most \(k\) mismatches. The generalized Boyer-Moore algorithm is shown (under a mild independence assumption) to solve the problem in expected time \(O(kn(1/(m-k)+(k/c)))\), where \(c\) is the size of the alphabet. A related algorithm is developed for the \(k\) differences problem, where the task is to find all approximate occurrences of a pattern in a text with at most \(k\) differences (insertions, deletions, changes). An experimental evaluation of the algorithms is reported, showing that the new algorithms are often significantly faster than earlier ones. Both algorithms are functionally equivalent to the Horspool version of the Boyer-Moore algorithm when \(k=0\).
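To make the two problem statements concrete, the following is a minimal Python sketch of straightforward baselines for each problem. It is not the generalized Boyer-Moore algorithm described in the summary: `k_mismatches` is a plain position-by-position scan, and `k_differences` uses the classical column-wise dynamic programming for approximate matching. The function names and the convention of reporting 0-based start positions (mismatches) or end positions (differences) are assumptions made for illustration.

```python
def k_mismatches(text, pattern, k):
    """Return 0-based start positions where pattern occurs in text
    with at most k mismatching characters (naive O(mn) scan)."""
    n, m = len(text), len(pattern)
    hits = []
    for start in range(n - m + 1):
        mismatches = 0
        for j in range(m):
            if text[start + j] != pattern[j]:
                mismatches += 1
                if mismatches > k:
                    break  # too many mismatches at this alignment
        if mismatches <= k:
            hits.append(start)
    return hits


def k_differences(text, pattern, k):
    """Return 0-based end positions j such that some substring of text
    ending at j is within edit distance k of pattern (O(mn) dynamic
    programming, one column per text character)."""
    m = len(pattern)
    # column[i]: smallest edit distance between pattern[:i] and any
    # substring of text ending at the current text position.
    column = list(range(m + 1))
    hits = []
    for j, ch in enumerate(text):
        prev_diag = column[0]  # column[0] is always 0: a match may start anywhere
        for i in range(1, m + 1):
            cost = 0 if pattern[i - 1] == ch else 1
            new = min(prev_diag + cost,   # change (or match)
                      column[i] + 1,      # extra text character
                      column[i - 1] + 1)  # missing pattern character
            prev_diag = column[i]
            column[i] = new
        if column[m] <= k:
            hits.append(j)
    return hits
```

For example, `k_mismatches("abcabd", "abd", 1)` reports the alignments at positions 0 and 3, and `k_differences("abxd", "abd", 1)` reports every end position at which an approximate occurrence finishes.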
\(k\) mismatches, string matching, Analysis of algorithms and problem complexity, Database theory, Boyer-Moore algorithm, edit distance, \(k\) differences
| selected citations | 51 |
| popularity | Top 10% |
| influence | Top 1% |
| impulse | Top 10% |
