Approximate Matching in Weighted Sequences

descriptionPublicationkeyboard_double_arrow_right Part of book or chapter of book , Conference object , Article 01 Jan 2006Publisher:Springer Berlin Heidelberg

Authors: Amihood Amir; Costas S. Iliopoulos; Oren Kapah; Ely Porat;

doi: 10.1007/11780441_33

Approximate Matching in Weighted Sequences

- Summary
- Metrics

Abstract

Weighted sequences have been recently introduced as a tool to handle a set of sequences that are not identical but have many local similarities. The weighted sequence is a “statistical image” of this set, where the probability of every symbol's occurrence at every text location is given. We address the problem of approximately matching a pattern in such a weighted sequence. The pattern is a given string and we seek all locations in the set where the pattern occurs with a high enough probability. We define the notion of Hamming distance and edit distance in weighted sequences and give efficient algorithms for computing them. We compute two versions of the Hamming distance in time $O(n \sqrt{m\log m})$, where n is the length of the weighted text and m is the pattern length. The edit distance is computed in time O(nm) and O(nm2), depending on the edit distance definition used. Unfortunately, due to space considerations, the edit distance details are left to the journal version. We also define the notion of weighted matching in infinite alphabets and show that exact weighted matching can be computed in time O(slog2s), where s is the number of text symbols having non-zero probability. The weighted Hamming distance over infinite alphabets can be computed in time $\min(O(kn\sqrt{s}+s^{3/2}\log^2s), O(s^{4/3}m^{1/3}\log s))$.

Related Organizations

Bar-Ilan University
Israel
Georgia Institute of Technology
United States
King's College London
United Kingdom

Impact byBIP!

	selected citations These citations are derived from selected sources. This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).	7
	popularity This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network.	Average
	influence This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).	Top 10%
	impulse This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network.	Average

Found an issue? Give us feedback

7

Average

Top 10%

Average

Upload OA version

Are you the author of this publication? Upload your Open Access version to Zenodo!

It’s fast and easy, just two clicks!

uploadUpload now