Approximate String Processing

Marios Hadjieleftheriou; Divesh Srivastava

Foundations and Trends in Databases

Article · 2011 · Peer-reviewed

Data sources: Crossref

DBLP

Article · 2020

Data sources: DBLP

Article · 22 Feb 2011 · English

Publisher: Emerald

Journal: Foundations and Trends in Databases, volume 2, pages 267-402 (ISSN: 1931-7883, eISSN: 1931-7891)

doi: 10.1561/1900000010

Abstract

One of the most important primitive data types in modern data processing is text. Text data are known to have a variety of inconsistencies (e.g., spelling mistakes and representational variations). For that reason, there exists a large body of literature related to approximate processing of text. This monograph focuses specifically on the problem of approximate string matching, where, given a set of strings S and a query string υ, the goal is to find all strings s ∈ S that have a user-specified degree of similarity to υ. Set S could be, for example, a corpus of documents, a set of web pages, or an attribute of a relational table. The similarity between strings is always defined with respect to a similarity function that is chosen based on the characteristics of the data and application at hand. This work presents a survey of indexing techniques and algorithms specifically designed for approximate string matching. We concentrate on inverted indexes, filtering techniques, and tree data structures that can be used to evaluate a variety of set-based and edit-based similarity functions. We focus on all-match and top-k flavors of selection and join queries, and discuss the applicability, advantages, and disadvantages of each technique for every query type.
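
To make the problem statement concrete, here is a minimal Python sketch of an all-match selection query. It assumes two illustrative similarity functions, Levenshtein edit distance (edit-based) and Jaccard similarity over padded 2-grams (set-based), plus a plain q-gram inverted index whose candidates are verified exactly. All names (`edit_distance`, `qgrams`, `jaccard`, `QGramIndex`, `select_edit`, `select_jaccard`) and the `#`/`$` padding characters are hypothetical choices for illustration, not the specific algorithms surveyed in the monograph.

```python
from collections import defaultdict


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum number of single-character
    insertions, deletions, and substitutions that turn a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # delete ca
                            curr[j - 1] + 1,            # insert cb
                            prev[j - 1] + (ca != cb)))  # substitute (or match)
        prev = curr
    return prev[-1]


def qgrams(s: str, q: int = 2) -> set[str]:
    """Set of overlapping q-grams; hypothetical '#'/'$' padding
    lets the first and last characters form their own q-grams."""
    padded = "#" * (q - 1) + s + "$" * (q - 1)
    return {padded[i:i + q] for i in range(len(padded) - q + 1)}


def jaccard(a: str, b: str, q: int = 2) -> float:
    """Set-based similarity: |A ∩ B| / |A ∪ B| over q-gram sets."""
    ga, gb = qgrams(a, q), qgrams(b, q)
    union = ga | gb
    return len(ga & gb) / len(union) if union else 1.0


def select_edit(strings, query: str, k: int) -> list[str]:
    """Brute-force edit-based selection: all s with ed(query, s) <= k."""
    return sorted(s for s in strings if edit_distance(query, s) <= k)


class QGramIndex:
    """Inverted index mapping each q-gram to the ids of strings containing it."""

    def __init__(self, strings, q: int = 2):
        self.strings = list(strings)
        self.q = q
        self.lists = defaultdict(list)
        for sid, s in enumerate(self.strings):
            for g in qgrams(s, q):
                self.lists[g].append(sid)

    def select_jaccard(self, query: str, tau: float) -> list[str]:
        """All s with jaccard(query, s) >= tau, assuming tau > 0.
        Scanning the query's inverted lists counts shared q-grams per
        candidate; strings sharing none have similarity 0 and are skipped."""
        gq = qgrams(query, self.q)
        overlap = defaultdict(int)
        for g in gq:
            for sid in self.lists.get(g, ()):
                overlap[sid] += 1
        results = []
        for sid, shared in overlap.items():
            gs_size = len(qgrams(self.strings[sid], self.q))
            # |A ∪ B| = |A| + |B| - |A ∩ B|
            if shared / (len(gq) + gs_size - shared) >= tau:
                results.append(self.strings[sid])
        return sorted(results)


if __name__ == "__main__":
    corpus = ["databases", "database", "data base", "dataset", "string"]
    print(select_edit(corpus, "database", 1))
    # ['data base', 'database', 'databases']
    print(QGramIndex(corpus).select_jaccard("database", 0.7))
    # ['data base', 'database', 'databases']
```

In this sketch the index only touches strings that share at least one q-gram with the query, which is safe for any threshold above zero. The filtering techniques the abstract mentions aim to shrink this candidate set further before exact verification; the sketch makes no attempt at that.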

Impact (by BIP!)

- Selected citations: 9 (citations derived from selected sources)
- Popularity: Average (current impact/attention of the article, based on the underlying citation network)
- Influence: Top 10% (overall/total impact of the article, based on the underlying citation network over time)
- Impulse: Top 10% (initial momentum of the article directly after its publication, based on the underlying citation network)

Fields of Science

engineering and technology

electrical engineering, electronic engineering, information engineering