Supporting Uncertain Predicates in DBMS Using Approximate String Matching and Probabilistic Databases

Publication: Article, 01 Jan 2020
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Journal: IEEE Access, volume 8, pages 169,070-169,081 (eISSN: 2169-3536)

Authors: Amol S. Jumde; Ravindra B. Keskar

doi: 10.1109/access.2020.3021945

Abstract

Current relational database systems are deterministic and lack support for approximate matching. Approximate matching yields tuples annotated with a similarity percentage, but existing relational database systems cannot process these similarity scores further. In this paper, we propose a system that supports approximate matching in a DBMS. We introduce an uncertain predicate operator, '≈', for approximate matching and devise a novel formula to calculate the similarity scores. Instead of returning an empty answer set when nothing matches, our system returns ranked results, giving a view of the existing tuples that most closely match the queried literals. Two variants of the '≈' operator are also introduced for numeric data: '≈+' for higher-the-better cases and '≈-' for lower-the-better cases. Efficient approximate string matching methods are proposed for string-type data, whereas numeric closeness is used for other data types (date, time, and number). We also report results from several sample queries that illustrate the significance of our system. All experiments are performed using the MySQL database, with the IMDb movie database and the European Football database as sample datasets.
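
The idea of ranking tuples by similarity instead of returning an empty answer set can be illustrated with a minimal Python sketch. It is not the authors' method: the abstract does not state their similarity formula, so normalized Levenshtein distance is swapped in for strings and a linear closeness score for numbers. The function names, the `spread` parameter, and the reading of '≈+' and '≈-' as "values at or beyond the target count as a full match" are all assumptions made for illustration.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def string_similarity(a: str, b: str) -> float:
    """Case-insensitive similarity in [0, 100].

    Assumed stand-in for the paper's (unstated) formula.
    """
    a, b = a.lower(), b.lower()
    longest = max(len(a), len(b))
    if longest == 0:
        return 100.0
    return 100.0 * (1 - levenshtein(a, b) / longest)


def numeric_similarity(value, target, spread, direction=None):
    """Linear closeness of value to target in [0, 100].

    spread (> 0) is a hypothetical scaling range. direction="higher" mimics
    one possible reading of '≈+', direction="lower" of '≈-'.
    """
    if direction == "higher" and value >= target:
        return 100.0
    if direction == "lower" and value <= target:
        return 100.0
    return max(0.0, 100.0 * (1 - abs(value - target) / spread))


def approx_match(rows, column, literal, top_k=3):
    """Return up to top_k (score, row) pairs, best match first."""
    scored = [(string_similarity(row[column], literal), row) for row in rows]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


# Hypothetical sample rows; the misspelled literal has no exact match.
movies = [
    {"title": "Goodfellas"},
    {"title": "The Godfather Part II"},
    {"title": "The Godfather"},
]
ranked = approx_match(movies, "title", "The Godfahter")
```

Here an exact-match predicate on the misspelled title would return nothing, while the ranked list still puts the closest existing title first, each paired with its score.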

Related Organizations

Visvesvaraya National Institute of Technology
India

Keywords

Approximate string matching, probabilistic databases, Electrical engineering. Electronics. Nuclear engineering, uncertain predicate, TK1-9971

Impact by BIP!

selected citations: 2. These citations are derived from selected sources. This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
popularity: Average. This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network.
influence: Average. This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
impulse: Average. This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network.

gold

Fields of Science

engineering and technology

electrical engineering, electronic engineering, information engineering
