Combining computational linguistics with sentence embedding to create a zero-shot NLIDB

Publication: Article, 01 Dec 2024, English
Publisher: Elsevier BV
Journal: Array, volume 24, page 100368 (ISSN: 2590-0056)

Authors: Yuriy Perezhohin; Fernando Peres; Mauro Castelli

doi: 10.1016/j.array.2024.100368

Abstract

Accessing relational databases using natural language is a challenging task, with existing methods often suffering from poor domain generalization and high computational costs. In this study, we propose a novel approach that eliminates the training phase while offering high adaptability across domains. Our method combines structured linguistic rules, a curated vocabulary, and pre-trained embedding models to accurately translate natural language queries into SQL. Experimental results on the SPIDER benchmark demonstrate the effectiveness of our approach, with execution accuracy rates of 72.03% on the training set and 70.83% on the development set, while maintaining domain flexibility. Furthermore, the proposed system outperformed two extensively trained models by up to 28.33% on the development set, demonstrating its efficiency. This research presents a significant advancement in zero-shot Natural Language Interfaces for Databases (NLIDBs), providing a resource-efficient alternative for generating accurate SQL queries from plain language inputs.
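
To make the embedding-matching idea in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It assumes that some earlier rule-based step has already pulled a column phrase and a table phrase out of the question, and it maps each phrase to the closest schema name by cosine similarity. The `embed` function is a stand-in built from character trigram counts so the example runs without downloads; a pre-trained sentence embedding model would take its place. The schema, `best_match`, and `to_sql` are hypothetical names introduced for illustration only.

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    """Stand-in for a pre-trained sentence embedding (assumption):
    a sparse vector of character trigram counts."""
    padded = f"  {text.lower()} "
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * \
           math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def best_match(phrase: str, candidates: list[str]) -> str:
    """Return the candidate schema name most similar to the phrase."""
    target = embed(phrase)
    return max(candidates,
               key=lambda c: cosine(target, embed(c.replace("_", " "))))

# Hypothetical schema: table name -> column names.
SCHEMA = {
    "singer": ["name", "age", "country"],
    "concert": ["concert_name", "year"],
}

def to_sql(column_phrase: str, table_phrase: str) -> str:
    """Map two already-extracted phrases to a simple SELECT statement.
    No training step is involved; only similarity lookups."""
    table = best_match(table_phrase, list(SCHEMA))
    column = best_match(column_phrase, SCHEMA[table])
    return f"SELECT {column} FROM {table}"

print(to_sql("names", "singers"))
```

Because the lookup depends only on the schema names supplied at query time, switching to a different database means swapping the `SCHEMA` dictionary, which is one way to read the abstract's claim of adaptability across domains without a training phase.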

Keywords

Sentence embeddings, Natural language processing, Computational linguistics, Text to SQL, Computer engineering. Computer hardware (TK7885-7895), Electronic computers. Computer science (QA75.5-76.95)

Fields of Science

engineering and technology

electrical engineering, electronic engineering, information engineering

Related to Research communities

Digital Humanities and Cultural Heritage