GBV-SQL: Guided Generation and SQL2Text Back-Translation Validation for Multi-Agent Text2SQL

Keywords: FOS: Computer and information sciences, Artificial Intelligence (cs.AI), Artificial Intelligence

Publication type: Article, Preprint. Date: 01 Jan 2025. Embargo end date: 01 Jan 2025. Publisher: arXiv

Authors: Chen, Daojun; Wang, Xi; Ren, Shenyuan; Ma, Qingzhi; Zhao, Pengpeng; Liu, An

doi: 10.48550/arxiv.2509.12612

arXiv: 2509.12612


Abstract

While Large Language Models have significantly advanced Text2SQL generation, a critical semantic gap persists where syntactically valid queries often misinterpret user intent. To mitigate this challenge, we propose GBV-SQL, a novel multi-agent framework that introduces Guided Generation with SQL2Text Back-translation Validation. This mechanism uses a specialized agent to translate the generated SQL back into natural language, which verifies its logical alignment with the original question. Critically, our investigation reveals that current evaluation is undermined by a systemic issue: the poor quality of the benchmarks themselves. We introduce a formal typology for "Gold Errors", which are pervasive flaws in the ground-truth data, and demonstrate how they obscure true model performance. On the challenging BIRD benchmark, GBV-SQL achieves 63.23% execution accuracy, an absolute improvement of 5.8 percentage points. After removing flawed examples, GBV-SQL achieves 96.5% (dev) and 97.6% (test) execution accuracy on the Spider benchmark. Our work offers both a robust framework for semantic validation and a critical perspective on benchmark integrity, highlighting the need for more rigorous dataset curation.
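
The back-translation check described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not specify how agents are invoked, how alignment is judged, or whether failed checks trigger regeneration. The function names (`generate_sql`, `sql_to_text`, `is_aligned`), the retry loop, and the feedback string are all assumptions introduced for illustration, and the toy agents stand in for LLM calls.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class ValidationResult:
    sql: str
    back_translation: str
    aligned: bool
    attempts: int


def generate_with_back_translation(
    question: str,
    generate_sql: Callable[[str, Optional[str]], str],  # hypothetical generator agent
    sql_to_text: Callable[[str], str],                  # hypothetical SQL2Text agent
    is_aligned: Callable[[str, str], bool],             # hypothetical alignment judge
    max_attempts: int = 3,                              # assumed retry budget
) -> ValidationResult:
    """Generate SQL, translate it back to text, and accept it only if the
    back-translation matches the intent of the original question."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")

    feedback: Optional[str] = None
    sql, paraphrase = "", ""
    for attempt in range(1, max_attempts + 1):
        sql = generate_sql(question, feedback)
        paraphrase = sql_to_text(sql)
        if is_aligned(question, paraphrase):
            return ValidationResult(sql, paraphrase, True, attempt)
        # Assumption: the mismatch is passed back to the generator as guidance.
        feedback = f"The query was read back as: {paraphrase!r}"
    return ValidationResult(sql, paraphrase, False, max_attempts)


# Toy stand-ins for the agents, so the sketch runs without any model.
def toy_generate(question: str, feedback: Optional[str]) -> str:
    if feedback is None:
        return "SELECT name FROM singer"      # syntactically valid, wrong intent
    return "SELECT COUNT(*) FROM singer"      # corrected after feedback


def toy_sql_to_text(sql: str) -> str:
    if "COUNT" in sql:
        return "How many singers are there?"
    return "List the names of all singers."


def toy_is_aligned(question: str, paraphrase: str) -> bool:
    # A real judge would compare meaning; exact matching is only for the demo.
    return question.strip().lower() == paraphrase.strip().lower()


result = generate_with_back_translation(
    "How many singers are there?", toy_generate, toy_sql_to_text, toy_is_aligned
)
print(result)
```

In this toy run the first query is valid SQL but answers a different question, so the back-translation exposes the mismatch and the second attempt is accepted. Passing the agents in as callables keeps the control flow separate from any particular model or prompt.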

