Publication (Part of book or chapter of book, Article, Preprint) · 01 Jan 2021 · Embargo end date: 01 Jan 2021 · English · Publisher: Springer International Publishing

Authors: Marcelo Archanjo José; Fabio Gagliardi Cozman

doi: 10.1007/978-3-030-91699-2_35, 10.48550/arxiv.2110.03546

arXiv: http://arxiv.org/abs/2110.03546

mRAT-SQL+GAP: A Portuguese Text-to-SQL Transformer

Abstract

The translation of natural language questions to SQL queries has attracted growing attention, in particular in connection with transformers and similar language models. A large number of techniques are geared towards the English language; in this work, we therefore investigated translation to SQL when input questions are given in Portuguese. To do so, we adapted state-of-the-art tools and resources. We changed the RAT-SQL+GAP system to rely on a multilingual BART model (we also report tests with other language models), and we produced a translated version of the Spider dataset. Our experiments expose interesting phenomena that arise when non-English languages are targeted; in particular, it is better to train with the original and translated training datasets together, even if a single target language is desired. The multilingual BART model fine-tuned with this double-size training dataset (English and Portuguese) achieved 83% of the baseline when making inferences on the Portuguese test dataset. This investigation can help other researchers produce machine learning results in languages other than English. Our multilingual-ready version of RAT-SQL+GAP and the data are open-sourced as mRAT-SQL+GAP at: https://github.com/C4AI/gap-text2sql
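The abstract's central finding is that training on the original and translated training sets together works better than training on the translated set alone. The following is a minimal sketch of what such a "double-size" training set could look like, assuming both sets use Spider-style JSON records (with `db_id`, `question`, and `query` fields) and that the Portuguese set is a question-by-question translation of the English one. The function names, the `lang` tag, and the shuffling step are hypothetical illustrations, not taken from the mRAT-SQL+GAP code.

```python
import json
import random
from typing import Any

# Hypothetical record type: a Spider-style example with fields such as
# "db_id", "question", and "query".
Example = dict[str, Any]


def load_json_list(path: str) -> list[Example]:
    """Load a JSON file containing a list of Spider-style examples."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def build_bilingual_train_set(
    english: list[Example],
    portuguese: list[Example],
    seed: int = 0,
) -> list[Example]:
    """Concatenate original and translated examples into one shuffled list.

    Assumes each Portuguese example translates the English example at the
    same index, so "db_id" and "query" are shared and only "question" differs.
    The "lang" field is a hypothetical tag added for bookkeeping.
    """
    if len(english) != len(portuguese):
        raise ValueError("expected one translated example per original example")
    # Copy each record (dict(ex, lang=...)) so the caller's lists are not mutated.
    combined = [dict(ex, lang="en") for ex in english]
    combined += [dict(ex, lang="pt") for ex in portuguese]
    # Seeded shuffle so both languages are interleaved reproducibly.
    random.Random(seed).shuffle(combined)
    return combined
```

For example, one illustrative English record paired with its Portuguese translation yields a two-example training set in which both questions map to the same SQL query. Keeping the SQL side identical across languages is what lets the model see the same target query conditioned on questions in either language.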

Published in: Intelligent Systems. BRACIS 2021. Lecture Notes in Computer Science

Related Organizations

Universidade de São Paulo
Brazil
Universidade de São Paulo/Instituto de Estudos Avançados
Brazil

Keywords

FOS: Computer and information sciences, Computer Science - Computation and Language, Artificial Intelligence (cs.AI), H.3.3, Computer Science - Artificial Intelligence, I.2.7, 68T07, 68T50, Computation and Language (cs.CL), I.2.7; H.3.3

Related Research

gap-text2sql: software on GitHub (relation: IsRelatedTo)

Impact (by BIP!)

Citations: 9. An alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
Popularity: Top 10%. Reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network.
Influence: Average. Reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
Impulse: Top 10%. Reflects the initial momentum of an article directly after its publication, based on the underlying citation network.