FinStat2SQL: A Text2SQL Pipeline for Financial Statement Analysis

Keywords: FOS: Computer and information sciences, Artificial Intelligence (cs.AI), Artificial Intelligence

Publication: Article, Preprint. Date: 01 Jan 2025. Embargo end date: 01 Jan 2025. Publisher: arXiv

Authors: Nguyen, Quang Hung; Trinh, Phuong Anh; Mai, Phan Quoc Hung; Trinh, Tuan Phong

doi: 10.48550/arxiv.2506.23273

arXiv: 2506.23273

Abstract

Despite advances in large language models, text2sql still faces many challenges, particularly with complex and domain-specific queries. In finance, database designs and financial reporting layouts vary widely across entities and countries, which makes text2sql even harder. We present FinStat2SQL, a lightweight text2sql pipeline that enables natural language queries over financial statements. Tailored to local standards such as the Vietnamese Accounting Standards (VAS), it combines large and small language models in a multi-agent setup for entity extraction, SQL generation, and self-correction. We build a domain-specific database and evaluate models on a synthetic QA dataset. A fine-tuned 7B model achieves 61.33% accuracy with sub-4-second response times on consumer hardware, outperforming GPT-4o-mini. FinStat2SQL offers a scalable, cost-efficient solution for financial analysis, making AI-powered querying accessible to Vietnamese enterprises.
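The three stages named in the abstract (entity extraction, SQL generation, self-correction) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the `financial_statement` schema, and the toy figures are all assumptions made for illustration, and the two "agents" are simple stand-ins where a real pipeline would call language models. The first generated query deliberately references a wrong table so that the correction loop, which feeds the database error back into generation, is exercised.

```python
import sqlite3

def extract_entities(question):
    """Stand-in for the entity-extraction agent (hypothetical keyword lookup)."""
    companies = {"vnm": "VNM", "fpt": "FPT"}
    items = {"net profit": "net_profit", "revenue": "revenue"}
    q = question.lower()
    company = next((v for k, v in companies.items() if k in q), None)
    item = next((v for k, v in items.items() if k in q), None)
    return {"company": company, "item": item}

def generate_sql(entities, error=None):
    """Stand-in for the SQL-generation agent.

    With no error feedback it guesses a wrong table name; once an error
    message is supplied it switches to the table that exists in the demo DB.
    """
    table = "financial_statement" if error else "fin_statements"
    return (f"SELECT year, value FROM {table} "
            "WHERE company = ? AND item = ? ORDER BY year")

def answer(conn, question, max_retries=2):
    """Run the pipeline: extract entities, generate SQL, retry on DB errors."""
    entities = extract_entities(question)
    error = None
    for _ in range(max_retries + 1):
        sql = generate_sql(entities, error)
        try:
            params = (entities["company"], entities["item"])
            return conn.execute(sql, params).fetchall()
        except sqlite3.Error as exc:
            error = str(exc)  # self-correction: pass the error to the next attempt
    raise RuntimeError(f"could not produce valid SQL: {error}")

def build_demo_db():
    """Create an in-memory database with invented toy figures (assumed schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE financial_statement "
        "(company TEXT, year INTEGER, item TEXT, value REAL)"
    )
    conn.executemany(
        "INSERT INTO financial_statement VALUES (?, ?, ?, ?)",
        [("FPT", 2022, "revenue", 100.0),
         ("FPT", 2023, "revenue", 120.0),
         ("VNM", 2023, "revenue", 90.0)],
    )
    return conn

if __name__ == "__main__":
    print(answer(build_demo_db(), "What was FPT revenue by year?"))
```

Binding the extracted entities as query parameters, rather than interpolating them into the SQL string, keeps the sketch safe against malformed entity values; only the table name varies between attempts.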

Accepted for The 18th International Natural Language Generation Conference (INLG)
