Preprint
Data sources: ZENODO

Towards Expert Financial QA via Self-Improving RAG

Authors: Xiong, Junjie

Abstract

Expert financial question answering over SEC filings demands numeric faithfulness and auditable provenance, yet single-pass RAG systems silently hallucinate figures and offer no mechanism to recognize their own failures. Compounding this, financial deployments operate under a "walled garden" constraint: the web-search fallbacks used by prior corrective RAG methods are prohibited by data governance. We present Self-Improving RAG, a training-free framework that decomposes document QA into three specialized agents (Retrieval, Reasoning, Judge) coordinated by an orchestrator with feedback-driven retry. When the Judge scores an answer below a dynamic threshold, the system escalates in place through broader retrieval, more careful prompting, and relaxed acceptance, without ever leaving the authorized corpus. On FinanceBench, our approach reaches 86% accuracy under oracle-guided evaluation (up from 53% single-pass) with a 36.4% Lazarus Rate, recovering nearly four in ten initially wrong answers. We also report a caveat: under fully blind deployment the same judge accepts only 31%, which exposes judge quality, not the retry loop, as the true bottleneck for regulated finance QA.
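
The retry loop described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function and parameter names (`self_improving_qa`, `retrieve`, `reason`, `judge`, `base_top_k`, `threshold_step`), the linear escalation schedule, and the choice to return `None` when every attempt is rejected are all assumptions made for illustration. The three agents are passed in as plain callables, and the toy stubs at the bottom stand in for real retrieval, reasoning, and judging components.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Attempt:
    """Record of one pass through the loop, kept for auditability."""
    answer: str
    score: float
    threshold: float
    top_k: int


def self_improving_qa(
    question: str,
    retrieve: Callable[[str, int], List[str]],
    reason: Callable[[str, List[str], bool], str],
    judge: Callable[[str, List[str], str], float],
    max_attempts: int = 3,
    base_top_k: int = 2,
    base_threshold: float = 0.8,
    threshold_step: float = 0.1,
) -> Tuple[Optional[str], List[Attempt]]:
    """Hypothetical orchestrator: retry with escalation inside a fixed corpus."""
    attempts: List[Attempt] = []
    for i in range(max_attempts):
        # Assumed escalation schedule (not taken from the paper):
        top_k = base_top_k * (i + 1)                      # broader retrieval
        careful = i > 0                                   # more careful prompting
        threshold = base_threshold - threshold_step * i   # relaxed acceptance

        passages = retrieve(question, top_k)   # only the authorized corpus
        answer = reason(question, passages, careful)
        score = judge(question, passages, answer)
        attempts.append(Attempt(answer, score, threshold, top_k))

        if score >= threshold:
            return answer, attempts
    # Assumption: abstain rather than return a rejected answer.
    return None, attempts


# Toy stand-ins for the three agents (purely illustrative).
corpus = [
    "Note 1: accounting policies",
    "Note 2: leases",
    "Note 3: FY2022 revenue was 100",
]


def toy_retrieve(question: str, top_k: int) -> List[str]:
    return corpus[:top_k]


def toy_reason(question: str, passages: List[str], careful: bool) -> str:
    for passage in passages:
        if "revenue" in passage:
            return passage.split()[-1]
    return "unknown"


def toy_judge(question: str, passages: List[str], answer: str) -> float:
    # Accept only answers that literally appear in the retrieved evidence.
    return 1.0 if any(answer in p for p in passages) else 0.0


demo_answer, demo_attempts = self_improving_qa(
    "What was FY2022 revenue?", toy_retrieve, toy_reason, toy_judge
)
print(demo_answer, len(demo_attempts))
```

In the toy run the first pass retrieves too few passages and is rejected by the judge, and the second pass succeeds after retrieval is widened. The abstract's caveat applies to this sketch as well: the loop is only as reliable as the `judge` callable it is given.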
