Towards Expert Financial QA via Self-Improving RAG

Expert financial question answering over SEC filings demands numeric faithfulness and auditable provenance, yet single-pass RAG systems silently hallucinate figures and offer no mechanism to recognize their own failures. Compounding this, financial deployments operate under a "walled garden" constraint: the web-search fallbacks used by prior corrective RAG methods are prohibited by data governance. We present Self-Improving RAG, a training-free framework that decomposes document QA into three specialized agents (Retrieval, Reasoning, Judge) coordinated by an orchestrator with feedback-driven retry. When the Judge scores an answer below a dynamic threshold, the system escalates in place through broader retrieval, more careful prompting, and relaxed acceptance, never leaving the authorized corpus. On FinanceBench, our approach reaches 86% accuracy under oracle-guided evaluation (up from 53% single-pass) with a 36.4% Lazarus Rate, recovering more than a third of initially wrong answers. We additionally report an honest caveat: under fully blind deployment the same judge accepts only 31%, exposing judge quality, not the retry loop, as the true bottleneck for regulated finance QA.
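
The feedback-driven retry described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `answer_with_retry`, the three-level `ESCALATION` schedule, its `top_k` values, prompt-style labels, and thresholds are all hypothetical, and the three agents are passed in as plain callables. The sketch only shows the control flow: each retry retrieves more passages from the same corpus, switches to a more careful prompt style, and lowers the acceptance threshold, with no external fallback.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence, Tuple


@dataclass
class Attempt:
    answer: str
    score: float  # Judge score, assumed here to lie in [0, 1]
    level: int    # escalation level at which this attempt was made


# Hypothetical escalation schedule: (top_k passages, prompt style, acceptance threshold).
# Each level retrieves more broadly, prompts more carefully, and accepts more leniently.
ESCALATION: List[Tuple[int, str, float]] = [
    (5, "direct", 0.8),
    (10, "step_by_step", 0.7),
    (20, "step_by_step_verify", 0.6),
]


def answer_with_retry(
    question: str,
    retrieve: Callable[[str, int], Sequence[str]],
    reason: Callable[[str, Sequence[str], str], str],
    judge: Callable[[str, Sequence[str], str], float],
    schedule: Sequence[Tuple[int, str, float]] = ESCALATION,
) -> Tuple[Optional[Attempt], List[Attempt]]:
    """Run Retrieval -> Reasoning -> Judge, escalating in place on low scores.

    Returns (accepted_attempt, history). accepted_attempt is None when no
    attempt clears its threshold; the caller decides how to handle that
    (abstaining is one option, and is an assumption of this sketch).
    """
    history: List[Attempt] = []
    for level, (top_k, style, threshold) in enumerate(schedule):
        passages = retrieve(question, top_k)       # authorized corpus only
        answer = reason(question, passages, style)
        score = judge(question, passages, answer)
        attempt = Attempt(answer=answer, score=score, level=level)
        history.append(attempt)
        if score >= threshold:
            return attempt, history
    return None, history
```

Keeping the full `history` of attempts, rather than only the accepted one, is a design choice of this sketch that supports the auditable provenance the abstract calls for: each rejected answer and its Judge score remain available for inspection.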
