
Large language models (LLMs) exhibit strong performance across natural language tasks, yet their tendency to hallucinate remains a fundamental barrier to deployment in domain-specific production settings. This paper presents a production case study demonstrating that LoRA fine-tuning on 12,000 domain documents reduces hallucination rates by 22% relative to Llama 3.1 8B while cutting monthly inference costs by 58% compared to GPT-4. Combining LoRA with semantic paragraph-level RAG chunking yields a 31% improvement in retrieval precision measured by RAGAS faithfulness scores across 500+ evaluation queries.
