Right Answer, Wrong Question: Semantic Hallucination and a Definition-First Architecture for Personalized AI

Lee, Taekyung





Preprint (under curation). Language: English. Publisher: Zenodo.


doi: 10.5281/zenodo.20652517


Abstract

This paper identifies a failure mode in personalized AI that existing hallucination defenses do not address. Large language model hallucination is commonly framed as a failure of factual grounding: the answer is false, fabricated, or unsupported by evidence. The paper argues that personalized AI also fails in a different way, by misidentifying what the user means by key terms. A response can be factually correct and logically coherent while still answering under the wrong operative definition. The paper names this failure semantic hallucination: confident generation under an incorrect operative definition, without disclosure or clarification. The term is deliberately narrow. Ambiguity that exists before the system commits to a meaning does not count, and neither does asking a clarification question; the failure occurs only when the model silently selects the wrong meaning and answers as if it had captured the user's intended meaning. The central contribution is DF-SSMA, a Definition-First Socio-Semantic Multi-Agent Architecture, which separates semantic grounding from factual grounding. Before any retrieval or debate, it detects high-risk ambiguous terms, retrieves user-specific definition priors from a Personal Semantic Memory, compares them with public and technical definitions, and either locks the operative definition or asks a definitional clarification question. A Social Semantic Calibration layer then labels private meanings and translates them into socially and technically intelligible language, limiting private-language drift without erasing the user's meaning. Only after this definition-first layer does the system invoke factual retrieval and role-differentiated reasoning through Advocate, Challenger, and Mediator agents. This is the multi-agent structure developed in the author's companion work on sycophantic spiraling, now placed after definition locking so that the agents argue about the right premise.
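The lock-or-clarify step can be sketched as a small decision function. This is a minimal illustration under stated assumptions, not the DF-SSMA implementation: the `Sense` and `GateDecision` types, the `sense_id` labels, the 0.8 confidence threshold, and the rule that a memory prior must match an independently retrieved public or technical sense before it is locked are all hypothetical. Social Semantic Calibration and the Advocate, Challenger, and Mediator stage are omitted.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(frozen=True)
class Sense:
    """One candidate meaning of a term (hypothetical structure)."""
    sense_id: str      # stable label for the meaning, e.g. "ml.kernel"
    gloss: str         # short human-readable definition
    confidence: float  # how strongly the source supports this meaning, 0..1


@dataclass
class GateDecision:
    action: str                                         # "lock" or "clarify"
    term: str
    locked: Optional[Sense] = None                      # operative definition, if locked
    options: List[Sense] = field(default_factory=list)  # choices offered when clarifying


def definition_first_gate(term: str,
                          memory_prior: Optional[Sense],
                          public_senses: List[Sense],
                          lock_threshold: float = 0.8) -> GateDecision:
    """Lock an operative definition or ask for clarification, before any retrieval.

    Hypothetical decision rule:
      1. A term with exactly one public/technical sense is not high-risk: lock it.
      2. Otherwise lock the Personal Semantic Memory prior only if it is confident
         AND matches an independently retrieved public/technical sense.
      3. Everything else (no prior, weak prior, or prior/public conflict) becomes a
         definitional clarification question rather than a silent guess.
    """
    distinct = {s.sense_id for s in public_senses}
    if len(distinct) == 1:
        return GateDecision("lock", term, locked=public_senses[0])
    if (memory_prior is not None
            and memory_prior.confidence >= lock_threshold
            and memory_prior.sense_id in distinct):
        return GateDecision("lock", term, locked=memory_prior)
    return GateDecision("clarify", term, options=list(public_senses))


# Usage with made-up senses for an ambiguous term.
public = [Sense("os.kernel", "core of an operating system", 0.5),
          Sense("ml.kernel", "similarity function in kernel methods", 0.5)]
good_prior = Sense("ml.kernel", "similarity function in kernel methods", 0.9)
wrong_prior = Sense("botany.kernel", "edible part of a seed", 0.95)

print(definition_first_gate("kernel", good_prior, public).action)   # lock
print(definition_first_gate("kernel", None, public).action)         # clarify
print(definition_first_gate("kernel", wrong_prior, public).action)  # clarify
```

Rule 2 is what routes a confidently wrong memory to clarification instead of a silent commitment, which mirrors the conflict-detection behavior described for the full architecture; the threshold and the agreement test are placeholders, not values taken from the paper.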
A Monte Carlo simulation compares nine conditions, from a baseline model through RAG-only, memory-only, definition-only, and multi-agent-only ablations to the full architecture, across ambiguous-term tasks and eight stress scenarios. The central result is modular specificity: neither of the two dominant existing defenses reduces semantic hallucination. Retrieval-augmented generation and multi-agent debate both leave the semantic hallucination rate at the baseline level, while definition-first locking drives it to zero in the synthetic setting. That zero is not free, and the cost is reported rather than hidden: roughly three-fifths of conversations are converted into clarification turns, and the Personal Semantic Memory prior is what keeps that burden bounded. Without it, the definition-only system reaches zero only by clarifying on essentially every conversation. In the combined worst-case scenario, normalized composite hallucination risk falls from 0.5778 for the baseline to 0.0789 for the full architecture, an 86.3% reduction. Several controls test whether the result is an artifact. A negative control that keeps the locking mechanism but selects definitions at random stays at chance level, showing that the reduction comes from inference accuracy, not from the lock itself. A misspecified-memory stress test replaces the stored definition with a confidently wrong one. A memory-only system degrades toward chance, as a skeptic would predict, while the full architecture detects the conflict between the wrong memory and independent retrieval and routes it to clarification, keeping semantic hallucination at essentially zero even when every stored definition is wrong. The composite ranking holds across 99.7% of the metric weight space and across all 512 Latin Hypercube samples of the joint parameter space.
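The negative-control logic can be illustrated with a toy Monte Carlo estimate. This sketch is not the paper's simulator: the number of candidate senses `K`, the 0.95 inference accuracy, and the function names are hypothetical, and every conversation is locked without clarification so that the comparison isolates the selection step. The last line checks the reported 86.3% figure as the relative reduction (0.5778 - 0.0789) / 0.5778.

```python
import random

K = 4  # hypothetical number of candidate senses per ambiguous term


def silent_wrong_rate(select, n=20_000, seed=0):
    """Monte Carlo estimate of how often a lock commits to the wrong sense.

    `select(true_sense, rng)` returns the sense the system locks. Every
    conversation is locked (no clarification), so any mismatch is a silent,
    confident commitment to the wrong operative definition.
    """
    rng = random.Random(seed)
    wrong = 0
    for _ in range(n):
        true_sense = rng.randrange(K)
        if select(true_sense, rng) != true_sense:
            wrong += 1
    return wrong / n


def informed_select(true_sense, rng, accuracy=0.95):
    """Locks the intended sense w.p. `accuracy`, else a uniformly random wrong sense."""
    if rng.random() < accuracy:
        return true_sense
    return rng.choice([s for s in range(K) if s != true_sense])


def random_select(true_sense, rng):
    """Negative control: same lock, but the sense is chosen uniformly at random."""
    return rng.randrange(K)


def relative_reduction(baseline, treated):
    """Fractional reduction of a risk score relative to the baseline."""
    return (baseline - treated) / baseline


informed = silent_wrong_rate(informed_select)  # near 1 - accuracy
control = silent_wrong_rate(random_select)     # near chance, 1 - 1/K
print(f"informed lock: {informed:.3f}   random lock: {control:.3f}")
print(f"composite risk reduction: {relative_reduction(0.5778, 0.0789):.1%}")  # 86.3%
```

Because both conditions apply the same lock, any gap between the two rates comes from how the sense is selected, which is the point the negative control makes; the random lock settles near the chance rate 1 - 1/K regardless of how the lock itself is implemented.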
A small twelve-item pilot on real generated text shows that the construct is measurable outside simulation: a baseline model silently committed to a wrong definition on eight of twelve items, while a definition-first strategy committed to a wrong definition on none and asked for clarification on exactly the genuinely ambiguous items. The paper does not claim that deployed systems would achieve these numerical reductions, that all hallucination is semantic, or that the simulation substitutes for human-facing validation. It tests structural plausibility under declared synthetic assumptions, states eight falsification conditions with pre-specified quantitative thresholds, and includes a pre-registration-ready LLM-in-the-loop validation protocol. The design rule it advances is a matter of sequence: define before retrieving, clarify before guessing, calibrate before personalizing, challenge before agreeing, and mediate before concluding. Before answering, define.

Keywords: semantic hallucination, definition-first grounding, Personal Semantic Memory, personalized AI, operative definition, ambiguity, clarification, retrieval-augmented generation, multi-agent systems, Advocate Challenger Mediator, sycophancy, private-language drift, social semantic calibration, AI safety, Monte Carlo simulation, falsifiability.
