
The first two papers in this series proposed Autonomous Dissonance Perception (ADP) as a theoretical framework: ADP I introduced the concept of detecting dissonance between external inputs and an LLM's internal world model; ADP II extended this to internal dissonance, arguing that contradictions within a model's parameters could serve as an engine for autonomous cognitive evolution. Both papers lacked empirical validation. This paper addresses that gap with four experiments, two theoretical contributions, and a diagnostic framework for hallucination detection. In Experiment 1, a 7-billion-parameter language model (Qwen2.5-7B-Instruct) produced statistically separable hidden-state patterns when processing consonant, dissonant, and nonsensical inputs, with a classification accuracy of 96.0% for distinguishing dissonant from nonsensical inputs. In Experiment 2, a matched-triad design controlling for topic confounds revealed a layer-dependent dissonance signal that peaked at the penultimate layer (layer −2, win rate 72.5%) but reversed at the final layer (layer −1, win rate 35%), initially suggesting that alignment training suppresses dissonance signals. Experiment 3 tested this interpretation directly by running an expanded 40-triad design on the base (pre-RLHF) version of the same model. The results overturn it: the base model exhibited a nearly identical pattern (layer −2 win rate 72.5%, layer −1 win rate 37.5%), revealing that the layer-dependent structure is a universal architectural property of the Transformer. We term this the Unembedding Bottleneck: a geometric constraint imposed by the final layer's obligation to align hidden states with the vocabulary embedding space for next-token prediction.
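The separability claim in Experiment 1 can be illustrated with a minimal, hypothetical sketch. The synthetic clusters, dimensions, and nearest-centroid probe below are all assumptions for illustration; the paper's actual hidden states come from Qwen2.5-7B-Instruct and its classifier may differ.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 64, 200  # hypothetical hidden-state dim and samples per class

# Stand-ins for hidden states of consonant / dissonant / nonsensical
# inputs: three separated Gaussian clusters (purely synthetic data).
centers = rng.normal(size=(3, d)) * 3.0
X = np.vstack([c + rng.normal(size=(n, d)) for c in centers])
y = np.repeat([0, 1, 2], n)

# Nearest-centroid probe: the simplest linear read-out of separability.
centroids = np.stack([X[y == c].mean(axis=0) for c in range(3)])
dists = ((X[:, None, :] - centroids) ** 2).sum(axis=-1)  # (600, 3)
pred = dists.argmin(axis=1)
acc = (pred == y).mean()
print(f"probe accuracy on synthetic clusters: {acc:.3f}")
```

If the three cognitive states really occupy separable regions of hidden-state space, even this trivial probe scores far above chance; the paper's reported 96.0% dissonant-vs-nonsense accuracy is the analogous measurement on real activations.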
Experiment 4 confirms this hypothesis directly by measuring vocabulary-alignment geometry: the Subspace Projection Ratio drops 95% between layer −2 and layer −1 (SPR: 0.73 → 0.04, Cohen's d > 23, steepness ratio > 2.6× the next-largest jump), and this cliff is identical across the Instruct and Base models. The Unembedding Bottleneck is thus no longer a hypothesis but a measured geometric fact. The penultimate layer, freed from this constraint, emerges as the last site of unconstrained semantic computation, where epistemic dissonance reaches peak detectability before collapsing into token probabilities. We formalize this as Layer −2 Criticality. We further propose the Closed-Loop Constraint, which requires that any resolution of internal dissonance (ADP II) produce a measurable improvement in external dissonance perception (ADP I). Together, these results unify the ADP framework into a falsifiable system grounded in the Transformer's intrinsic representational geometry rather than in post-training procedures. Finally, we propose the Cognitive State Quadrant, a dual-layer diagnostic framework that combines layer −2 dissonance scores with layer −1 token entropy to classify each generated token into four cognitive states (Reliable Knowledge, Epistemic Conflict, Nonsense, and Knowledge Boundary), providing a principled, representation-level approach to hallucination decomposition.
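A Subspace Projection Ratio of the kind Experiment 4 measures can be sketched numerically. This assumes SPR is the fraction of a hidden state's squared norm lying in the top-k right-singular subspace of the unembedding matrix; the paper's exact definition, the choice of k, and the matrix shapes below are assumptions for illustration.

```python
import numpy as np

def subspace_projection_ratio(h, W_U, k):
    """Fraction of hidden state h's squared norm captured by the
    top-k right-singular directions of unembedding matrix W_U
    (the directions the vocabulary projection actually 'reads')."""
    _, _, Vt = np.linalg.svd(W_U, full_matrices=False)
    basis = Vt[:k]                       # (k, d) orthonormal rows
    coords = basis @ h                   # coordinates inside the subspace
    return float(coords @ coords) / float(h @ h)

rng = np.random.default_rng(0)
d, V, k = 128, 1000, 16                  # toy dims; real models are larger
W_U = rng.normal(size=(V, d))            # stand-in unembedding matrix

# A vector constructed inside the subspace projects fully (SPR = 1.0) ...
_, _, Vt = np.linalg.svd(W_U, full_matrices=False)
h_in = Vt[:k].T @ rng.normal(size=k)
# ... while an isotropic random vector lands near k/d on average.
h_rand = rng.normal(size=d)

print(f"SPR (in-subspace vector): {subspace_projection_ratio(h_in, W_U, k):.3f}")
print(f"SPR (random vector):      {subspace_projection_ratio(h_rand, W_U, k):.3f}")
```

Under this definition, the reported cliff (0.73 at layer −2 falling to 0.04 at layer −1) would mean the penultimate hidden state is largely readable by the vocabulary projection while the final residual update moves almost all remaining variance out of that subspace.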
