The Geometric Theory of Inference: Synthesis, Prediction Audit, and Causal Evidence from Five Papers and Thirteen Experimental Phases

Geometric Contrast Imaging (GCI) is a measurement framework that classifies the computational geometry at each layer of a neural network's forward pass as hyperbolic (hierarchical), Euclidean (flat), or spherical (cyclic) by analyzing patterns in how attention heads route information. The resulting "geometric trace" reveals that inference is not a uniform process: models consistently organize their layers into distinct geometric phases, with structured transitions between them. This paper synthesizes the GCI research program (five prior papers, thirteen experimental phases) and reports four new experimental phases that test the framework's predictions, together with a formal audit of every testable claim the program has generated.

Phase 12 (Scaling Predictions) tests three quantitative predictions from the statistical mechanics framework about how geometric properties should scale with model size. One fails outright (transition sharpness scales in the wrong direction), and the remaining two are directionally correct but not statistically significant: eight models provide insufficient statistical power for cross-model scaling laws.

Phase 13 (Metastability Validation) tests six predictions from the free energy framework about metastable dynamics at phase boundaries. Five of the six pass, but the dwell-difficulty prediction inverts: harder inputs commit to a geometric mode faster, not slower. For difficult inputs, the model quickly resolves what kind of processing to apply, even though it may not quickly resolve the answer.

Phase 10 (Causal Interventions) provides the first causal evidence that geometric phases matter for model behavior. At layers where inputs disagree about geometric classification, geometry predicts ablation sensitivity (p = 0.033). Category-specific effects are significant (p = 0.003): ablating H-phase (hyperbolic) layers disproportionately damages hierarchical inputs. However, geometric phases do not concentrate overall layer importance beyond random groupings; importance is position-driven (first and last blocks dominate). The key new finding is a type/magnitude dissociation: geometry determines what kind of computation a layer performs but not how much it matters.

Phase 11 (Vocabulary Completeness) establishes that the three-class geometric vocabulary (H/E/S) is adequate for small models (82.7% high-confidence for GPT-2 Small) but inadequate at scale (9.4% for Llama 3 8B). The continuous routing weights, not the discrete labels, are the reliable signal for large models.

Prediction Audit. A formal audit scores all 53 testable predictions across Papers 1–3 and 5: 12 confirmed (23%), 13 partially confirmed (25%), 16 revised (30%), 4 refuted (8%), and 8 untested (15%). The framework's qualitative claims (structured phases, emergent geometry, entropy-speed coupling, lawful dynamics) survive scrutiny. Its quantitative predictions (specific exponents, functional forms, cross-model scaling) are fragile. Failures and inversions are reported with the same rigor as confirmations.

Five results survive scrutiny across eight models and five architecture families: structured phase cycling, emergent geometry from head disagreement (r = +0.92), entropy-speed anti-correlation (r < −0.84), lawful five-force dynamics (R² = 0.80–0.91), and the type/magnitude causal dissociation.

This is the sixth paper in the GCI program: instrument (Paper 1) → emergence mechanism (Paper 2) → statistical mechanics (Paper 3) → normative principle via the Free Energy Principle (Paper 4) → dynamics (Paper 5, Zenodo DOI: 10.5281/zenodo.18752720) → synthesis, new evidence, and prediction audit (this paper). The paper is self-contained and does not require the earlier papers.
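The prediction-audit tallies reported in the abstract can be checked with a few lines of arithmetic. The sketch below is illustrative only: the counts and category names come from the abstract, while the variable names and the whole-number rounding are assumptions made here, not the authors' audit procedure.

```python
# Audit counts as stated in the abstract (53 testable predictions in total).
# The dictionary layout and names are illustrative, not from the paper.
audit = {
    "confirmed": 12,
    "partially confirmed": 13,
    "revised": 16,
    "refuted": 4,
    "untested": 8,
}

total = sum(audit.values())

# Whole-number percentage share of each category (assumed rounding rule).
shares = {label: round(100 * count / total) for label, count in audit.items()}

for label, count in audit.items():
    print(f"{label:>20}: {count:2d} of {total} ({shares[label]}%)")
```

The counts sum to 53 and reproduce the stated percentages. Because each share is rounded independently, the five percentages add up to 101 rather than 100.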

Keywords

mechanistic interpretability, Neural Networks, geometric contrast imaging, transformer architecture, phase transitions, attention mechanisms, metastability, Machine Learning, neural network interpretability, geometric deep learning, Artificial Intelligence, computational geometry, causal interventions, free energy principle, prediction audit
