
This paper provides empirical evidence of interpretation drift in large language models using ARC-style symbolic reasoning tasks. Interpretation drift refers to instability in a system's internal task representation under fixed inputs and instructions, leading to incompatible task ontologies even in fully observable, non-linguistic settings. Earlier work introduced interpretation drift as a theoretical explanation for reliability failures that persist despite improvements in model capability; governance and safety debates, however, have continued to assume that such failures will resolve as models become more intelligent. The present work tests that assumption directly using ARC-style tasks, which the industry itself treats as a benchmark for abstraction and intelligence. Under these controlled conditions, multiple frontier models were observed to diverge in inferred task structure, including object boundaries, dimensionality, and transformation rules, prior to any symbolic reasoning. These divergences cannot be explained by prompt ambiguity, sampling variance, or output inconsistency. This artifact provides empirical grounding for the interpretation drift framework introduced in "Empirical Evidence Of Interpretation Drift In Large Language Models" [https://doi.org/10.5281/zenodo.18219428]. The findings establish a governance-relevant boundary condition: systems that cannot maintain a stable mapping between perceptual input and symbolic representation are not reliably evaluable and cannot be assigned autonomous decision-making authority in safety-critical or regulated contexts.
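The kind of divergence in object boundaries described above can be made concrete with a toy example (not taken from the paper): two systems reading the same ARC-style grid can infer different object ontologies simply by adopting different connectivity assumptions when segmenting cells into objects. The sketch below is a hypothetical illustration of this mechanism, using 4- versus 8-connectivity.

```python
# Hypothetical illustration: the same ARC-style grid yields a different
# object inventory depending on the connectivity assumption a system
# adopts. Neither reading is "wrong"; they are incompatible ontologies
# inferred from identical, fully observable input.

def count_objects(grid, diagonal):
    """Count connected components of nonzero cells.

    diagonal=False -> 4-connectivity; diagonal=True -> 8-connectivity.
    """
    rows, cols = len(grid), len(grid[0])
    steps = [(-1, 0), (1, 0), (0, -1), (0, 1)]
    if diagonal:
        steps += [(-1, -1), (-1, 1), (1, -1), (1, 1)]
    seen = set()
    count = 0
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] and (r, c) not in seen:
                count += 1                      # new object found
                stack = [(r, c)]
                seen.add((r, c))
                while stack:                    # flood-fill the component
                    cr, cc = stack.pop()
                    for dr, dc in steps:
                        nr, nc = cr + dr, cc + dc
                        if (0 <= nr < rows and 0 <= nc < cols
                                and grid[nr][nc] and (nr, nc) not in seen):
                            seen.add((nr, nc))
                            stack.append((nr, nc))
    return count

# Two diagonally touching cells: two objects under 4-connectivity,
# one object under 8-connectivity.
grid = [[1, 0],
        [0, 1]]
print(count_objects(grid, diagonal=False))  # 2
print(count_objects(grid, diagonal=True))   # 1
```

The point of the sketch is that the perceptual-to-symbolic mapping is underdetermined before any transformation rule is applied, which is the stage at which the paper reports frontier models diverging.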
AI Governance, Large Language Models, AI Instability, Interpretation Drift, AI Safety
