Powered by OpenAIRE graph
ZENODO
Other ORP type . 2026
License: CC BY
Data sources: ZENODO
ZENODO
Other ORP type . 2026
License: CC BY
Data sources: Datacite
Versions: 3

Empirical Evidence Of Interpretation Drift In ARC-Style Reasoning

Authors: Nguyen, Elin

Abstract

This paper provides empirical evidence of interpretation drift in large language models, using ARC-style symbolic reasoning tasks. Interpretation drift is instability in a system’s internal task representation under fixed inputs and instructions; it produces incompatible task ontologies even in fully observable, non-linguistic settings. Earlier work introduced interpretation drift as a theoretical explanation for reliability failures that persist despite improvements in model capability. Governance and safety debates have nevertheless continued to assume that such failures would resolve as models became more intelligent. The present work tests that assumption directly on ARC-style tasks, which the industry itself treats as a benchmark for abstraction and intelligence. Under these controlled conditions, multiple frontier models diverged in their inferred task structure, including object boundaries, dimensionality, and transformation rules, before any symbolic reasoning took place. These divergences cannot be explained by prompt ambiguity, sampling variance, or output inconsistency. This artifact provides empirical grounding for the interpretation drift framework introduced in "Empirical Evidence Of Interpretation Drift In Large Language Models" [https://doi.org/10.5281/zenodo.18219428]. The findings establish a governance-relevant boundary condition: systems that cannot maintain stable mappings between perceptual input and symbolic representation are not reliably evaluable and cannot be assigned autonomous decision-making authority in safety-critical or regulated contexts.

Keywords

AI Governance, Large Language Models, AI Instability, Interpretation Drift, AI Safety

  • BIP! impact indicators
    selected citations: 0
    popularity: Average
    influence: Average
    impulse: Average