ZENODO
Research, 2026
License: CC BY
Data sources: Datacite
Versions: 3

Validity Mirage: Context Compression Failure Modes in LLMs

Authors: Gaffney, Jack Chaudier

Abstract

This archive presents five working papers on context compression failure modes in large language models. The central finding is the validity mirage: naive context compression can preserve surface-level answer correctness while silently substituting the governing hypothesis, causing a model to answer confidently about the wrong task. We develop a tropical semiring algebra (max-plus over ℝ ∪ {−∞}) for measuring context health under compression, and show that structurally guarded retention policies eliminate pivot drift in settings where recency-based baselines fail completely. Empirical validation spans five open-weight model architectures (Llama 3.1 8B, Mistral 7B v0.3, Gemma 2 9B, Phi-3 Medium 14B, Qwen 2.5 14B) across 11,400+ boundary instances and 4,200+ streaming trials, with additional testing against 13 real incident graphs (12 NTSB aviation investigations and the 2012 Knight Capital trading failure). A production MCP server implementation is available separately.

Included papers:
Paper 00: Continuous Control and Structural Regularization in Multi-Agent Narrative Extraction
Paper 01: Absorbing States in Greedy Search
Paper 02: Streaming Oscillation Traps in Endogenous-Pivot Sequential Extraction
Paper 03: The Validity Mirage: Context Algebra for Endogenous Semantics under Memory Compression
Paper I: Tropical Algebra of Endogenous-Pivot Semantics

Reproducible validation artifacts and benchmark outputs are included in the results/ directory. All papers are first-draft working papers distributed under CC BY 4.0.
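As background only: "max-plus over ℝ ∪ {−∞}" names the tropical semiring, where addition (⊕) is max, multiplication (⊗) is ordinary +, −∞ is the additive identity, and 0 is the multiplicative identity. The sketch below illustrates those operations and a max-plus matrix product, which computes best-scoring two-step paths in a weighted graph. The function names and the example matrix are hypothetical; this is a generic illustration of the semiring, not the papers' context-health measure.

```python
import math

NEG_INF = -math.inf  # tropical "zero": identity for oplus, absorbing for otimes
ONE = 0.0            # tropical "one": identity for otimes


def oplus(a: float, b: float) -> float:
    """Tropical addition: take the larger value."""
    return max(a, b)


def otimes(a: float, b: float) -> float:
    """Tropical multiplication: ordinary addition, with -inf absorbing."""
    if a == NEG_INF or b == NEG_INF:
        return NEG_INF
    return a + b


def matmul(A, B):
    """Max-plus matrix product: (A ⊗ B)[i][j] = max over k of A[i][k] + B[k][j]."""
    inner, cols = len(B), len(B[0])
    return [
        [max((otimes(row[k], B[k][j]) for k in range(inner)), default=NEG_INF)
         for j in range(cols)]
        for row in A
    ]


# Hypothetical edge weights: 0 -> 1 scores 2, 1 -> 2 scores 5, no direct 0 -> 2 edge.
A = [
    [0, 2, NEG_INF],
    [NEG_INF, 0, 5],
    [NEG_INF, NEG_INF, 0],
]

if __name__ == "__main__":
    # Best score over paths of at most two steps from node 0 to node 2.
    print(matmul(A, A)[0][2])  # 7
```

Because ⊗ distributes over ⊕ (a + max(b, c) = max(a + b, a + c)), repeated max-plus products compose path scores the same way ordinary matrix products compose path counts.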

Keywords

validity-mirage, large-language-models, ai-safety, context-compression
