
This archive presents five working papers on context compression failure modes in large language models.

The central finding is the validity mirage: naive context compression can preserve surface-level answer correctness while silently substituting the governing hypothesis, causing a model to answer confidently about the wrong task. We develop a tropical semiring algebra (max-plus over ℝ ∪ {−∞}) for measuring context health under compression, and show that structurally guarded retention policies eliminate pivot drift where recency-based baselines fail completely.

Empirical validation spans five open-weight model architectures (Llama 3.1 8B, Mistral 7B v0.3, Gemma 2 9B, Phi-3 Medium 14B, Qwen 2.5 14B) across 11,400+ boundary instances and 4,200+ streaming trials, with additional testing against 13 real incident graphs (12 NTSB aviation investigations and the Knight Capital 2012 trading failure). A production MCP server implementation is available separately.

Included papers:
Paper 00: Continuous Control and Structural Regularization in Multi-Agent Narrative Extraction
Paper 01: Absorbing States in Greedy Search
Paper 02: Streaming Oscillation Traps in Endogenous-Pivot Sequential Extraction
Paper 03: The Validity Mirage: Context Algebra for Endogenous Semantics under Memory Compression
Paper I: Tropical Algebra of Endogenous-Pivot Semantics

Reproducible validation artifacts and benchmark outputs are included in the results/ directory. All papers are first-draft working papers distributed under CC-BY 4.0.
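For readers unfamiliar with the max-plus semiring named above, the following is a minimal sketch of the standard algebra only: tropical addition is max, tropical multiplication is ordinary addition, and −∞ is the additive identity. The function names (t_add, t_mul, t_matmul) and the example matrix are illustrative assumptions and are not taken from the papers; the sketch does not reproduce the papers' context health measure.

```python
import math

# Assumption: names and example values are hypothetical, chosen for illustration.
NEG_INF = -math.inf  # tropical "zero": identity for max, absorbing for +


def t_add(a, b):
    """Tropical addition in max-plus: the larger of the two values."""
    return max(a, b)


def t_mul(a, b):
    """Tropical multiplication in max-plus: ordinary addition."""
    return a + b


def t_matmul(A, B):
    """Max-plus matrix product: C[i][j] = max over k of (A[i][k] + B[k][j])."""
    inner = len(B)
    cols = len(B[0])
    return [
        [max(t_mul(row[k], B[k][j]) for k in range(inner)) for j in range(cols)]
        for row in A
    ]


# Edge weights of a tiny two-node graph; NEG_INF marks a missing edge.
A = [[0.0, 3.0],
     [NEG_INF, 1.0]]

# The max-plus square gives the best total weight over paths of exactly two steps.
A2 = t_matmul(A, A)
```

Under these definitions, A2[0][1] is the heaviest two-step route from node 0 to node 1, and an entry stays at −∞ when no such route exists.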
validity-mirage, large-language-models, ai-safety, context-compression
