
This paper analyzes model collapse (Shumailov et al., 2024) through the lens of information-theoretic closure rather than mere optimization failure. We argue that self-referential training regimes optimize for Information IN (internal statistical coherence) while systematically decoupling from Information ABOUT (external functional regularities) (Kolchinsky & Wolpert, 2018). Drawing on non-equilibrium thermodynamics (Prigogine, 1977), Ashby's Law of Requisite Variety (Ashby, 1956), and Pearl's causal epistemology (Pearl, 2009), we demonstrate that training on synthetic data violates the condition of epistemic independence, creating a feedback loop that screens off environmental variation. We formalize this pathology through the viability condition E(t) ≤ C(t), proving that a system's corrective information input C(t) must continuously meet or exceed its rate of internal entropic drift E(t) to maintain semantic grounding.
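The viability condition lends itself to a toy illustration. The sketch below is not the paper's formalism, only a minimal analogue: a Gaussian "model" is refit each generation to its own samples, so finite-sample noise and estimator bias play the role of entropic drift E(t), while the hypothetical external_fraction parameter reintroduces ground-truth data as corrective input C(t). With external_fraction = 0 the fitted variance decays across generations (collapse); a sufficiently large fraction keeps the estimate anchored to the source distribution.

```python
import numpy as np

# Toy sketch (our assumption, not the paper's model): a Gaussian refit
# to its own output each generation. Finite-sample noise and the biased
# ddof=0 variance estimator act as entropic drift E(t); the fraction of
# fresh ground-truth data acts as corrective input C(t).

rng = np.random.default_rng(0)
mu, sigma = 0.0, 1.0            # current "model" parameters
n_samples, generations = 200, 30
external_fraction = 0.0          # set > 0 to restore epistemic grounding

for g in range(generations):
    data = rng.normal(mu, sigma, n_samples)          # synthetic self-training data
    n_ext = int(external_fraction * n_samples)
    if n_ext:
        data[:n_ext] = rng.normal(0.0, 1.0, n_ext)   # corrective samples from truth
    mu, sigma = data.mean(), data.std()              # refit on the mixed batch
    print(f"gen {g:2d}: mu={mu:+.3f}  sigma={sigma:.3f}")
```

Running this with external_fraction = 0 shows sigma drifting downward generation over generation, a toy analogue of E(t) outrunning C(t); raising the fraction corresponds to restoring E(t) ≤ C(t).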
Keywords: Synthetic Data, Model Collapse, Causal Inference, Large Language Models, AI Safety, Generative AI, Epistemic Independence, Information Theory, Thermodynamics, Cybernetics, Alignment
