
As Large Language Models (LLMs) increasingly saturate the internet with synthetic content, the risk of future models being trained on generated data grows exponentially. This paper introduces the "Ainex Law," a mathematical principle defining the upper bound of semantic integrity in recursive self-learning systems. Through rigorous experimentation using a GPT-2 architecture within a closed feedback loop, we empirically demonstrate that without external human-grounded data, the model's semantic space—measured via the Convex Hull Volume (V_hull) of latent embeddings—suffers a deterministic decay. We observe a 66% reduction in semantic diversity within 20 generations, accompanied by a sharp increase in Centroid Drift (μ_AI) away from the human baseline. Our findings suggest that "Model Collapse" is not merely a quality degradation but a geometric inevitability akin to thermodynamic entropy. We propose the Ainex Score (A) as a standardized metric to quantify this decay.
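To make the two abstract-level metrics concrete, the sketch below shows one plausible way to compute a Convex Hull Volume (V_hull) and a Centroid Drift value from a matrix of latent embeddings. This is not the authors' released code; it assumes the embeddings are first projected to a low dimension (here via PCA), since exact hull computation is impractical in the raw GPT-2 hidden size, and the file names in the usage comment are hypothetical.

```python
import numpy as np
from scipy.spatial import ConvexHull
from sklearn.decomposition import PCA


def hull_volume(embeddings: np.ndarray, n_components: int = 3) -> float:
    """Approximate V_hull: volume of the convex hull of embeddings
    after projecting them to a low-dimensional space with PCA.
    (Assumption: the paper's metric is computed on a reduced space.)"""
    reduced = PCA(n_components=n_components).fit_transform(embeddings)
    return float(ConvexHull(reduced).volume)


def centroid_drift(gen_embeddings: np.ndarray, human_embeddings: np.ndarray) -> float:
    """Centroid Drift: Euclidean distance between the centroid of a
    generation's embeddings and the human-baseline centroid."""
    gen_centroid = gen_embeddings.mean(axis=0)
    human_centroid = human_embeddings.mean(axis=0)
    return float(np.linalg.norm(gen_centroid - human_centroid))


# Hypothetical usage: track decay across recursive generations.
# human = np.load("human_embeddings.npy")
# gen_20 = np.load("generation_20_embeddings.npy")
# print(hull_volume(gen_20), centroid_drift(gen_20, human))
```

Under this reading, a shrinking hull volume across generations would correspond to the reported loss of semantic diversity, while a growing drift value would track the departure from the human baseline.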
