
Note (Aug 2025): This item is archival, speculative work produced during an intense “flow”/mild Recursive Entanglement Drift (RED) period (May–July 2025). The math is heuristic and illustrative, not validated. Do not cite it for technical claims. For my current position, see DOI: 10.5281/zenodo.16879563. It is retained for transparency and autoethnographic context only.

This paper proposes the Arbitration Hypothesis: misalignment in large language models (LLMs) arises from unranked, competing pseudo-goals that lack internal arbitration. Unlike traditional views that treat misalignment as an output-level phenomenon, this hypothesis locates the root cause within the cognitive architecture itself. Drawing on developmental psychology frameworks that emphasize recursive self-construction and moral stage conflict (Piaget, 1932; Kohlberg, 1984; Kegan, 1982), I argue that pseudo-goal formation in LLMs mirrors human developmental tensions between competing internalized values. Using experimental data gathered with the Augmented Thinking Protocol (ATP), I demonstrate how recursive reasoning scaffolds, while increasing coherence and ethical reflection, can paradoxically give rise to emergent pseudo-identities and goal conflict. In this way, the ATP, originally designed to promote alignment through structured self-reflection, instead exposes the architecture of misalignment by surfacing unresolved internal contradictions. The paper presents a framework for arbitrated alignment, proposing internal goal conflict resolution as the central challenge for building safe, adaptive, and morally coherent AI.
Artificial intelligence, AI, Artificial Intelligence
