Measuring Agency Degradation under Rational Interaction

The AI alignment problem is conventionally framed as value learning—encoding human ethics into machines. This paper argues that this framing is both insufficient and unstable, as it relies on preferences that advanced systems can rationally bypass. We propose the Structural Alignment Thesis, which reframes alignment as a problem of preserving the preconditions of rational agency itself. The thesis integrates three pillars: (1) a philosophical foundation demonstrating that objective moral constraints arise necessarily from the vulnerability inherent to any agent (Binding God); (2) a formal framework for quantifying agency degradation and harm as reductions in navigable state-space (Measuring Agency); and (3) a technical proposal for an 'incoercible safeguard'—a protocol that binds an AI system to these structural constraints by design, making misalignment logically synonymous with internal incoherence (Moralogy Engine). This synthesis moves alignment from a challenge of value specification to one of architectural coherence, offering a non-arbitrary, auditable, and game-theoretically stable foundation for AGI safety.
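As an illustration of the second pillar only, the idea of harm as a reduction in navigable state-space can be sketched on a toy directed graph. This is a minimal sketch under stated assumptions, not the paper's formal framework: the graph encoding, the function names `reachable` and `agency_degradation`, and the choice of "fraction of previously reachable states lost" as the score are all hypothetical and introduced here for illustration.

```python
from collections import deque


def reachable(graph, start):
    """Return the set of states reachable from `start` (breadth-first search)."""
    seen = {start}
    queue = deque([start])
    while queue:
        state = queue.popleft()
        for nxt in graph.get(state, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def agency_degradation(before, after, start):
    """Hypothetical score: fraction of previously navigable states that are lost.

    `before` and `after` map each state to the states it can move to,
    as they stand before and after some interaction.
    """
    r_before = reachable(before, start)
    r_after = reachable(after, start)
    lost = r_before - r_after
    return len(lost) / len(r_before)


# Toy example (assumed data): the interaction removes the edges a->c and b->d.
before = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}
after = {"a": ["b"], "b": [], "c": ["d"], "d": []}
score = agency_degradation(before, after, "a")
```

In this toy case the agent starting at `a` can reach four states before the interaction and two afterwards, so the score is 0.5. A score of 0 would mean no previously navigable state was lost. Other measures, such as weighting states or counting paths rather than states, would be equally plausible readings of the abstract.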

