Gold-Standard AGI: Outer AGI Superalignment

To maximise the net benefit of AGI (Artificial General Intelligence and, in particular, agentic superintelligent AGI) for all humanity, without favouring any subset thereof, we imagine a Gold-Standard AGI that is maximally aligned and maximally validated. The first of these properties, alignment, is traditionally decomposed into outer alignment (how do we define a final goal FG_G that correctly states what we want?) and inner alignment (how do we build an agent G that forever pursues FG_G as intended?). This paper presents a complete, foundational, and self-contained theory of AGI, culminating in an implementation-neutral solution to the outer AGI alignment problem in the case that G is superintelligent (hence "superalignment"). Given the AGI alignment problem's profound relevance to AGI governance, we adopt a pedagogic style throughout, so that the paper is accessible to less technical readers such as AGI policymakers.
