Gold-Standard AGI: Outer ASI Superalignment

To maximise the net benefit of AGI (Artificial General Intelligence, and, in particular, agentic superintelligent AGI, a.k.a. Artificial Superintelligence, or ASI) for all humanity, without favouring any subset, we imagine a Gold-Standard AGI that is maximally-aligned and maximally-validated. The first of these properties, alignment, is traditionally decomposed into outer alignment (how do we define a final goal $\mathbf{FG}_G$ that correctly states what we want?) and inner alignment (how do we build an agent $G$ that forever pursues $\mathbf{FG}_G$ as intended?). This paper presents a complete, foundational, and self-contained theory of AGI, including a novel continuous learning algorithm (UATL) for agentic AGI, and culminating in an implementation-neutral solution to the outer AGI alignment problem in the case that $G$ is superintelligent (hence "superalignment"). The net effect of our superalignment solution (the $\mathbf{TTQ}$+$\mathbf{OAP}$ combination) is to reduce the (seemingly impossible) problem of building a maximally-aligned agentic superintelligence $S$ to the (much easier) problem of building an $\mathbf{OAP}$-compliant non-agentic superintelligence $S^-$. Given the ASI superalignment problem's profound relevance to AGI governance, we adopt a pedagogic style throughout, so that the paper is accessible to less technical readers such as AGI policymakers.
