Decision Persistence Benchmarks for Autonomous AI Systems

Abstract

Organizations increasingly rely on AI systems to recommend, approve, classify, prioritize, route, and execute decisions. Existing governance approaches focus primarily on model behavior, runtime controls, authorization, audit logging, and post-hoc explanation. While necessary, these controls do not fully address a growing operational challenge: can an AI-mediated decision be reconstructed, challenged, corrected, and independently verified after conditions change? This paper introduces Decision Persistence Benchmarks, a framework for evaluating whether consequential AI-mediated decisions remain governable across time, transformation, correction, and external verification. Eight benchmark categories are proposed: Decision Replay, Evidence Admissibility, Identity Preservation, Authority Closure, Drift Detection, Retraction Propagation, Runtime-Independent Verification, and Federation Readiness. The framework distinguishes proof of occurrence from proof of governance and introduces a Decision Governance Index (DGI) for comparative evaluation. The benchmarks are implementation-agnostic and are intended for governance assessment, procurement review, architecture evaluation, compliance analysis, and the design of high-trust autonomous systems.
