Powered by OpenAIRE graph
Data sources: ZENODO

Decision Persistence Benchmarks for Autonomous AI Systems

Authors: Bostick, Devin

Abstract

Organizations increasingly rely on AI systems to recommend, approve, classify, prioritize, route, and execute decisions. Existing governance approaches focus primarily on model behavior, runtime controls, authorization, audit logging, and post-hoc explanation. These controls are necessary, but they do not fully address a growing operational challenge: can an AI-mediated decision be reconstructed, challenged, corrected, and independently verified after conditions change? This paper introduces Decision Persistence Benchmarks, a framework for evaluating whether consequential AI-mediated decisions remain governable across time, transformation, correction, and external verification. It proposes eight benchmark categories: Decision Replay, Evidence Admissibility, Identity Preservation, Authority Closure, Drift Detection, Retraction Propagation, Runtime-Independent Verification, and Federation Readiness. The framework distinguishes proof of occurrence from proof of governance, and it introduces a Decision Governance Index (DGI) for comparative evaluation. The benchmarks are implementation-agnostic and are intended for governance assessment, procurement review, architecture evaluation, compliance analysis, and the design of high-trust autonomous systems.
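The abstract names a Decision Governance Index but does not say how it is computed. The sketch below is purely illustrative. It assumes that each of the eight categories receives a score between 0 and 1 and that the DGI is a weighted mean of those scores, with equal weights unless overridden. The function name `decision_governance_index`, the score scale, the weighting scheme and the example scores are all hypothetical and should not be read as the paper's definition.

```python
from typing import Mapping, Optional

# The eight benchmark categories named in the abstract.
CATEGORIES = (
    "Decision Replay",
    "Evidence Admissibility",
    "Identity Preservation",
    "Authority Closure",
    "Drift Detection",
    "Retraction Propagation",
    "Runtime-Independent Verification",
    "Federation Readiness",
)


def decision_governance_index(
    scores: Mapping[str, float],
    weights: Optional[Mapping[str, float]] = None,
) -> float:
    """Hypothetical DGI: weighted mean of per-category scores in [0, 1].

    ASSUMPTION: the score scale and aggregation rule are illustrative only;
    the abstract does not define how the DGI is computed.
    """
    # Every category must be scored, and no unknown categories are accepted.
    missing = [c for c in CATEGORIES if c not in scores]
    unknown = [c for c in scores if c not in CATEGORIES]
    if missing or unknown:
        raise ValueError(f"missing={missing}, unknown={unknown}")
    for c in CATEGORIES:
        if not 0.0 <= scores[c] <= 1.0:
            raise ValueError(f"score for {c!r} must be in [0, 1]")

    # Equal weights by default; caller-supplied weights override per category.
    w = {c: 1.0 for c in CATEGORIES}
    if weights:
        unknown_w = [c for c in weights if c not in CATEGORIES]
        if unknown_w:
            raise ValueError(f"unknown weight categories: {unknown_w}")
        w.update(weights)
    if any(v < 0 for v in w.values()):
        raise ValueError("weights must be non-negative")
    total = sum(w[c] for c in CATEGORIES)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(w[c] * scores[c] for c in CATEGORIES) / total


if __name__ == "__main__":
    # Two hypothetical systems under comparison (illustrative scores, not data).
    system_a = {c: 1.0 for c in CATEGORIES}
    system_a["Federation Readiness"] = 0.0
    system_b = {c: 0.5 for c in CATEGORIES}
    print(f"System A DGI: {decision_governance_index(system_a):.3f}")  # 0.875
    print(f"System B DGI: {decision_governance_index(system_b):.3f}")  # 0.500
```

Under a plain mean, strong scores in some categories can compensate for a zero in another. In the example, System A reaches 0.875 even though it has no federation readiness at all. For procurement gating, a per-category minimum threshold may therefore be more appropriate than a single averaged index. This summary does not say which aggregation the paper intends.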