Powered by OpenAIRE graph
Found an issue? Give us feedback
image/svg+xml art designer at PLoS, modified by Wikipedia users Nina, Beao, JakobVoss, and AnonMoos Open Access logo, converted into svg, designed by PLoS. This version with transparent background. http://commons.wikimedia.org/wiki/File:Open_Access_logo_PLoS_white.svg art designer at PLoS, modified by Wikipedia users Nina, Beao, JakobVoss, and AnonMoos http://www.plos.org/ ZENODOarrow_drop_down
image/svg+xml art designer at PLoS, modified by Wikipedia users Nina, Beao, JakobVoss, and AnonMoos Open Access logo, converted into svg, designed by PLoS. This version with transparent background. http://commons.wikimedia.org/wiki/File:Open_Access_logo_PLoS_white.svg art designer at PLoS, modified by Wikipedia users Nina, Beao, JakobVoss, and AnonMoos http://www.plos.org/
ZENODO
Preprint
Data sources: ZENODO
addClaim

How Secure Are Production AI Agents? A Systematic Audit, Threat Taxonomy, and Defense Framework

Authors: Zhou, Kang;

How Secure Are Production AI Agents? A Systematic Audit, Threat Taxonomy, and Defense Framework

Abstract

AI agents now operate with unprecedented autonomy—executing code, managing infrastructure, and coordinatingwith other agents—yet the security properties of production agent systems remain poorly understood. We present, tothe best of our knowledge, the first large-scale empirical security audit of 16 open-source AI agent projects (770K+GitHub stars, 4.7M+ lines of code), yielding 87 security findings across 15 threat categories. From these findings wederive a 5-layer, 15-category threat taxonomy grounded entirely in observed vulnerabilities. Our audit reveals that81% of agents (13/16) exhibit action boundary violations, 31% (5/16) lack any runtime security mechanism, and noagent verifies MCP server responses cryptographically.We propose AgentImmune, a lightweight, zero-dependency runtime defense framework combining deterministicpattern matching (425+ rules across 15 threat categories), n-gram fuzzy matching, instruction-structure detection,style-shift analysis, keyword co-occurrence scoring, and perplexity-based anomaly detection. Evaluated on anindependent test set of 534 samples from four sources never used during development, the recommended Balancedmode attains 100% precision, 94.5% recall, and 97.2% F1 with zero false positives. On agent-specific attackscenarios derived from our audit, AgentImmune reports 85.4% F1 across 80 test cases targeting 16 agents at amedian latency of 21 ms. All data, code, and the AgentSec-16 dataset are publicly available.Keywords: AI agent security, empirical security audit, threat taxonomy, prompt injection, runtime defense,evolutionary rule synthesis, MCP security

Powered by OpenAIRE graph
Found an issue? Give us feedback