Powered by OpenAIRE graph
Preprint
Data sources: ZENODO

From Artifacts to Risk: Auditing Instruction Surfaces in Agent Systems

Author: Sergey Gordeychik

Abstract

Agentic systems increasingly rely on persistent instruction artifacts, tool integrations, and repository-level configuration that shape behavior beyond individual prompts. Prior work has established prompt injection, indirect instruction attacks, tool poisoning, and agent hijacking as practical security concerns. Less attention, however, has been given to the repository layer as a persistent and auditable source of agent behavior. This paper presents a bottom-up, artifact-centric audit of instruction surfaces in agent systems. We analyze a purposive corpus of 509 instruction-rich repositories containing agent guidance files, skills, plugin manifests, and artifacts related to the Model Context Protocol (MCP). The scan produced 4,882 raw findings of medium or higher severity and 4,637 clustered issue instances. The contribution is not a new prompt-injection benchmark or a replacement for existing scanners. Instead, this study integrates heterogeneous signature sources, applies them to real repositories, correlates raw detections into artifact-level issue instances, and maps the resulting evidence to an ASAMM-aligned agent-security interpretation layer. We explicitly treat detector outputs as candidate evidence rather than proof of exploitability. The paper positions instruction surfaces as repository-level control-plane artifacts and argues that agent security practice needs artifact-level auditing alongside runtime testing and defense.
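
The abstract does not specify how raw detections are correlated into artifact-level issue instances. As an illustration only, the following minimal sketch assumes that findings are grouped by repository, artifact path, and issue category, and that findings below medium severity are dropped first. The names `RawFinding`, `IssueInstance`, and `cluster_findings`, the severity scale, and the grouping key are all hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass, field

# Assumed severity scale; the paper only mentions a "medium-or-higher" cut-off.
SEVERITY_ORDER = {"low": 0, "medium": 1, "high": 2, "critical": 3}


@dataclass(frozen=True)
class RawFinding:
    """One detector hit (hypothetical schema)."""
    repo: str
    artifact_path: str
    issue_category: str
    detector: str
    severity: str


@dataclass
class IssueInstance:
    """One clustered, artifact-level issue (hypothetical schema)."""
    repo: str
    artifact_path: str
    issue_category: str
    max_severity: str
    detectors: set = field(default_factory=set)
    raw_count: int = 0


def cluster_findings(findings, min_severity="medium"):
    """Group raw findings by (repo, artifact, category).

    The grouping key is an assumption made for this sketch. Each
    resulting instance is candidate evidence, not proof of exploitability.
    """
    threshold = SEVERITY_ORDER[min_severity]
    clusters = {}
    for f in findings:
        if SEVERITY_ORDER[f.severity] < threshold:
            continue  # below the reporting cut-off
        key = (f.repo, f.artifact_path, f.issue_category)
        inst = clusters.get(key)
        if inst is None:
            inst = IssueInstance(f.repo, f.artifact_path,
                                 f.issue_category, f.severity)
            clusters[key] = inst
        inst.detectors.add(f.detector)
        inst.raw_count += 1
        # Keep the highest severity seen for this artifact-level issue.
        if SEVERITY_ORDER[f.severity] > SEVERITY_ORDER[inst.max_severity]:
            inst.max_severity = f.severity
    return list(clusters.values())
```

Under these assumptions, two signature sources that flag the same category of issue in the same guidance file collapse into a single issue instance that records both detectors. That is one way a raw finding count can exceed the clustered count.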
