From Artifacts to Risk: Auditing Instruction Surfaces in Agent Systems

Agentic systems increasingly rely on persistent instruction artifacts, tool integrations, and repository-level configuration that shape behavior beyond individual prompts. Prior work has established prompt injection, indirect instruction attacks, tool poisoning, and agent hijacking as practical security concerns. Less attention, however, has been given to the repository layer as a persistent and auditable source of agent behavior. This paper presents a bottom-up, artifact-centric audit of instruction surfaces in agent systems. We analyze a purposive corpus of 509 instruction-rich repositories containing agent guidance files, skills, plugin manifests, and Model Context Protocol (MCP) related artifacts. The scan produced 4,882 medium-or-higher raw findings and 4,637 clustered issue instances. The contribution is not a new prompt-injection benchmark or a replacement for existing scanners. Instead, this study integrates heterogeneous signature sources, applies them to real repositories, correlates raw detections into artifact-level issue instances, and maps the resulting evidence to an ASAMM-aligned agent-security interpretation layer. We explicitly treat detector outputs as candidate evidence rather than proof of exploitability. The paper positions instruction surfaces as repository-level control-plane artifacts and argues that agent security practice needs artifact-level auditing alongside runtime testing and defense.

Found an issue? Give us feedback