
AI coding agents now execute file, process, and network operations on developer hosts with the user's full token authority. We study whether LLM-assisted policy verdicts can support runtime enforcement for these agents under syscall-blocking latency constraints. We describe a Windows runtime-guardrail architecture with kernel-mode hooks for file-system, network, and process events, plus a userspace policy pipeline that matches a small pre-registered pattern set synchronously for unambiguous high-risk actions and routes ambiguous events to Claude Haiku 4.5 via AWS Bedrock. In this paper we measure only the userspace verdict pipeline of the prototype, under a user-mode-fallback configuration in which the kernel drivers were not loaded on the test host. The measurement covers 1,247 events spanning 1,000 scenarios drawn from a five-category threat taxonomy for AI coding agents. We report: (E1) Bedrock round-trip and event-to-verdict latency CDFs at the prototype-default batching configuration, (E6) the impact of a synchronous fast-path that bypasses the LLM for unambiguous cases, (E5) a cost-vs-latency sweep across six batching configurations, and (E4) the geographic latency floor across three Bedrock regions. Two findings dominate. First, an LLM-only critical path is fragile under typical batching: at the prototype-default configuration (BEDROCK_MAX_BATCH=10, BEDROCK_BATCH_DELAY=2.0 s), event-to-verdict p99 reaches 7,741 ms, and 65% of Bedrock-routed events (excluding synchronous fast-path hits) exceed a 4 s userspace timeout in our measurements; counted across all events, the rate is 56%. A re-tuned configuration with a shorter batch window (d=0.5 s) reduces the timeout rate among Bedrock-routed events to 7%, at the cost of a 71% increase in API calls.
Second, where a synchronous fast-path is implemented, architectural placement matters as much as the pattern set: the same six pre-registered patterns produce a p99 fast-path latency of 3,617 ms when executed inside the single-threaded LLM reviewer, versus 1.00 ms when executed synchronously on the publishing thread, a 3,617-fold reduction with no change to the patterns or workload. The fast-path's coverage in our pilot pattern set is small (about 14% of events), so the architectural finding is a latency claim, not a coverage claim. The results support a hybrid design: deterministic synchronous controls for unambiguous high-risk actions, with the LLM reserved for slower semantic review on a path that is not load-bearing for the intended kernel-mode deadline.
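The placement point can be illustrated with a minimal Python sketch of the hybrid design, in which pattern matching runs on the publishing thread and only unmatched events are queued for LLM review. This is not the prototype's code: the two patterns, the `Event` and `Verdict` types, and the `publish` and `fast_path` functions are all hypothetical, since the abstract does not list the six pre-registered patterns or the pipeline's interfaces.

```python
import queue
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical patterns for illustration only; the paper's six
# pre-registered patterns are not given in the abstract.
FAST_PATH_PATTERNS = [
    re.compile(r"\.ssh[\\/]id_[a-z0-9]+$", re.IGNORECASE),
    re.compile(r"\bvssadmin(\.exe)?\s+delete\s+shadows\b", re.IGNORECASE),
]


@dataclass
class Event:
    kind: str    # "file", "process", or "network"
    detail: str  # path, command line, or remote endpoint


@dataclass
class Verdict:
    action: str  # "deny" or "pending"
    source: str  # "fast_path" or "llm_queue"


# Events awaiting slower, batched LLM review (assumed to be drained
# by a separate reviewer thread, which is not shown here).
llm_review_queue: "queue.Queue[Event]" = queue.Queue()


def fast_path(event: Event) -> Optional[Verdict]:
    """Return a deny verdict if any pattern matches, else None."""
    for pattern in FAST_PATH_PATTERNS:
        if pattern.search(event.detail):
            return Verdict("deny", "fast_path")
    return None


def publish(event: Event) -> Verdict:
    """Run on the publishing thread: match first, queue otherwise."""
    verdict = fast_path(event)
    if verdict is not None:
        # Decided synchronously; never waits behind an LLM batch.
        return verdict
    llm_review_queue.put(event)
    return Verdict("pending", "llm_queue")
```

In this sketch, a call such as `publish(Event("process", "vssadmin delete shadows /all /quiet"))` is decided without touching the queue, so its latency does not depend on the batch window. Running the same match inside the reviewer thread would instead place it behind whatever batch is in flight.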
LLM policy enforcement, AI Agent Security, runtime verification, inline reference monitor, syscall interception, hybrid policy architecture
