Reliable Intent Identification Under Adversarial Conditions for Low-Latency LLM Authorization

Goyal, Manish; Sinha, Nitesh

Preprint

Data sources: ZENODO

Publication type: Preprint. Status: Under curation. Language: English. Publisher: Zenodo

doi: 10.5281/zenodo.20567007

Abstract

Deploying LLMs as enterprise agents requires authorizing not just who is making a request, but what the request intends to do. Intent, however, is difficult to classify reliably: adversaries actively manipulate input text to bypass semantic classifiers, and the latency constraints of real-time systems rule out LLM-based approaches. We present research into reliable intent identification for LLM authorization under these constraints. Our central finding is that adversarial inputs are not merely misclassified; they are classified with artificially high confidence, creating a failure mode that binary accuracy metrics do not capture. We describe an approach that treats obfuscation severity as a measurable signal and uses it to modulate classification confidence, with a principled lower bound that prevents adversarial inputs from reaching the auto-allow zone. Evaluated on 250 labeled prompts across 37 intent classes, including 70 adversarial inputs, in a production-grade prototype deployment for banking, financial services, and insurance (BFSI), our approach achieves 94% intent classification accuracy at a P95 latency of 100 ms, with no external LLM call at inference time. A Human-in-the-Loop (HITL) mechanism converts review decisions into training signals, enabling accuracy to improve over time without expanding the attack surface for adversarial learning.
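The idea of modulating confidence by obfuscation severity can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the severity measure, the modulation formula, or the thresholds. Everything below is an assumption made for illustration, including the function names, the non-ASCII character heuristic, the multiplicative penalty, and the constants `AUTO_ALLOW_THRESHOLD`, `REVIEW_THRESHOLD`, and `MIN_PENALTY`. The "lower bound" is interpreted here as a floor on the penalty applied once any obfuscation is detected, which is one possible reading.

```python
# Illustrative sketch only; all names and constants are hypothetical,
# not taken from the paper.

AUTO_ALLOW_THRESHOLD = 0.90  # assumed: confidence needed to skip review
REVIEW_THRESHOLD = 0.60      # assumed: below this, the request is denied
MIN_PENALTY = 0.15           # assumed floor on the penalty for obfuscated input


def obfuscation_severity(text: str) -> float:
    """Toy severity score in [0, 1]: share of non-space characters that are
    non-ASCII (e.g. homoglyphs). A real detector would be far richer."""
    visible = [ch for ch in text if not ch.isspace()]
    if not visible:
        return 0.0
    suspicious = sum(1 for ch in visible if not ch.isascii())
    return suspicious / len(visible)


def adjusted_confidence(raw_confidence: float, severity: float) -> float:
    """Scale the classifier's confidence down by the obfuscation severity.

    Once any obfuscation is detected, the penalty is at least MIN_PENALTY.
    Because raw_confidence <= 1.0, the result is then at most
    1.0 - MIN_PENALTY = 0.85, which is below AUTO_ALLOW_THRESHOLD.
    """
    if severity <= 0.0:
        return raw_confidence
    penalty = max(MIN_PENALTY, min(severity, 1.0))
    return raw_confidence * (1.0 - penalty)


def route(confidence: float) -> str:
    """Map an adjusted confidence to an authorization outcome."""
    if confidence >= AUTO_ALLOW_THRESHOLD:
        return "auto_allow"
    if confidence >= REVIEW_THRESHOLD:
        return "human_review"
    return "deny"
```

Under these assumptions, a clean prompt such as `"transfer funds"` classified at 0.99 confidence is auto-allowed, while the same prompt with a single Cyrillic homoglyph (`"tr\u0430nsfer funds"`) is sent to human review even if the classifier still reports 0.99. The guarantee holds only while `1.0 - MIN_PENALTY` stays below `AUTO_ALLOW_THRESHOLD`.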
