ZENODO
Article . 2025
License: CC BY
Data sources: Datacite

AI-NATIVE DLP: REPLACING REGEX-BASED CONTENT INSPECTION WITH LLM-DRIVEN SEMANTIC UNDERSTANDING FOR ENTERPRISE DATA EXFILTRATION DETECTION

Authors: Venkata Vijay Satyanarayana Murthy Neelam;

Abstract

Data Loss Prevention (DLP) has been a cornerstone of enterprise security for over two decades, yet its foundational technology, namely regular expression (regex) pattern matching, keyword blocklists, and exact-match fingerprinting, was designed for an era of structured, predictable data flows. The explosion of unstructured data, GenAI-powered workflows, and shadow AI adoption has exposed the fundamental limitations of pattern-based DLP: industry data shows that legacy DLP systems achieve 5–25% accuracy on unstructured content classification, generate false positive rates exceeding 40% on complex data types, and provide zero visibility into GenAI prompt-based data exfiltration channels. This paper introduces the paradigm of AI-Native DLP, a fundamental architectural shift from regex-based content inspection to LLM-driven semantic understanding for enterprise data exfiltration detection. We present a comprehensive analysis comparing three generations of DLP technology across seven data categories and seven exfiltration channels, demonstrating that LLM-driven semantic inspection achieves 82–98% detection accuracy across all content types (compared to 8–96% for regex), reduces false positive rates from 37–42% to 3.5–5% over twelve months of production deployment, and extends coverage to previously undetectable channels including GenAI prompts, browser-based paste operations, and paraphrased confidential data. We evaluate the architectural patterns, latency characteristics, cost implications, enterprise deployment challenges, regulatory compliance alignment, insider threat detection capabilities, LLM model selection trade-offs, and shadow AI governance of AI-native DLP, and present a maturity model for organizations transitioning from legacy to semantic-first data protection. Our analysis draws on published performance data from Nightfall AI, Lakera, Cyera, Concentric AI, Cloudflare AI Gateway, and Microsoft Purview, alongside academic research on LLM-based content classification and the OWASP framework for LLM application security.
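
The limitation of pattern-based inspection that the abstract describes can be illustrated with a minimal sketch. The pattern, function name, and sample messages below are assumptions made up for illustration and are not taken from the paper: a single regex rule for a US Social Security number flags a literal occurrence, misses the same value once it is paraphrased in words, and raises a false positive on an unrelated identifier that happens to share the format.

```python
import re

# Hypothetical rule of the kind a regex-based DLP policy might contain:
# a US Social Security number written as 3-2-4 digits.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def regex_flags(text: str) -> bool:
    """Return True if the pattern-based rule would flag this text."""
    return SSN_PATTERN.search(text) is not None


# Illustrative messages, invented for this sketch.
literal = "Employee SSN: 123-45-6789, please update payroll."
paraphrased = "Her social is one two three, then forty-five, then sixty-seven eighty-nine."
benign = "Order reference 555-12-3456 has shipped."

results = {
    "literal": regex_flags(literal),          # sensitive, matches the format
    "paraphrased": regex_flags(paraphrased),  # sensitive, but no digits to match
    "benign": regex_flags(benign),            # not sensitive, yet same format
}

for name, flagged in results.items():
    print(f"{name}: flagged={flagged}")
```

The paraphrased message is the case the abstract attributes to semantic inspection: deciding that it carries the same sensitive value requires interpreting meaning rather than matching a character format. This sketch does not implement that step.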
