Powered by OpenAIRE graph
Found an issue? Give us feedback
image/svg+xml art designer at PLoS, modified by Wikipedia users Nina, Beao, JakobVoss, and AnonMoos Open Access logo, converted into svg, designed by PLoS. This version with transparent background. http://commons.wikimedia.org/wiki/File:Open_Access_logo_PLoS_white.svg art designer at PLoS, modified by Wikipedia users Nina, Beao, JakobVoss, and AnonMoos http://www.plos.org/ International Journa...arrow_drop_down
image/svg+xml art designer at PLoS, modified by Wikipedia users Nina, Beao, JakobVoss, and AnonMoos Open Access logo, converted into svg, designed by PLoS. This version with transparent background. http://commons.wikimedia.org/wiki/File:Open_Access_logo_PLoS_white.svg art designer at PLoS, modified by Wikipedia users Nina, Beao, JakobVoss, and AnonMoos http://www.plos.org/
image/svg+xml art designer at PLoS, modified by Wikipedia users Nina, Beao, JakobVoss, and AnonMoos Open Access logo, converted into svg, designed by PLoS. This version with transparent background. http://commons.wikimedia.org/wiki/File:Open_Access_logo_PLoS_white.svg art designer at PLoS, modified by Wikipedia users Nina, Beao, JakobVoss, and AnonMoos http://www.plos.org/
ZENODO
Preprint . 2026
License: CC BY
Data sources: ZENODO
ZENODO
Preprint . 2026
License: CC BY
Data sources: Datacite
ZENODO
Article . 2026
License: CC BY
Data sources: Datacite
ZENODO
Article . 2026
License: CC BY
Data sources: Datacite
ZENODO
Preprint . 2026
License: CC BY
Data sources: Datacite
versions View all 5 versions
addClaim

EmbedGuard: Cross-Layer Detection and Provenance Attestation for Adversarial Embedding Attacks in RAG Systems

Authors: Beshane, Neeraj Kumar Singh;

EmbedGuard: Cross-Layer Detection and Provenance Attestation for Adversarial Embedding Attacks in RAG Systems

Abstract

Embedding-based Retrieval-Augmented Generation (RAG) systems are critical infrastructure for production AI applications, yet they remain vulnerable to embedding space poisoning attacks that achieve disproportionate success with minimal payloads (<1% corpus contamination, resulting in>80% attack success rates). Current single-layer defense approaches optimize for high-amplitude signals in narrow-dimensional subspaces, making them systematically vulnerable to coordinated cross-layer attacks that distribute adversarial signals across architectural layers. EmbedGuard is an adaptive, cross-layer detection framework integrating hardware-backed cryptographic attestation with statistical anomaly detection across four RAG architectural layers: prompt layer injection detection, embedding layer hardware attestation via Trusted Execution Environments (TEEs), retrieval layer distributional analysis, and output layer consistency verification. The framework employs efficient techniques, including incremental Principal Component Analysis and Kullback-Leibler divergence metrics, to detect subtle, coordinated attacks while maintaining production-grade latencies. Evaluation of a production-scale system (500,000 embeddings, 47,000 queries) demonstrates a 94.7% detection rate for optimization-based attacks and 89.3% for adaptive attacks, with a 3.2% false positive rate and a 51ms mean latency overhead. Ablation studies quantify an 18.4 percentage point improvement from cross-layer correlation over the best single-layer approach. The framework operates in three deployment modes—passive logging, gated human review, and active automatic remediation—enabling deployment across diverse organizational contexts and security requirements while protecting against adversarial embedding manipulation.

Keywords

machine learning security, Cross-Layer Attack Detection, Embedding Space Poisoning, Trusted Execution Environments, retrieval-augmented generation, prompt injection, RAG security, trusted execution environment, Cryptographic Provenance Attestation, embedding poisoning

  • BIP!
    Impact byBIP!
    selected citations
    These citations are derived from selected sources.
    This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
    0
    popularity
    This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network.
    Average
    influence
    This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
    Average
    impulse
    This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network.
    Average
Powered by OpenAIRE graph
Found an issue? Give us feedback
selected citations
These citations are derived from selected sources.
This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
BIP!Citations provided by BIP!
popularity
This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network.
BIP!Popularity provided by BIP!
influence
This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
BIP!Influence provided by BIP!
impulse
This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network.
BIP!Impulse provided by BIP!
0
Average
Average
Average