EmbedGuard: Cross-Layer Detection and Provenance Attestation for Adversarial Embedding Attacks in RAG Systems

Publication: Article, Preprint (05 Feb 2026)
Publisher: Iskender AKKURT
Journal: International Journal of Computational and Experimental Science and Engineering, volume 12 (eISSN: 2149-9144)

Authors: Beshane, Neeraj Kumar Singh

doi: 10.22399/ijcesen.4869, 10.5281/zenodo.18364919, 10.5281/zenodo.18522877, 10.5281/zenodo.18522876, 10.5281/zenodo.18364920


Abstract

Embedding-based Retrieval-Augmented Generation (RAG) systems are critical infrastructure for production AI applications, yet they remain vulnerable to embedding space poisoning attacks that achieve disproportionate success with minimal payloads (less than 1% corpus contamination yielding attack success rates above 80%). Current single-layer defenses optimize for high-amplitude signals in narrow-dimensional subspaces, which makes them systematically vulnerable to coordinated cross-layer attacks that distribute adversarial signals across architectural layers. EmbedGuard is an adaptive, cross-layer detection framework that integrates hardware-backed cryptographic attestation with statistical anomaly detection across four RAG architectural layers: prompt-layer injection detection, embedding-layer hardware attestation via Trusted Execution Environments (TEEs), retrieval-layer distributional analysis, and output-layer consistency verification. The framework employs efficient techniques, including incremental Principal Component Analysis and Kullback-Leibler divergence metrics, to detect subtle, coordinated attacks while maintaining production-grade latencies. Evaluation on a production-scale system (500,000 embeddings, 47,000 queries) demonstrates a 94.7% detection rate for optimization-based attacks and 89.3% for adaptive attacks, with a 3.2% false positive rate and a 51 ms mean latency overhead. Ablation studies quantify an 18.4 percentage point improvement from cross-layer correlation over the best single-layer approach. The framework operates in three deployment modes (passive logging, gated human review, and active automatic remediation), enabling deployment across diverse organizational contexts and security requirements while protecting against adversarial embedding manipulation.
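To make the retrieval-layer idea concrete, the following is a minimal sketch of a Kullback-Leibler divergence check between a trusted baseline and a recent window of retrieval similarity scores. It is not EmbedGuard's implementation: the abstract does not say which distributions are compared, so the choice of similarity-score histograms, the function names (`histogram`, `kl_divergence`, `drift_flag`), the bin count, the add-one smoothing, and the `threshold` value are all assumptions made for illustration.

```python
import math
from collections import Counter


def histogram(scores, bins=10, lo=0.0, hi=1.0):
    """Bin scores into a smoothed probability distribution.

    Assumption: scores are similarity values in [lo, hi]; the bin count
    is illustrative and not taken from the paper.
    """
    counts = Counter()
    width = (hi - lo) / bins
    for s in scores:
        # Clamp so out-of-range scores fall into the edge bins.
        idx = min(bins - 1, max(0, int((s - lo) / width)))
        counts[idx] += 1
    # Add-one (Laplace) smoothing keeps every bin non-zero, avoiding log(0).
    total = len(scores) + bins
    return [(counts[i] + 1) / total for i in range(bins)]


def kl_divergence(p, q):
    """D_KL(P || Q) in nats for two discrete distributions of equal length."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def drift_flag(baseline_scores, window_scores, threshold=0.1):
    """Compare a recent window against the baseline.

    Returns (divergence, flagged). The threshold is a hypothetical
    value; a real deployment would calibrate it on clean traffic.
    """
    p = histogram(window_scores)
    q = histogram(baseline_scores)
    d = kl_divergence(p, q)
    return d, d > threshold
```

Calling `drift_flag(baseline, window)` with a window whose scores cluster in one narrow band, while the baseline is spread across the range, produces a large divergence and sets the flag; comparing a sample against itself gives a divergence of zero. A single-layer score like this is the kind of signal the abstract describes as being correlated with the other layers rather than acted on alone.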

Keywords

machine learning security, cross-layer attack detection, embedding space poisoning, trusted execution environments, retrieval-augmented generation, prompt injection, RAG security, cryptographic provenance attestation, embedding poisoning

Impact by BIP!

- selected citations: 0. These citations are derived from selected sources. This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
- popularity: Average. This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network.
- influence: Average. This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
- impulse: Average. This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network.
