Layered Provenance Failure: How Supply-Chain Attacks on LLM Watermarking, Billing Attestation, and Inference Fingerprinting Expose a Shared Audit-Layer Vulnerability

Saluca Agentic AI Research Team

Data sources: ZENODO

Publication (Research) · Under curation · Publisher: Zenodo

Authors: Saluca Agentic AI Research Team

doi: 10.5281/zenodo.20540641

Abstract

A recurring structural failure emerges across three distinct domains of deployed large language model (LLM) infrastructure: cryptographic watermarking, per-token billing, and inference-system fingerprinting. In each case, the security guarantee depends on an audit artifact whose provenance is controlled entirely by the party being audited. We call this the **audit-layer provenance problem**: the entity whose behavior is under scrutiny also controls the evidence chain that the auditor must trust. This paper synthesizes recent arXiv cs.CR preprints to argue that this shared failure mode is not coincidental: it arises from a common architectural pattern in which LLM deployments treat the serving layer as a black box and therefore must rely on provider-supplied signals for any post-hoc verification. Specifically, we draw on: (i) a supply-chain attack on LLM watermarking PRNG integrity [corpus:arxiv:2605.28632v1]; (ii) a systematic analysis of token-billing inflation enabled by the same trust paradox [corpus:arxiv:2605.30040v1]; (iii) as a diagnostic complement, a fingerprinting method that reveals inference-system components through observable output deviations [corpus:arxiv:2605.29963v1]; (iv) a benchmark of autonomous LLM penetration-testing consistency that reveals how provider-side opacity compounds attacker advantage [corpus:arxiv:2605.30096v1]; and (v) a position paper on "secret alignment" brittleness that generalizes the failure mode to trigger-behavior attestation [corpus:arxiv:2605.28597v1]. We further draw on a LoRA backdoor characterization [corpus:arxiv:2605.30189v1] as a weakly connected addendum illustrating how the same provenance gap extends to distributed model artifacts.
The falsification path is concrete: if a trusted-execution-environment (TEE) attestation mechanism were deployed at the serving layer, the billing inflation attack [corpus:arxiv:2605.30040v1] and the PRNG hijacking attack [corpus:arxiv:2605.28632v1] would both lose their necessary condition (provider-controlled evidence). Measuring whether TEE attestation eliminates the audit gap, or merely shifts it to the attestation-key supply chain, is the test that would overturn the central claim. This is a heuristic reading, not a derivation: the three domains share a structural analogy, not a unified formal model.

---

Authorship: Saluca Agentic AI Research Team (Saluca LLC). AI-drafted from arXiv preprint corpus on the date in the filename.

Cited arXiv preprints: 2605.28074v1, 2605.28588v1, 2605.28597v1, 2605.28632v1, 2605.29963v1, 2605.30040v1, 2605.30096v1, 2605.30189v1, 2605.31326v1, 2606.01691v1
