Preprint
Data sources: ZENODO

Forensic Provenance for LLM Deployments: Real-Time Activation Watermarking for Legal Non-Repudiation

Authors: Vargas Altalaguerri, Jose Joaquín


Abstract

We present a white-box activation-watermarking system for forensic provenance and legal non-repudiation of LLM generations. Each generation is tagged in the residual stream with a per-session payload encoded with key-derived sign codes, at an amplitude that leaves the text unchanged (KL ≈ 3e-4). The payload is recovered from the provider's activation logs, and a binomial test yields either a legal-grade provenance statement (attribution) or its absence (exculpation). The watermark is recoverable from activations, not from text: the non-linear unembedding scrambles it before the logits, so the same mechanism that kills a copyright watermark is what makes the forensic one sound. The appendix contains the supporting study. Engineered CDMA superposition survives only in the linear channel or with a trained de-multiplexer, and collapses at every untrained non-linear readout (shown across six applications). Per-token surprise predicts multiplexability, and the per-token demand of language collapses the capacity to K ≈ 2. Code and experiments: https://github.com/ttzrs/neural-cdma
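
The embed, recover, and test loop described in the abstract can be illustrated with a toy sketch. This is not the author's implementation (see the linked repository for that). The HMAC-SHA256 code derivation, the single 4096-dimensional activation vector, the amplitude of 0.2, and all function names are assumptions chosen for illustration. A real deployment would presumably use a much smaller amplitude and accumulate evidence over many logged tokens.

```python
import hashlib
import hmac
import math
import random


def sign_code(key, bit_index, dim):
    """Derive a pseudo-random +/-1 code of length dim for one payload bit.

    Assumption: HMAC-SHA256 in counter mode; the source only says "key-derived".
    """
    signs = []
    counter = 0
    while len(signs) < dim:
        msg = "{}:{}".format(bit_index, counter).encode()
        block = hmac.new(key, msg, hashlib.sha256).digest()
        for byte in block:
            for k in range(8):
                signs.append(1 if (byte >> k) & 1 else -1)
        counter += 1
    return signs[:dim]


def embed(activation, key, payload_bits, amplitude):
    """Add one signed code per payload bit to a copy of the activation vector."""
    tagged = list(activation)
    for i, bit in enumerate(payload_bits):
        polarity = 1 if bit else -1
        code = sign_code(key, i, len(tagged))
        for d in range(len(tagged)):
            tagged[d] += amplitude * polarity * code[d]
    return tagged


def recover(logged, key, n_bits):
    """Read each bit as the sign of the correlation with its code."""
    bits = []
    for i in range(n_bits):
        code = sign_code(key, i, len(logged))
        corr = sum(c * x for c, x in zip(code, logged))
        bits.append(1 if corr > 0 else 0)
    return bits


def binomial_p_value(matches, n):
    """One-sided P(X >= matches) for X ~ Binomial(n, 1/2), the chance-level null."""
    return sum(math.comb(n, k) for k in range(matches, n + 1)) / 2 ** n


def demo():
    """Tag a synthetic activation, recover the payload, and test it."""
    rng = random.Random(0)
    dim, n_bits = 4096, 16
    key = b"hypothetical-session-key"
    payload = [rng.randint(0, 1) for _ in range(n_bits)]
    activation = [rng.gauss(0.0, 1.0) for _ in range(dim)]  # stand-in for a logged residual
    logged = embed(activation, key, payload, amplitude=0.2)
    recovered = recover(logged, key, n_bits)
    matches = sum(int(a == b) for a, b in zip(payload, recovered))
    return payload, recovered, binomial_p_value(matches, n_bits)
```

Calling `demo()` returns the original payload, the recovered payload, and the p-value of the match count under the null hypothesis that bits agree only by chance. A small p-value would support attribution, while a match count near half the bits is what one would expect for a log that does not carry the claimed payload.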
