
Totem is a backend-agnostic proxy for runtime behavioral integrity verification of Large Language Models deployed as decision support systems. While existing LLM security tools protect against malicious users, none verify whether the model itself has been compromised. Totem fills this gap through three mechanisms: behavioral profiling via refusal classification, salted probes with steganographic triggers to prevent selective evasion, and cryptographically signed behavioral baselines (Model Manifest) authenticated via Ed25519 digital signatures. Evaluated across three model families and two attack vectors, Totem achieves 70% must-refuse detection rate on uncensored model swaps with 0% false positives, while naive baselines detect 0% of the same swaps. Released as open-source software under the Apache 2.0 license.
runtime verification, model integrity, llm security, behavioral probing, guardrail bypass detection, decision support systems, cryptographic attestation
runtime verification, model integrity, llm security, behavioral probing, guardrail bypass detection, decision support systems, cryptographic attestation
| selected citations These citations are derived from selected sources. This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically). | 0 | |
| popularity This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network. | Average | |
| influence This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically). | Average | |
| impulse This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network. | Average |
