Powered by OpenAIRE graph
Preprint . 2026
Data sources: ZENODO

DeepDrift/ODD Kinetic Diagnosis of Representations in Deep Neural Networks

Authors: Evtushenko, Alexey

Abstract

This work presents a self-contained study of fail-fast monitoring of neural networks via hidden-state dynamics, extending and substantially reframing an earlier exploratory preprint on hidden-state trajectories. We introduce Semantic Velocity, a kinetic measure of representation drift in latent space, and show that it serves as a leading indicator of model unreliability: it precedes observable failures such as accuracy drops, hallucinations, policy collapse, and reward hacking. Unlike confidence- or output-based signals, the proposed approach operates on internal model dynamics and is therefore agnostic to task labels and downstream objectives. The method is evaluated across a broad range of settings: large language models (OOD prompts, jailbreak attempts), vision transformers under corruption and distribution shift, reinforcement learning agents under policy destabilization, and production-oriented constraints (latency, overhead, sparse sampling). Empirically, Semantic Velocity demonstrates strong early-warning capability (a lead time of 6–12 steps), robust separation between nominal and failure regimes, and low computational overhead (<0.5%), making it suitable for real-time deployment. Notably, jailbreak and adversarial behaviors manifest as internal conflict signatures, revealing tension between pretraining and alignment objectives before surface-level violations occur. The paper positions hidden-state dynamics as a practical and interpretable foundation for out-of-distribution detection, reliability monitoring, and AI safety infrastructure, bridging theoretical intuition with production-scale feasibility. The study builds on prior conceptual work by the author but constitutes a substantially new and independent contribution: it introduces a new monitoring paradigm, expanded empirical validation, and a system-level perspective on neural network reliability.
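
The abstract does not state the formula behind Semantic Velocity, so the following is only a minimal sketch of one plausible reading: the velocity at each step is taken to be the Euclidean distance between consecutive hidden-state vectors, and an alarm is raised when it exceeds a baseline estimated from a warm-up window. The function names (`semantic_velocity`, `first_alarm`), the distance metric, and the parameters `warmup`, `k`, and `min_std` are all assumptions made for illustration and are not taken from the paper.

```python
import math
from typing import List, Optional, Sequence


def semantic_velocity(states: Sequence[Sequence[float]]) -> List[float]:
    """Distance travelled in latent space between consecutive steps.

    ASSUMPTION: velocity is the L2 norm of the difference between
    successive hidden-state vectors; the paper may define it differently.
    """
    return [math.dist(prev, curr) for prev, curr in zip(states, states[1:])]


def first_alarm(
    velocities: Sequence[float],
    warmup: int = 3,        # hypothetical: steps used to estimate the baseline
    k: float = 3.0,         # hypothetical: alarm at mean + k * std
    min_std: float = 1e-6,  # hypothetical: floor so a flat baseline still works
) -> Optional[int]:
    """Return the index of the first velocity above the baseline threshold."""
    baseline = velocities[:warmup]
    if len(baseline) < warmup:
        return None  # not enough data to form a baseline
    mean = sum(baseline) / len(baseline)
    std = math.sqrt(sum((v - mean) ** 2 for v in baseline) / len(baseline))
    threshold = mean + k * max(std, min_std)
    for i, v in enumerate(velocities[warmup:], start=warmup):
        if v > threshold:
            return i
    return None


if __name__ == "__main__":
    # Toy trajectory: steady drift for three steps, then a sudden jump.
    trajectory = [[0.0, 0.0], [1.0, 0.0], [2.0, 0.0], [3.0, 0.0], [6.0, 4.0]]
    v = semantic_velocity(trajectory)
    print(v, first_alarm(v))
```

In a real deployment the hidden states would come from a chosen layer of the monitored model, sampled every step or sparsely; how the layer is chosen and how the threshold is calibrated are left open here because the abstract does not specify them.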

Keywords

Out-of-Distribution Detection, LLM, Large Language Models, Adversarial Robustness, Reinforcement Learning Reliability, Hallucination Detection, AI Safety, OOD, Jailbreak Detection
