
Token streams are a human-oriented interface that can obscure generation dynamics and encourage brittle analyses (e.g., relying on chain-of-thought text). We introduce an "EEG-like" telemetry layer for autoregressive decoding that records lightweight internal signals during generation (uncertainty, surprisal, distribution shift, and sparse layer summaries), yielding real-time traces of how model state evolves without parsing chain-of-thought text. Across three model families and three task types (27 runs = 3 models × 3 tasks × 3 seeds), we find that telemetry signatures vary strongly across models and tasks, and that early-window uncertainty can predict failures above chance on labeled tasks, though not uniformly: entropy AUC ranges from 0.61 to 0.74 on GSM8K and from 0.35 to 0.75 on TriviaQA. As an application demo, we show how telemetry can gate simple downstream policies (accept / retry / route) on a Llama-8B → Qwen-32B (4-bit) pair, improving accuracy by up to +10.5 points on GSM8K (route-only; 1.42× cost proxy) and +1.5 points on TriviaQA (cascade; 2.81× cost proxy). We release a reproducible pipeline, canonical benchmarks, and visualization tools. We view this work as a first step toward systems that progressively reduce reliance on human token interfaces.
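As a rough illustration of the kind of per-step signals listed above, the following sketch computes entropy, surprisal of the emitted token, and a distribution-shift score from raw next-token logits, averages early-window entropy into a single score, feeds that score to a three-way accept / retry / route gate, and measures how well a score separates failures via ROC AUC. Everything here is an assumption for illustration: the function names, the use of Jensen-Shannon divergence as the shift measure, the window size `k`, and the gate thresholds are hypothetical and are not the released pipeline; sparse layer summaries are omitted.

```python
import math
from typing import Dict, List, Optional, Sequence

# Hypothetical sketch: names, the JS-divergence shift measure, the window size,
# and the gate thresholds are illustrative assumptions, not the paper's pipeline.


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over one decoding step's vocabulary logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]


def entropy(p: Sequence[float]) -> float:
    """Shannon entropy in nats; zero-probability entries contribute nothing."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0.0)


def js_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """Jensen-Shannon divergence in nats, used here as a bounded shift measure."""
    m = [(pi + qi) / 2.0 for pi, qi in zip(p, q)]

    def kl(a: Sequence[float], b: Sequence[float]) -> float:
        # b[i] >= a[i] / 2 > 0 whenever a[i] > 0, so the ratio is always defined.
        return sum(ai * math.log(ai / bi) for ai, bi in zip(a, b) if ai > 0.0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def telemetry_trace(
    step_logits: Sequence[Sequence[float]], token_ids: Sequence[int]
) -> List[Dict[str, float]]:
    """Per-step entropy, surprisal of the emitted token, and shift vs. the previous step."""
    trace: List[Dict[str, float]] = []
    prev: Optional[List[float]] = None
    for logits, tok in zip(step_logits, token_ids):
        p = softmax(logits)
        trace.append({
            "entropy": entropy(p),
            # Clamp to avoid log(0) if the emitted token's probability underflows.
            "surprisal": -math.log(max(p[tok], 1e-300)),
            "shift": js_divergence(p, prev) if prev is not None else 0.0,
        })
        prev = p
    return trace


def early_window_entropy(trace: Sequence[Dict[str, float]], k: int = 8) -> float:
    """Mean entropy over the first k steps (an early-window uncertainty score)."""
    window = trace[:k]
    return sum(s["entropy"] for s in window) / len(window) if window else 0.0


def gate(score: float, accept_below: float = 1.0, route_above: float = 2.0) -> str:
    """Three-way policy with hypothetical thresholds: accept, retry, or route."""
    if score < accept_below:
        return "accept"
    if score > route_above:
        return "route"  # e.g., escalate to the larger model of the pair
    return "retry"


def auc(scores: Sequence[float], failed: Sequence[bool]) -> float:
    """ROC AUC of scores as a failure predictor (Mann-Whitney form, ties count 0.5)."""
    pos = [s for s, f in zip(scores, failed) if f]
    neg = [s for s, f in zip(scores, failed) if not f]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy 3-token vocabulary and 3 decoding steps with made-up logits.
    logits = [[4.0, 0.0, 0.0], [1.0, 1.0, 1.0], [0.0, 3.0, 0.0]]
    trace = telemetry_trace(logits, token_ids=[0, 2, 1])
    score = early_window_entropy(trace, k=2)
    print(round(score, 3), gate(score))
    print(auc([0.9, 0.4, 0.7, 0.2], [True, False, True, False]))
```

With these toy logits the early-window entropy is low, so the gate accepts. In practice the thresholds would presumably be tuned on held-out labeled runs, which is where an AUC check of this kind fits; an AUC below 0.5 means the score ranks failures worse than chance.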
Keywords: token-level analysis, uncertainty estimation, telemetry, internal activations, large language models (LLM), hidden states, inference-time monitoring
