The Readout Regime: A Normal Form for Final-Residual Control of Frozen Transformers — and Its Capacity Limits

Peterson, Nathan


Preprint

Data sources: ZENODO

Publication type: Preprint; Status: Under curation; Language: English; Publisher: Zenodo

Authors: Peterson, Nathan

doi: 10.5281/zenodo.20562890

Abstract

Inference-time interventions on a frozen transformer (steering behavior, injecting facts, suppressing outputs) are not interchangeable: where an additive intervention acts splits them into two regimes of sharply different expressive power, and we give the exact theory of one. An intervention that writes into the final residual stream (the unembedding/readout space; installing a scaled row of the output-projection matrix lm_head is the exactly characterizable case) induces on the next-token logits, for every input 𝑥, the transform 𝑧 ↦ 𝑠(𝑥) ⋅ 𝑧 + 𝑐(𝑥): a ranking-preserving scalar temperature 𝑠(𝑥) > 0 plus a re-ranking bias 𝑐(𝑥) confined to a fixed, low-dimensional set of directions chosen before any input is seen (T1; verified against direct forward-pass computation to ≈ 5 × 10⁻⁶). The result that matters is a structure theorem (T2): the input selects only a point in that fixed set and a temperature, so the reachable re-ranking directions, over all inputs, have affine dimension at most the installed-slot count. In plain terms, a readout install can re-weight and re-rank the options the model already has, but it cannot synthesize a new answer direction or compute a hidden routing variable that the addressing query does not already expose. We prove the readout regime's confinement and cite, but do not prove, evidence that the representation regime is not so confined; measuring that directly is the main open item. The corollaries, carefully bounded: a bounded readout install is not a hard override of a peaked prior, can only tip a decision that the context has already scaffolded to near-balance, and cannot compute a hidden intermediate. Capacity has two faces, both empirical: across key-disjoint decisions, installs compose bit-exactly at scale (tens of modules, thousands of facts, Δ = 0); within one decision, the readout is winner-take-all (≈ 2 targets co-winnable, against an output projection of entropy-effective rank ≈ 918).
The control reading: tip a propensity in the readout regime, where the install is an auditable, removable bias, but place any hard guarantee in a deterministic override outside the model. The contribution is the boundary, stated precisely.
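
The transform 𝑧 ↦ 𝑠(𝑥) ⋅ 𝑧 + 𝑐(𝑥) can be illustrated on a toy readout. The sketch below is not the paper's code or model. It assumes the readout is a final RMSNorm followed by lm_head, which the abstract does not state, and it uses random NumPy matrices in place of a real transformer. The names `W`, `gain`, `target_ids`, `install`, `predicted_transform`, and `bias_affine_dim` are hypothetical. Under those assumptions, adding a mix of fixed lm_head rows to the final residual rescales the original logits by a positive scalar and adds a bias that stays in the span of the installed slots.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, vocab = 16, 50

# Toy stand-ins (assumptions, not the paper's model or code).
W = rng.normal(size=(vocab, d_model))        # plays the role of lm_head
gain = rng.uniform(0.5, 1.5, size=d_model)   # gain of an assumed final RMSNorm
target_ids = [3, 17, 42]                     # hypothetical installed slots
slots = W[target_ids]                        # each slot writes one lm_head row


def rms(h):
    """Root-mean-square of h with a small epsilon, as in RMSNorm."""
    return np.sqrt(np.mean(h * h) + 1e-6)


def logits(h):
    """Readout: final RMSNorm followed by the output projection."""
    return W @ (gain * h / rms(h))


def install(h, weights):
    """Add an input-dependent mix of the fixed slot directions to h."""
    return h + weights @ slots


def predicted_transform(h, weights):
    """Closed-form (s, c) with logits(install(h, w)) == s * logits(h) + c."""
    delta = weights @ slots
    denom = rms(h + delta)
    s = rms(h) / denom                 # scalar temperature, always > 0
    c = W @ (gain * delta) / denom     # bias in the span of the fixed slots
    return s, c


def bias_affine_dim(n_inputs=200):
    """Rank of the differences c(x) - c(x0) collected over random inputs."""
    cs = []
    for _ in range(n_inputs):
        h = rng.normal(size=d_model)
        w = rng.normal(size=len(target_ids))   # stand-in for addressing weights
        cs.append(predicted_transform(h, w)[1])
    cs = np.array(cs)
    return np.linalg.matrix_rank(cs - cs[0], tol=1e-8)


h = rng.normal(size=d_model)
w = np.array([2.0, -1.0, 0.5])
s, c = predicted_transform(h, w)
err = np.max(np.abs(logits(install(h, w)) - (s * logits(h) + c)))
print("max abs error:", err)
print("affine dim of c(x):", bias_affine_dim(), "slots:", len(target_ids))
```

In this toy setting the closed form matches the direct computation up to floating-point error, and the measured dimension of the collected biases never exceeds the number of installed slots. That is the shape of the T1 and T2 claims, not a reproduction of the reported measurements.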
