OFF-MANIFOLD BY CONSTRUCTION: INTERMEDIATE-LAYER ADAPTERS IN FROZEN AR DECODERS

A common approach to adapting frozen autoregressive transformers without modifying their weights is to perturb hidden states at intermediate layers, for example, via element-wise modulation or residual bottlenecks. We prove that any real-analytic perturbation satisfying a natural non-degeneracy assumption produces hidden states that, with probability 1 over initialization, lie outside the model’s natural reachable set (Theorem 1). For post-FFN element-wise adapters, we prove a stronger structural result: for almost every base model, no non-zero adapter—trained or untrained—can map all prompts back onto the natural reachable set (Theorem 2). Because subsequent layers are real-analytic maps, this off-manifold deviation propagates forward rather than being absorbed, shifting the output token distribution and, in cascaded architectures, breaking the coupled training graph of downstream decoders. We characterize the empirical behavior on Qwen3-TTS 1.7B.
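As a concrete illustration of the two adapter families named above, the following is a minimal NumPy sketch, not the parameterization used in this work. The function names, the `tanh` nonlinearity, the shapes, and the identity-style initialization are all assumptions chosen for illustration. Both maps are real-analytic in the hidden state and in their parameters.

```python
import numpy as np

# Hypothetical adapter forms for illustration only; names and shapes are assumptions.

def elementwise_adapter(h, gamma, beta):
    """Element-wise modulation: scale and shift each hidden coordinate."""
    return gamma * h + beta

def bottleneck_adapter(h, w_down, w_up):
    """Residual bottleneck: add a low-rank, tanh-activated correction to h."""
    return h + np.tanh(h @ w_down) @ w_up

rng = np.random.default_rng(0)
d_model, rank = 8, 2
h = rng.standard_normal((3, d_model))  # hidden states for 3 token positions

# Identity settings: the adapters leave the frozen model's hidden states unchanged.
gamma = np.ones(d_model)
beta = np.zeros(d_model)
w_down = 0.1 * rng.standard_normal((d_model, rank))
w_up = np.zeros((rank, d_model))

h_mod = elementwise_adapter(h, gamma, beta)
h_res = bottleneck_adapter(h, w_down, w_up)

# A non-zero adapter: perturbing the scale moves every hidden state away from h.
h_pert = elementwise_adapter(h, gamma + 0.1, beta)
```

In this sketch the frozen decoder would consume `h_pert` in place of `h` at the chosen layer. Whether such a perturbed state remains reachable by the unmodified model is the question the theorems address.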
