
We track behavioral subspaces throughout rho-guided SFT on Qwen 2.5-7B-Instruct under three protection conditions (gamma=0.03, gamma=0.10, and anneal). The central finding is that surgery operates by compression, not rotation: pairwise Grassmann angles remain within 2-3 degrees of baseline across all conditions and layers, while the effective dimensionality of target behaviors shrinks by 40-60% within the first 25 training steps. The protection strength gamma controls collateral compression: at gamma=0.10, factual dimensionality collapses to 1 at all layers, which explains the weaker behavioral outcomes despite stronger protection. Once established, compression is irreversible.
behavioral compression, language model interpretability, subspace geometry, behavioral alignment, supervised fine-tuning, Grassmann manifold
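
The two quantities the abstract contrasts, rotation (Grassmann angles) and compression (effective dimensionality), can be illustrated with a minimal NumPy sketch. It is not the authors' pipeline: the function names are hypothetical, the subspace is assumed to be the top-k principal directions of centered activations, and effective dimensionality is assumed to be the participation ratio of the covariance spectrum, which the abstract does not specify.

```python
import numpy as np


def orthonormal_basis(X, k):
    """Top-k principal directions of activations X with shape (n_samples, d).

    Assumption: a behavioral subspace is taken to be the span of the leading
    right singular vectors of the centered activation matrix.
    """
    Xc = X - X.mean(axis=0, keepdims=True)
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    return vt[:k].T  # shape (d, k), orthonormal columns


def principal_angles_deg(A, B):
    """Principal (Grassmann) angles in degrees between span(A) and span(B).

    A and B must have orthonormal columns. The singular values of A^T B are
    the cosines of the principal angles.
    """
    cosines = np.linalg.svd(A.T @ B, compute_uv=False)
    return np.degrees(np.arccos(np.clip(cosines, -1.0, 1.0)))


def participation_ratio(X):
    """Effective dimensionality as (sum lam)^2 / sum(lam^2).

    lam are the squared singular values of the centered data (proportional to
    covariance eigenvalues). Assumption: the source may use another estimator.
    """
    Xc = X - X.mean(axis=0, keepdims=True)
    lam = np.linalg.svd(Xc, compute_uv=False) ** 2
    return lam.sum() ** 2 / (lam ** 2).sum()


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic stand-ins for activations before and after fine-tuning:
    # same directions, but variance concentrated on fewer of them afterwards.
    base = rng.normal(size=(500, 16))
    after = base * np.array([1.0] * 2 + [0.05] * 14)
    print("angles (deg):", principal_angles_deg(
        orthonormal_basis(base, 2), orthonormal_basis(after, 2)))
    print("effective dim before:", participation_ratio(base))
    print("effective dim after: ", participation_ratio(after))
```

Under these assumptions, "compression without rotation" would show up as principal angles that stay near zero between checkpoints while the participation ratio drops. For production use, `scipy.linalg.subspace_angles` computes the same angles with better numerical behavior for very small angles.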
