
We uncover a geometric structure universally present in transformer language models: the representation change ∆_ℓ = H_ℓ − H_{ℓ−1} between adjacent layers undergoes a characteristic dimensionality collapse at a small number of crease layers, followed by an extended plateau of nearly isometric processing. We analyze this structure with three complementary metrics: the effective rank of ∆, the correlation of pairwise token distances across layers, and the alignment of the principal directions of the representation change vectors. On GPT-2 Small (12 layers), a single crease spanning layers 1–3 collapses ∆ to effective rank 1 (98% of the variance is explained by the top singular vector), while the subsequent 7 plateau layers preserve pairwise distances almost perfectly (ρ = 0.962). On GPT-2 Medium (24 layers), the crease consolidates into a single sharper fold at layer 3, and the plateau expands to 18 layers with improved isometry (ρ = 0.988). We demonstrate two practical applications of this geometric understanding: (1) crease-aware fine-tuning, in which freezing the crease zone and training only the plateau layers matches the domain adaptation performance of full-model fine-tuning while training 77% of the parameters; and (2) trainable v-rotation, a control primitive that rotates the dominant direction of the crease transformation via an SVD-constrained orthogonal matrix, achieving a 62% improvement in target-domain perplexity while preserving base capabilities (+16% on held-out evaluation). Our findings establish a structural principle of transformer representations, namely that information gain is non-uniform across layers and concentrated at creases, and provide both diagnostic and interventional tools for exploiting this principle.
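To make the three metrics concrete, the following NumPy sketch shows one way they could be computed from two hidden-state matrices of shape (tokens, hidden dimension). It is an illustration under stated assumptions, not necessarily the exact procedure used here: effective rank is taken as the exponential of the entropy of the normalized squared singular values, ρ is taken as the Pearson correlation of pairwise Euclidean distances, and alignment is taken as the absolute cosine between top right singular vectors. All function names are hypothetical.

```python
import numpy as np


def top_variance_fraction(delta):
    """Fraction of squared singular-value mass carried by the top singular vector.

    Assumption: delta is used uncentered, as a (tokens, hidden_dim) matrix.
    """
    s = np.linalg.svd(delta, compute_uv=False)
    energy = s ** 2
    return float(energy[0] / energy.sum())


def effective_rank(delta):
    """Entropy-based effective rank (an assumed definition, see lead-in)."""
    s = np.linalg.svd(delta, compute_uv=False)
    p = s ** 2 / np.sum(s ** 2)
    p = p[p > 0]  # drop exact zeros so log() is defined
    return float(np.exp(-np.sum(p * np.log(p))))


def _pairwise_distances(h):
    """Upper-triangle Euclidean distances between the rows (tokens) of h."""
    sq = np.sum(h ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (h @ h.T)
    d = np.sqrt(np.maximum(d2, 0.0))  # clip tiny negatives from round-off
    iu = np.triu_indices(h.shape[0], k=1)
    return d[iu]


def pairwise_distance_correlation(h_a, h_b):
    """Pearson correlation of token-token distances between two layers.

    Assumption: rho is a Pearson coefficient; a rank correlation is another
    plausible choice.
    """
    return float(np.corrcoef(_pairwise_distances(h_a),
                             _pairwise_distances(h_b))[0, 1])


def principal_direction_alignment(delta_a, delta_b):
    """Absolute cosine between the top right singular vectors of two deltas."""
    v_a = np.linalg.svd(delta_a, full_matrices=False)[2][0]
    v_b = np.linalg.svd(delta_b, full_matrices=False)[2][0]
    return float(abs(v_a @ v_b))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    h_prev = rng.normal(size=(32, 16))            # stand-in for H_{l-1}
    # Synthetic "crease": every token moves along one shared direction.
    shift = np.outer(rng.normal(size=32), rng.normal(size=16))
    h_next = h_prev + shift                       # stand-in for H_l
    delta = h_next - h_prev
    print("effective rank:", effective_rank(delta))
    print("top variance fraction:", top_variance_fraction(delta))
    print("distance correlation:", pairwise_distance_correlation(h_prev, h_next))
```

In this reading, a crease layer would show an effective rank close to 1 for ∆, while a plateau layer would show a distance correlation close to 1 between its input and output representations.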
