Preprint
Data sources: ZENODO

What Outermost Layers Do: A Cross-Architecture Mechanistic Study of Trained Transformers

Authors: Romero Castella, Ivan Alejandro;

Abstract

Trained transformers process language through stacks of structurally identical layers, but those layers do not behave identically. The first and last few layers appear to do something qualitatively distinct from those in between, yet exactly what they do has remained less clear. We characterize the outermost layers of three pretrained models (DistilBERT, BERT, and GPT-2) using geometric metrics, reconstructibility from context, and a combination of linear probing and causal ablation, with hypotheses pre-registered before any numbers were extracted. We report three findings. First, a sandwich pattern generalizes across the three architectures, with a compositional core that absorbs additional depth while the translator regions retain near-fixed size. Second, the entry and exit translators operate in directionally opposite ways between the encoders and the decoder. Third, the dominant principal direction of GPT-2's final layer, capturing roughly 35% of total variance, is orthogonal to part-of-speech, lexical, positional, and sentiment information. We close with observations on how these layer-wise differences relate to active questions about cross-model representation sharing.
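
To make the last finding concrete, the following is a minimal sketch of how one might measure the share of variance captured by a dominant principal direction and check its alignment with a linear probe direction. It is not the authors' pipeline: the function name `top_direction_stats`, the synthetic activations, and the probe vector are all assumptions introduced for illustration, and no real model outputs are used.

```python
import numpy as np


def top_direction_stats(hidden, probe_weight):
    """Return (variance share of the first principal direction,
    absolute cosine between that direction and a probe direction).

    hidden:       array of shape (n_tokens, d_model), one row per token.
    probe_weight: array of shape (d_model,), e.g. the weight vector of a
                  hypothetical linear probe for some property.
    """
    # Center the activations so the SVD yields principal directions.
    centered = hidden - hidden.mean(axis=0, keepdims=True)
    _, singular_values, vt = np.linalg.svd(centered, full_matrices=False)

    # Squared singular values are proportional to per-component variance.
    variance = singular_values ** 2
    share = float(variance[0] / variance.sum())

    # Rows of vt are unit-norm principal directions; normalize the probe too.
    pc1 = vt[0]
    probe = probe_weight / np.linalg.norm(probe_weight)
    alignment = float(abs(pc1 @ probe))
    return share, alignment


# Synthetic stand-in for final-layer activations (assumption, not GPT-2 data).
rng = np.random.default_rng(0)
n_tokens, d_model = 2000, 16
hidden = rng.normal(size=(n_tokens, d_model))
hidden[:, 0] *= 3.0  # inflate one axis so that a dominant direction exists

# Hypothetical probe direction that lies along a different axis.
probe_weight = np.zeros(d_model)
probe_weight[1] = 1.0

share, alignment = top_direction_stats(hidden, probe_weight)
print(f"variance share of PC1: {share:.2f}")
print(f"|cos(PC1, probe)|:     {alignment:.3f}")
```

In this toy setup an absolute cosine near zero means the probe direction carries almost none of the dominant direction, which is the sense of "orthogonal" used in the sketch. With real models, `hidden` would come from a chosen layer's activations and `probe_weight` from a trained probe, and both choices affect the result.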
