Preprint
Data sources: ZENODO

Separate and Amplify: Attention's Geometry of Retrieval

Authors: Maselko, Theodore

Abstract

Using the Tuple-Structured Associative Recall task to isolate retrieval, we demonstrate that Transformer models learn high-magnitude spherical codes (sets of vectors with a guaranteed minimum angular separation) and can achieve perfect accuracy and robust length generalization down to single-digit head dimensions. We show by construction that attention's single-head retrieval capacity $N$ approaches the representational limit of the subspaces it projects from, and is thus unbounded over the real numbers. Given $b$ bits per input coordinate, capacity scales as $N \approx 2^{b d_k}$, or equivalently $N \approx 2^{B}$, where $B = b d_k$ is the total number of bits. Head dimension $d_k \geq 2$ does not increase capacity, but it does influence how efficiently a given spherical code can approach this representational limit.
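The two quantities named in the abstract can be made concrete with a small sketch. The code below is illustrative only and is not taken from the paper: capacity simply evaluates $N \approx 2^{b d_k}$ for an assumed bit width and head dimension, and min_angular_separation computes the defining property of a spherical code, the smallest pairwise angle within a set of vectors. Function names and example parameters are hypothetical.

```python
import numpy as np

# Illustrative sketch (not from the paper): with b bits per coordinate and
# head dimension d_k, each key vector is one of 2**(b * d_k) distinct points,
# so single-head retrieval capacity scales as N ~ 2**(b * d_k) = 2**B.
def capacity(b_bits_per_coord: int, d_k: int) -> int:
    return 2 ** (b_bits_per_coord * d_k)

# Minimum pairwise angular separation (in degrees) of a set of vectors,
# after normalizing them to the unit sphere -- the guaranteed-separation
# property that defines a spherical code as described in the abstract.
def min_angular_separation(vectors: np.ndarray) -> float:
    unit = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    cos = np.clip(unit @ unit.T, -1.0, 1.0)
    np.fill_diagonal(cos, -1.0)  # ignore each vector's similarity with itself
    return float(np.degrees(np.arccos(cos.max())))

if __name__ == "__main__":
    print(capacity(16, 2))               # N ~ 2**32 for b = 16, d_k = 2
    rng = np.random.default_rng(0)
    keys = rng.normal(size=(8, 2))       # 8 random keys in a d_k = 2 subspace
    print(min_angular_separation(keys))  # smallest pairwise angle, in degrees
```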
