Powered by OpenAIRE graph
Preprint
Data sources: ZENODO

Who Needs Attention? Spiking Language Modeling via Synaptogenic Adaptive Processing Units

Authors: Galiza Cerdeira Gonzalez, Antonio; Świderski, Mateusz; Indurkhya, Bipin


Abstract

A spiking neural network generates coherent multi-turn conversation from pure next-token prediction, without attention, without RLHF, and without filtering — running on a $290 used GPU. We introduce the Synaptogenic Adaptive Processing Unit Language Model (SAPU-LM), a multi-timescale spiking reservoir architecture that replaces attention entirely with trained recurrent dynamics in leaky integrate-and-fire neurons. The chatbot "Nemo" emerges from freezing the learned spiking topology and retraining only 8.5% of parameters on conversational data, achieving 38.05 test perplexity on DailyDialog. The architecture spans a lineage from a frozen Echo State Network (~19,500 perplexity) to 84.15 perplexity (M-SAPU-LM) on a WikiText-103 10M-token subsample — an ~80× improvement from training reservoir weights via surrogate gradients. A Tiling Parallel SAPU (TPSAPU) shares a single 512×512 recurrent weight matrix across three timescales and recovers to 84.67 perplexity after L1 pruning, suggesting that the membrane time constant τ alone creates functional differentiation. Ternary quantization compresses the learned recurrent core to ~45 KB at 93.6% sparsity. L1 pruning reveals timescale-dependent topology emergence: fast reservoirs maintain distributed connectivity while slow reservoirs self-organize into diagonal self-excitatory memory cells — a structure discovered by the network, not imposed by design. The trained ternary spiking core maps directly to analog resistor-capacitor-comparator circuits; a proof-of-concept hardware exporter has been developed. To our knowledge, this is the first demonstration of open-ended next-token prediction using a trained spiking reservoir with no attention mechanism. Code and checkpoints: https://gitlab.com/AntonioGCGonzalez/synaptogenic-adaptive-processing-unit-language-models

This is a preliminary technical report; several experimental configurations are still running, and results will be updated in subsequent revisions.
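The core idea above — one shared recurrent weight matrix driving leaky integrate-and-fire populations that differ only in their membrane time constant τ — can be sketched as follows. This is a minimal illustration, not the authors' implementation: the `lif_reservoir_step` function, the τ values, and the input dimensions are all assumed for demonstration; the actual SAPU-LM code is in the linked repository.

```python
import numpy as np

def lif_reservoir_step(x, v, spikes, W_rec, W_in, tau, dt=1.0, v_th=1.0):
    """One Euler step of a leaky integrate-and-fire reservoir.

    The membrane potential v decays with time constant tau, accumulates
    input current from x and recurrent current from the previous spike
    vector, then emits a spike and hard-resets wherever v crosses v_th.
    """
    decay = np.exp(-dt / tau)                 # leak: larger tau -> slower decay
    v = decay * v + W_in @ x + W_rec @ spikes
    new_spikes = (v >= v_th).astype(v.dtype)  # binary spike vector
    v = np.where(new_spikes > 0, 0.0, v)      # hard reset after spiking
    return v, new_spikes

# Tiling-parallel setup: a single 512x512 recurrent matrix is shared by
# three populations that differ only in tau (fast / medium / slow).
# All numbers here are illustrative.
rng = np.random.default_rng(0)
N, D = 512, 64
W_rec = rng.normal(0.0, 0.05, (N, N))
W_in = rng.normal(0.0, 0.1, (N, D))
taus = [2.0, 10.0, 50.0]
states = [(np.zeros(N), np.zeros(N)) for _ in taus]  # (v, spikes) per timescale

x = rng.normal(0.0, 1.0, D)  # stand-in for one embedded input token
for i, tau in enumerate(taus):
    v, s = states[i]
    states[i] = lif_reservoir_step(x, v, s, W_rec, W_in, tau)
```

Because `W_rec` is identical across the three populations, any functional specialization between fast and slow reservoirs in this setup can only come from τ — which is the differentiation claim the TPSAPU result above is testing.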
