Powered by OpenAIRE graph
Preprint
Data sources: ZENODO

Who Needs Attention? Spiking Language Modeling via Synaptogenic Adaptive Processing Units

Authors: Galiza Cerdeira Gonzalez, Antonio; Świderski, Mateusz; Indurkhya, Bipin


Abstract

A spiking neural network generates coherent multi-turn conversation from pure next-token prediction, without attention, without RLHF, and without filtering — running on a $290 used GPU. We introduce the Synaptogenic Adaptive Processing Unit Language Model (SAPU-LM), a multi-timescale spiking reservoir architecture that replaces attention entirely with trained recurrent dynamics in leaky integrate-and-fire neurons. The chatbot "Nemo" emerges from freezing the learned spiking topology and retraining only 8.5% of parameters on conversational data, achieving 38.05 test perplexity on DailyDialog. The architecture spans a lineage from a frozen Echo State Network (~19,500 perplexity) to 84.15 perplexity (M-SAPU-LM) on a WikiText-103 10M-token subsample — an ~80× improvement from training reservoir weights via surrogate gradients. A Tiling Parallel SAPU (TPSAPU) shares a single 512×512 recurrent weight matrix across three timescales and recovers to 84.67 perplexity after L1 pruning, suggesting that the membrane time constant τ alone creates functional differentiation. Ternary quantization compresses the learned recurrent core to ~45 KB at 93.6% sparsity. L1 pruning reveals timescale-dependent topology emergence: fast reservoirs maintain distributed connectivity while slow reservoirs self-organize into diagonal self-excitatory memory cells — a structure discovered by the network, not imposed by design. The trained ternary spiking core maps directly to analog resistor-capacitor-comparator circuits; a proof-of-concept hardware exporter has been developed. To our knowledge, this is the first demonstration of open-ended next-token prediction using a trained spiking reservoir with no attention mechanism. Code and checkpoints: https://gitlab.com/AntonioGCGonzalez/synaptogenic-adaptive-processing-unit-language-models

This is a preliminary technical report; several experimental configurations are still running, and results will be updated in subsequent revisions.
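The core idea above — one shared recurrent weight matrix driving leaky integrate-and-fire populations that differ only in their membrane time constant τ — can be sketched as follows. This is a minimal illustration, not the authors' implementation: the `lif_reservoir_step` function, the τ values, and the input dimensions are all assumed for demonstration; the actual SAPU-LM code is in the linked repository.

```python
import numpy as np

def lif_reservoir_step(x, v, spikes, W_rec, W_in, tau, dt=1.0, v_th=1.0):
    """One Euler step of a leaky integrate-and-fire reservoir.

    The membrane potential v decays with time constant tau, accumulates
    input current from x and recurrent current from the previous spike
    vector, then emits a spike and hard-resets wherever v crosses v_th.
    """
    decay = np.exp(-dt / tau)                 # leak: larger tau -> slower decay
    v = decay * v + W_in @ x + W_rec @ spikes
    new_spikes = (v >= v_th).astype(v.dtype)  # binary spike vector
    v = np.where(new_spikes > 0, 0.0, v)      # hard reset after spiking
    return v, new_spikes

# Tiling-parallel setup: a single 512x512 recurrent matrix is shared by
# three populations that differ only in tau (fast / medium / slow).
# All numbers here are illustrative.
rng = np.random.default_rng(0)
N, D = 512, 64
W_rec = rng.normal(0.0, 0.05, (N, N))
W_in = rng.normal(0.0, 0.1, (N, D))
taus = [2.0, 10.0, 50.0]
states = [(np.zeros(N), np.zeros(N)) for _ in taus]  # (v, spikes) per timescale

x = rng.normal(0.0, 1.0, D)  # stand-in for one embedded input token
for i, tau in enumerate(taus):
    v, s = states[i]
    states[i] = lif_reservoir_step(x, v, s, W_rec, W_in, tau)
```

Because `W_rec` is identical across the three populations, any functional specialization between fast and slow reservoirs in this setup can only come from τ — which is the differentiation claim the TPSAPU result above is testing.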
