Other literature type · 2026
License: CC BY
Data source: ZENODO

BitMamba-2: Efficient Scaling of 1.58-bit State Space Models

Authors: Salazar, Jesus


Abstract

The scaling of Large Language Models (LLMs) is traditionally constrained by the quadratic complexity of Transformers and the memory bandwidth bottleneck associated with high-precision weights. While State Space Models (SSMs) like Mamba have addressed the sequence scaling limitation with linear-time complexity, the memory footprint remains a challenge for edge deployment. In this work, we introduce BitMamba-2, a hybrid architecture that integrates the 1.58-bit ternary quantization of BitNet into the Mamba-2 framework. We train two models from scratch: a 255M parameter baseline and a scaled-up 1B parameter model, utilizing a high-quality dataset mix comprising FineWeb-Edu, Cosmopedia, and The Stack-Dedup. Our experiments, conducted on Google Cloud TPU v6e hardware, demonstrate strong scaling laws for ternary SSMs. The 1B model achieves a 7.8% improvement in ARC-Easy accuracy (63.3%) and a dramatic reduction in perplexity (from 51.69 to 29.62) compared to the 255M baseline. Furthermore, we demonstrate that BitMamba-2 enables high-performance inference on consumer CPUs, achieving ∼53 tokens/second on an Intel i3 processor with a memory footprint of just 621 MB. Code and pre-trained checkpoints are publicly available at https://github.com/Zhayr1/BitMamba-2, https://huggingface.co/Zhayr1/BitMamba-2-1B, and https://huggingface.co/Zhayr1/BitMamba-2-0.25B.
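The "1.58-bit ternary quantization of BitNet" mentioned above refers to constraining weights to the set {-1, 0, +1} (log2(3) ≈ 1.58 bits per weight). A minimal NumPy sketch of the absmean ternary scheme used in BitNet b1.58 is shown below; the function name and epsilon are illustrative, not taken from the BitMamba-2 codebase:

```python
import numpy as np

def absmean_ternary_quantize(w: np.ndarray, eps: float = 1e-5):
    """Quantize a weight tensor to {-1, 0, +1} with a per-tensor scale,
    in the style of BitNet b1.58's absmean scheme (illustrative sketch)."""
    scale = np.mean(np.abs(w)) + eps                   # absmean scaling factor
    w_ternary = np.clip(np.round(w / scale), -1, 1)    # values in {-1, 0, +1}
    return w_ternary.astype(np.int8), scale

# The dequantized approximation is w ≈ scale * w_ternary.
rng = np.random.default_rng(0)
w = rng.standard_normal((4, 4)).astype(np.float32)
wq, s = absmean_ternary_quantize(w)
print(np.unique(wq))  # subset of [-1, 0, 1]
```

Because each weight needs only ~1.58 bits plus one shared scale, a model of this form can fit in a memory footprint far below its FP16 equivalent, which is what enables the CPU inference numbers reported in the abstract.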

Keywords

Neural Scaling Laws, Large Language Models, Quantization, Mamba, Efficient Inference, Green AI, Ternary Weights, 1.58-bit, BitNet, State Space Models
