
We introduce Q-Jamba, a family of quaternion-native language model architectures that achieve 3.4× parameter compression through Hamilton weight sharing while matching or exceeding standard transformer quality. All linear projections are replaced with QuaternionLinear layers that construct full m×n weight matrices from mn/4 learned parameters via the Hamilton block structure. We extend this principle to selective state spaces, proposing Q-Mamba — to our knowledge, the first SSM with Hamilton recurrence — and Q-Jamba, a hybrid that interleaves Q-Mamba blocks with quaternion attention. On a 9-task reasoning benchmark (n=5 seeds each), Q-Linear (547K params) significantly outperforms a parameter-matched standard transformer (559K params) with Cohen's d≈5.0. Q-Jamba 4:2 (813K params) achieves the lowest validation loss of all arms (0.421±0.004 vs. 0.506±0.016, p<10⁻⁴). On WikiText-2, Q-Linear matches a 3.4× larger standard model (BPC 2.102 vs. 2.105). A controlled dual-axis ablation reveals that structured coupling — not the algebraic rules of the quaternion algebra — drives these gains for feed-forward weights, while Hamilton algebra remains essential for recurrent state transitions. These findings emerge from 45 experiments on a single consumer GPU.
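The Hamilton block structure mentioned above can be illustrated with a minimal NumPy sketch. The abstract does not specify the exact block layout or sign convention used by QuaternionLinear, so the arrangement below (the standard left-multiplication form of the Hamilton product) is an assumption, and the names `hamilton_weight` and `hamilton_product` are hypothetical helpers rather than the authors' code.

```python
import numpy as np

def hamilton_weight(R, X, Y, Z):
    """Assemble a full (m, n) real matrix from four (m/4, n/4) blocks.

    Assumed layout: left multiplication by the quaternion R + Xi + Yj + Zk.
    The four blocks are the only learned parameters, so the layer stores
    4 * (m/4) * (n/4) = m*n/4 values instead of m*n.
    """
    return np.block([
        [R, -X, -Y, -Z],
        [X,  R, -Z,  Y],
        [Y,  Z,  R, -X],
        [Z, -Y,  X,  R],
    ])

def hamilton_product(p, q):
    """Reference Hamilton product of two quaternions given as 4-tuples."""
    r, x, y, z = p
    a, b, c, d = q
    return np.array([
        r * a - x * b - y * c - z * d,
        x * a + r * b - z * c + y * d,
        y * a + z * b + r * c - x * d,
        z * a - y * b + x * c + r * d,
    ])

# Illustrative sizes only (not taken from the paper).
rng = np.random.default_rng(0)
m, n = 8, 12
blocks = [rng.standard_normal((m // 4, n // 4)) for _ in range(4)]
W = hamilton_weight(*blocks)             # full 8 x 12 weight matrix
n_learned = sum(b.size for b in blocks)  # 24 learned values, i.e. m*n/4
```

With 1×1 blocks the assembled matrix reduces to ordinary quaternion multiplication, which gives a quick sanity check of the sign pattern. The per-layer saving under this layout is exactly 4×; the 3.4× figure reported above is an overall model-level number, and how the remaining parameters are distributed is not stated in the abstract.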
LLM, hybrid architectures, language models, quaternion neural networks, Hamilton product, Mamba, parameter compression, state-space models, parameter efficiency, structured weight sharing
