Balanced Ternary Transformers: Eliminating Multiplication and Enabling Epistemic Uncertainty

I present a quantisation method for transformer-based language models that constrains weights to the balanced ternary values {-1, 0, +1}, eliminating floating-point matrix multiplication entirely. Derived from Brusentsov's balanced ternary research at Moscow State University (1958-1965), the approach replaces multiply-accumulate operations with addition, subtraction, and skip operations.

Key results:
- 93.8% reduction in energy consumption per inference
- 16x memory compression (28GB → 1.75GB for 7B parameters)
- 48x theoretical throughput improvement
- 87-92% signal preservation
- Architectural epistemic uncertainty, enabling 50% abstention on uncertain inputs (hallucination prevention)

The method requires no specialised hardware and runs efficiently on standard CPUs. The full implementation is open-sourced at: https://github.com/Zaneham/Ternary_inference
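To make the add/subtract/skip idea concrete, here is a minimal sketch in plain Python. It is not taken from the linked repository: the function names, the mean-magnitude threshold rule, and the `threshold_ratio` value are assumptions chosen for illustration. The memory helper assumes a 32-bit baseline and 2 bits per ternary weight, which is one packing consistent with the 28GB → 1.75GB figure quoted above.

```python
def ternarize(weights, threshold_ratio=0.7):
    """Map float weights to {-1, 0, +1}.

    Assumption: weights whose magnitude falls below
    threshold_ratio * mean(|w|) become 0. The source does not
    state its thresholding rule; this one is illustrative only.
    """
    magnitudes = [abs(w) for row in weights for w in row]
    delta = threshold_ratio * sum(magnitudes) / len(magnitudes)
    return [
        [1 if w > delta else -1 if w < -delta else 0 for w in row]
        for row in weights
    ]


def ternary_matvec(ternary_rows, x):
    """Compute y = T x using only addition, subtraction, and skips."""
    out = []
    for row in ternary_rows:
        acc = 0.0
        for w, xi in zip(row, x):
            if w == 1:
                acc += xi      # +1: add the activation
            elif w == -1:
                acc -= xi      # -1: subtract the activation
            # 0: skip, no arithmetic at all
        out.append(acc)
    return out


def weight_memory_gb(n_params, bits_per_weight):
    """Storage for the weights alone, in decimal gigabytes."""
    return n_params * bits_per_weight / 8 / 1e9


weights = [[0.9, -0.05, -0.8],
           [0.1, 0.7, 0.6]]
t = ternarize(weights)                       # [[1, 0, -1], [0, 1, 1]]
y = ternary_matvec(t, [2.0, 3.0, 5.0])       # [-3.0, 8.0]

fp32_gb = weight_memory_gb(7e9, 32)          # 28.0
ternary_gb = weight_memory_gb(7e9, 2)        # 1.75
print(t, y, fp32_gb, ternary_gb, fp32_gb / ternary_gb)
```

The inner loop contains no multiplication: each weight selects one of three actions, which is why the operation maps onto ordinary CPU integer and floating-point adders. A practical implementation would pack the ternary values and vectorise the loop, but those details are outside this sketch.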

Related Organizations

Auckland University of Technology
New Zealand

Keywords

LLM, Hallucination prevention, Machine learning, Ternary quantisation, Epistemic Uncertainty, transformers, Neural Networks, Computer
