
I present a quantisation method for transformer-based language models that constrains weights to the balanced ternary values {-1, 0, +1}, eliminating floating-point matrix multiplication entirely. Derived from Brusentsov's balanced ternary research at Moscow State University (1958-1965), this approach replaces multiply-accumulate operations with addition, subtraction, and skip operations.

Key results:
- 93.8% reduction in energy consumption per inference
- 16x memory compression (28GB → 1.75GB for 7B parameters)
- 48x theoretical throughput improvement
- 87-92% signal preservation
- Architectural epistemic uncertainty enabling 50% abstention on uncertain inputs (hallucination prevention)

The method requires no specialised hardware; standard CPUs can execute it efficiently. The full implementation is open-sourced at: https://github.com/Zaneham/Ternary_inference
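To make the add, subtract, and skip idea concrete, here is a minimal sketch in Python with NumPy. It is an illustration only, not the code from the linked repository. The function names (`ternarise`, `ternary_matvec`, `packed_size_gb`) and the threshold rule (0.7 times the mean absolute weight) are assumptions chosen for the example; the abstract does not specify how weights are thresholded. The size helper simply checks the 28GB → 1.75GB figure, which works out if the baseline is 32-bit floats and each ternary weight is stored in 2 bits.

```python
import numpy as np


def ternarise(w, threshold_ratio=0.7):
    """Map float weights to {-1, 0, +1}.

    Assumption: weights whose magnitude is below
    threshold_ratio * mean(|w|) become 0. This rule is illustrative,
    not taken from the paper.
    """
    delta = threshold_ratio * np.mean(np.abs(w))
    t = np.zeros(w.shape, dtype=np.int8)
    t[w > delta] = 1
    t[w < -delta] = -1
    return t


def ternary_matvec(t, x):
    """Compute y = t @ x without multiplying by weights.

    Inputs under a +1 weight are added, inputs under a -1 weight are
    subtracted, and inputs under a 0 weight are skipped.
    """
    added = np.where(t == 1, x, 0.0).sum(axis=1)
    subtracted = np.where(t == -1, x, 0.0).sum(axis=1)
    return added - subtracted


def packed_size_gb(n_params, bits_per_weight):
    """Storage in GB (10^9 bytes) for n_params weights at a given bit width."""
    return n_params * bits_per_weight / 8 / 1e9


rng = np.random.default_rng(0)
w = rng.normal(size=(4, 8))   # toy float weight matrix
x = rng.normal(size=8)        # toy activation vector

t = ternarise(w)
y = ternary_matvec(t, x)

print(t)
print(y)
print(packed_size_gb(7e9, 32), "GB at 32 bits per weight")
print(packed_size_gb(7e9, 2), "GB at 2 bits per weight")
```

The masked sums here stand in for the per-weight branch a CPU kernel would take; a practical implementation would likely operate on packed 2-bit weights rather than an `int8` array.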
Keywords: LLM, Hallucination prevention, Machine learning, Ternary quantisation, Epistemic Uncertainty, transformers, Neural Networks, Computer
