Unifying Uniform and Binary-coding Quantization for Accurate Compression of Large Language Models

Keywords: FOS: Computer and information sciences, I.2.7, 68T50, Computation and Language, Computation and Language (cs.CL)

Seungcheol Park; Jeongin Bae; Beomseok Kwon; Minjun Kim 0010; Byeongwook Kim; Se Jung Kwon; U Kang; Dongsoo Lee

Data sources:

- arXiv.org e-Print Archive: Preprint, 2025
- https://doi.org/10.18653/v1/20... : Article, 2025, peer-reviewed (Crossref)
- https://dx.doi.org/10.48550/ar... : Article, 2025, license: arXiv Non-Exclusive Distribution (Datacite)
- DBLP: Article
- DBLP: Conference object

Publication: Article, Preprint, Conference object
Date: 01 Jan 2025
Embargo end date: 01 Jan 2025
Publisher: Association for Computational Linguistics (ACL)
Journal: Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

doi: 10.18653/v1/2025.acl-long.1382, 10.48550/arxiv.2506.03781

arXiv: 2506.03781

Abstract

How can we quantize large language models (LLMs) while preserving their accuracy? Quantization is essential for deploying LLMs efficiently. Binary-coding quantization (BCQ) and uniform quantization (UQ) are promising quantization schemes, offering strong expressiveness and strong optimizability, respectively. However, neither scheme offers both advantages. In this paper, we propose UniQuanF (Unified Quantization with Flexible Mapping), an accurate quantization method for LLMs. UniQuanF achieves both strong expressiveness and optimizability by unifying the flexible mapping technique of UQ with the non-uniform quantization levels of BCQ. We propose unified initialization, together with local and periodic mapping techniques, to optimize the parameters of UniQuanF precisely. After optimization, our unification theorem removes the computational and memory overhead, so the superior accuracy of UniQuanF comes without the extra deployment costs that the unification would otherwise induce. Experimental results demonstrate that UniQuanF outperforms existing UQ and BCQ methods, achieving up to 4.60% higher accuracy on the GSM8K benchmark.
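
To make the contrast between the two schemes concrete, the following is a minimal sketch of textbook round-to-nearest uniform quantization and of a greedy binary-coding quantization. It is not the UniQuanF method and does not reproduce the flexible mapping, unified initialization, or unification theorem from the paper. The function names `uniform_quantize`, `bcq_quantize`, and `mse` are hypothetical, and the BCQ variant shown here assumes no offset term.

```python
def uniform_quantize(weights, bits):
    """Round-to-nearest uniform quantization (illustrative assumption).

    The 2**bits levels are evenly spaced between min(weights) and max(weights).
    """
    lo, hi = min(weights), max(weights)
    steps = 2 ** bits - 1
    scale = (hi - lo) / steps if hi > lo else 1.0
    codes = [round((w - lo) / scale) for w in weights]  # integer codes
    return [lo + c * scale for c in codes]              # dequantized values


def bcq_quantize(weights, bits):
    """Greedy binary-coding quantization (illustrative assumption).

    Approximates w by sum_i alpha_i * b_i with b_i in {-1, +1}. Each step
    fits one scale alpha to the sign pattern of the current residual, so
    the resulting levels are generally not evenly spaced.
    """
    approx = [0.0] * len(weights)
    residual = list(weights)
    alphas = []
    for _ in range(bits):
        # For b = sign(residual), the least-squares scale is mean(|residual|).
        alpha = sum(abs(r) for r in residual) / len(residual)
        signs = [1.0 if r >= 0 else -1.0 for r in residual]
        approx = [a + alpha * s for a, s in zip(approx, signs)]
        residual = [w - a for w, a in zip(weights, approx)]
        alphas.append(alpha)
    return approx, alphas


def mse(original, quantized):
    """Mean squared reconstruction error."""
    return sum((o - q) ** 2 for o, q in zip(original, quantized)) / len(original)
```

Calling `uniform_quantize(w, 2)` and `bcq_quantize(w, 2)` on the same list of weights `w` and comparing `mse` gives a rough feel for the trade-off: the uniform grid is simple to optimize, while the BCQ levels adapt their spacing to the weights.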

ACL 2025 Main Track

Impact by BIP!

- Selected citations: 0. These citations are derived from selected sources. This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
- Popularity: Average. This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network.
- Influence: Average. This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
- Impulse: Average. This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network.

Green