Additive Powers-of-Two Quantization: An Efficient Non-uniform Discretization for Neural Networks

Publication type: Article, Preprint, Conference object

Date: 01 Jan 2019

Embargo end date: 01 Jan 2019

Publisher: arXiv

Funded by: FCT | D4

Authors: Yuhang Li 0001; Xin Dong 0009; Wei Wang 0059

doi: 10.48550/arxiv.1909.13144

arXiv: 1909.13144

Abstract

We propose Additive Powers-of-Two (APoT) quantization, an efficient non-uniform quantization scheme for the bell-shaped and long-tailed distribution of weights and activations in neural networks. By constraining every quantization level to be a sum of powers-of-two terms, APoT quantization enjoys high computational efficiency and a good match with the distribution of weights. A simple reparameterization of the clipping function is applied to generate a better-defined gradient for learning the clipping threshold. Moreover, weight normalization is presented to refine the distribution of weights and make training more stable and consistent. Experimental results show that our proposed method outperforms state-of-the-art methods and is even competitive with the full-precision models, demonstrating the effectiveness of our proposed APoT quantization. For example, our 4-bit quantized ResNet-50 on ImageNet achieves 76.6% top-1 accuracy without bells and whistles; meanwhile, our model reduces computational cost by 22% compared with the uniformly quantized counterpart. The code is available at https://github.com/yhhhli/APoT_Quantization.
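To make the idea of levels built as sums of powers-of-two terms concrete, the following is a minimal sketch and not the authors' implementation (see the linked repository for that). The function names `apot_levels` and `quantize`, the parameters `bits` and `base_bits`, and the specific choice of term sets are assumptions made for illustration; the abstract itself does not spell out the construction.

```python
from itertools import product


def apot_levels(bits=4, base_bits=2):
    """Build unsigned quantization levels in [0, 1] as sums of power-of-two terms.

    Assumed construction (illustrative, not stated in the abstract):
    there are n = bits // base_bits additive terms, and term i takes one of
    2**base_bits values: 0, or 2**-(i + j*n) for j = 0 .. 2**base_bits - 2.
    """
    if bits % base_bits != 0:
        raise ValueError("bits must be divisible by base_bits")
    n = bits // base_bits
    term_sets = []
    for i in range(n):
        choices = [0.0] + [2.0 ** -(i + j * n) for j in range(2 ** base_bits - 1)]
        term_sets.append(choices)
    # Every level is one choice from each term set, added together.
    sums = sorted({sum(combo) for combo in product(*term_sets)})
    scale = sums[-1]  # rescale so the largest level equals 1
    return [s / scale for s in sums]


def quantize(x, levels, clip=1.0):
    """Clip x to [0, clip], then snap it to the nearest level scaled by clip."""
    x = min(max(x, 0.0), clip)
    return min((clip * lv for lv in levels), key=lambda q: abs(q - x))


levels = apot_levels(bits=4, base_bits=2)
print(len(levels), levels[:4], levels[-2:])
print(quantize(0.3, levels))
```

With these assumed defaults the 16 levels are spaced more finely near zero than near the clipping threshold, which is the kind of fit to a bell-shaped distribution the abstract describes. A signed weight quantizer would additionally need a sign bit, which this sketch leaves out.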

quantization, efficient neural network

Keywords

FOS: Computer and information sciences, Computer Science - Machine Learning, Statistics - Machine Learning, Machine Learning (stat.ML), Machine Learning (cs.LG)

vision software on GitHub
IsRelatedTo

Impact by BIP!

- selected citations: 2. These citations are derived from selected sources. This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
- popularity: Average. This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network.
- influence: Average. This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
- impulse: Average. This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network.