To Compress, or Not to Compress: Characterizing Deep Learning Model Compression for Embedded Inference

Publication type: Article, Preprint, Conference object
Date: 01 Dec 2018 (embargo end date: 01 Jan 2018)
Country: United Kingdom
Publisher: IEEE
Journal: 2018 IEEE Intl Conf on Parallel & Distributed Processing with Applications, Ubiquitous Computing & Communications, Big Data & Cloud Computing, Social Computing & Networking, Sustainable Computing & Communications (ISPA/IUCC/BDCloud/SocialCom/SustainCom)
Funded by: UKRI | Distributed Heterogeneous..., UKRI | SANDeRS: Smart, Adaptive ...

Authors: Qin, Q; Ren, J; Yu, J; Wang, H; Gao, L; Zheng, J; Feng, Y; and 2 more authors

DOI: 10.1109/bdcloud.2018.00110, 10.48550/arxiv.1810.08899

arXiv: 1810.08899

Abstract

The recent advances in deep neural networks (DNNs) make them attractive for embedded systems. However, DNNs can take a long time to perform inference on resource-constrained computing devices. Model compression techniques can address this computational cost of deep inference on embedded devices. These techniques are highly attractive because they rely neither on specialized hardware nor on computation offloading, which is often infeasible due to privacy concerns or high latency. However, it remains unclear how model compression techniques perform across a wide range of DNNs, and designing efficient embedded deep learning solutions requires understanding their behavior. This work develops a quantitative approach to characterize model compression techniques on a representative embedded deep learning architecture, the NVIDIA Jetson TX2. We perform extensive experiments on 11 influential neural network architectures from the image classification and natural language processing domains. We experimentally show how two mainstream compression techniques, data quantization and pruning, perform on these network architectures, and what they imply for model storage size, inference time, energy consumption and performance metrics. We demonstrate that there are opportunities to achieve fast deep inference on embedded systems, but one must carefully choose the compression settings. Our results provide insights on when and how to apply model compression techniques, and guidelines for designing efficient embedded deep learning systems.
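The abstract names the two compression techniques it studies, data quantization and pruning, without spelling out what they do to a model's weights. As a rough, hypothetical illustration only (this is not the paper's implementation, which evaluates real networks on the Jetson TX2), the Python sketch below applies textbook symmetric 8-bit linear quantization and magnitude-based pruning to a toy list of weights. The function names `quantize_int8`, `dequantize` and `magnitude_prune`, the single per-tensor scale, and the sample weights are all assumptions chosen for illustration.

```python
# Hypothetical illustration of two generic compression techniques on one weight list.
# Neither function is taken from the paper; both are simple textbook variants.


def quantize_int8(weights):
    """Symmetric linear quantization of float weights to signed 8-bit codes.

    Returns the integer codes and the per-tensor scale needed to dequantize."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    # Round each weight to the nearest quantization step and clamp to [-127, 127].
    codes = [max(-127, min(127, round(w / scale))) for w in weights]
    return codes, scale


def dequantize(codes, scale):
    """Map integer codes back to approximate float weights."""
    return [c * scale for c in codes]


def magnitude_prune(weights, sparsity):
    """Zero out the fraction `sparsity` of weights with the smallest magnitude."""
    k = int(len(weights) * sparsity)  # number of weights to remove
    if k == 0:
        return list(weights)
    # Indices sorted by |w|; the first k are treated as least important.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = list(weights)
    for i in order[:k]:
        pruned[i] = 0.0
    return pruned


if __name__ == "__main__":
    w = [0.8, -0.05, 0.3, -1.27, 0.01, 0.6, -0.2, 0.02]

    codes, scale = quantize_int8(w)
    print("int8 codes:", codes)
    err = max(abs(a - b) for a, b in zip(w, dequantize(codes, scale)))
    print("max reconstruction error:", err)
    # 32-bit floats take 4 bytes each, int8 codes take 1 (plus one float scale).
    print("weight storage in bytes, fp32 vs int8:", 4 * len(w), "vs", len(codes))

    print("50% magnitude-pruned:", magnitude_prune(w, 0.5))
```

In this sketch, quantization shrinks each stored weight from 4 bytes to 1 at the cost of a rounding error of at most half a quantization step per weight, while pruning keeps the tensor the same size unless the zeros are stored in a sparse format. That difference hints at why the two techniques can affect storage size and inference time differently, which is the kind of trade-off the paper sets out to measure.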

8 pages; to appear in ISPA 2018

Country

United Kingdom

Related Organizations

Shanxi Normal University, China (People's Republic of)
Peking University, China (People's Republic of)
National University of Defense Technology, China (People's Republic of)
Northwest University (China), China (People's Republic of)

Keywords

FOS: Computer and information sciences, parallelism, Computer Science - Machine Learning, Computer Science - Performance, 000, Deep learning, Machine Learning (stat.ML), 004, Machine Learning (cs.LG), Performance (cs.PF), Computer Science - Distributed, Parallel, and Cluster Computing, Statistics - Machine Learning, deep inference, embedded systems, Distributed, Parallel, and Cluster Computing (cs.DC), energy efficiency

Impact by BIP!

- Selected citations: 17. These citations are derived from selected sources. This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
- Popularity: Top 10%. This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network.
- Influence: Top 10%. This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
- Impulse: Top 10%. This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network.