publication . Preprint . Conference object . Other literature type . 2017

Chipmunk: A systolically scalable 0.9 mm², 3.08 Gop/s/mW @ 1.2 mW accelerator for near-sensor recurrent neural network inference

Conti, Francesco; Cavigelli, Lukas; Paulin, Gianna; Susmelj, Igor; Benini, Luca
Open Access English
  • Published: 15 Nov 2017
Abstract
Recurrent neural networks (RNNs) are state-of-the-art in voice awareness/understanding and speech recognition. On-device computation of RNNs on low-power mobile and wearable devices would be key to applications such as zero-latency voice-based human-machine interfaces. Here we present Chipmunk, a small (<1 mm²) hardware accelerator for Long Short-Term Memory (LSTM) RNNs in UMC 65 nm technology, capable of operating at a measured peak efficiency of up to 3.08 Gop/s/mW at 1.24 mW peak power. To implement big RNN models without incurring huge memory transfer overhead, multiple Chipmunk engines can cooperate to form a single systolic array. In this way, the Chipmunk arch...
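The workload the abstract refers to is the per-time-step LSTM recurrence: four gates computed from affine maps of the input and previous hidden state, followed by elementwise updates of the cell and hidden states. A minimal NumPy sketch of one such step (the function name, variable names, and stacked-gate layout are illustrative assumptions, not the accelerator's actual dataflow):

```python
import numpy as np

def lstm_step(x, h_prev, c_prev, W, U, b):
    """One LSTM time step.

    x: input vector (X,); h_prev, c_prev: previous states (H,)
    W: (4H, X), U: (4H, H), b: (4H,) -- gate weights stacked as [i, f, g, o].
    Returns the new hidden and cell states (h, c).
    """
    z = W @ x + U @ h_prev + b              # stacked gate pre-activations, shape (4H,)
    H = h_prev.shape[0]
    sigmoid = lambda v: 1.0 / (1.0 + np.exp(-v))
    i = sigmoid(z[:H])                      # input gate
    f = sigmoid(z[H:2 * H])                 # forget gate
    g = np.tanh(z[2 * H:3 * H])             # candidate cell update
    o = sigmoid(z[3 * H:])                  # output gate
    c = f * c_prev + i * g                  # new cell state
    h = o * np.tanh(c)                      # new hidden state
    return h, c
```

The matrix-vector products `W @ x` and `U @ h_prev` dominate the operation count, which is why an accelerator of this kind benefits from keeping weights on-chip and, for models larger than one tile, from tiling the computation across cooperating engines.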
Subjects
free text keywords: Computer Science - Distributed, Parallel, and Cluster Computing; Computer Science - Learning; Computer Science - Neural and Evolutionary Computing; Computer Science - Sound; Hardware and Architecture; Electronic, Optical and Magnetic Materials; Electrical and Electronic Engineering; Computer hardware; Wearable technology; Electronic engineering; Computer science; Hardware acceleration; Recurrent neural network; Systolic array; Scalability; Computation
Funded by
EC| ExaNoDe
Project
ExaNoDe
European Exascale Processor Memory Node Design
  • Funder: European Commission (EC)
  • Project Code: 671578
  • Funding stream: H2020 | RIA
Communities
FET H2020 | FET HPC: HPC Core Technologies, Programming Environments and Algorithms for Extreme Parallelism and Extreme Data Applications
FET H2020 | FET HPC: European Exascale Processor Memory Node Design