
Sound event detection (SED) is a topic of growing interest in consumer and smart-city applications. Existing approaches based on deep neural networks (DNNs) are very effective, but highly demanding in terms of memory, power, and throughput when targeting ultra-low-power always-on devices. Latency, availability, cost, and privacy requirements are pushing recent IoT systems to process data on the node, close to the sensor, under a very limited energy supply and tight constraints on memory size and processing capabilities that preclude running state-of-the-art DNNs. In this paper, we explore the combination of extreme quantization to a small-footprint binary neural network (BNN) with the highly energy-efficient, RISC-V-based (8+1)-core GAP8 microcontroller. Starting from an existing CNN for SED whose footprint (815 kB) exceeds the 512 kB of memory available on our platform, we retrain the network using binary filters and activations to match these memory constraints. Fully binary neural networks typically incur an accuracy drop of 12-18% on the challenging ImageNet object recognition task compared to their full-precision baselines. Our BNN reaches 77.9% accuracy, just 7% lower than the full-precision version, while requiring 58 kB (7.2 times less) for the weights and 262 kB (2.4 times less) of memory in total. With our BNN implementation, we reach a peak throughput of 4.6 GMAC/s, and 1.5 GMAC/s over the full network including the Mel-bin preprocessing, corresponding to efficiencies of 67.1 GMAC/s/W and 31.3 GMAC/s/W, respectively. Compared to an ARM Cortex-M4 implementation, our system is 10.3 times faster and 51.1 times more energy efficient.
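To illustrate why binary filters and activations map so well onto a microcontroller, the sketch below shows the standard XNOR/popcount formulation of a binary dot product. This is a generic illustration, not the paper's actual GAP8 kernel; the function names `binary_dot32` and `binary_dot` and the 32-bit packing are assumptions made for the example.

```c
/* Minimal sketch (assumed, not the paper's GAP8 kernel): a binary dot
 * product with weights and activations packed 32 values per word.
 * Values are constrained to {-1, +1} and encoded as bits (1 -> +1, 0 -> -1),
 * so the dot product of two packed words reduces to
 *     dot = 32 - 2 * popcount(a XOR w)
 * i.e. one XOR plus one popcount replaces 32 multiply-accumulates. */
#include <stdint.h>

static inline int binary_dot32(uint32_t a, uint32_t w)
{
    /* a ^ w marks the bit positions where the operands disagree,
       i.e. where the elementwise product is -1. */
    return 32 - 2 * (int)__builtin_popcount(a ^ w);
}

/* Accumulate the binary dot product over a packed vector of n_words words. */
int binary_dot(const uint32_t *act, const uint32_t *wgt, int n_words)
{
    int acc = 0;
    for (int i = 0; i < n_words; i++)
        acc += binary_dot32(act[i], wgt[i]);
    return acc;
}
```

This bit-level formulation is what makes the reported GMAC/s figures attainable on an 8-core microcontroller: each core processes 32 binary MACs per XOR/popcount pair instead of one full-precision multiply-accumulate.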
Comments: 6 pages, conference paper.
Subjects: Machine Learning (cs.LG); Audio and Speech Processing (eess.AS). Keywords: binary neural networks; sound event detection; ultra-low power.
