Name: NeuroFlux: Memory-Efficient CNN Training Using Adaptive Local Learning
Keywords: QA75, Computer Science - Machine Learning, QA75 Electronic computers. Computer science, Memory efficient training, Edge computing, NIS, CNN training, Local learning, 3rd-NDAS

Type: Article, Preprint, Conference object
Date: 22 Apr 2024
Publisher: ACM
Journal: Proceedings of the Nineteenth European Conference on Computer Systems

Authors: Dhananjay Saikumar; Blesson Varghese

doi: 10.1145/3627703.3650067

arXiv: 2402.14139

handle: 10023/29805

Abstract

Efficient on-device Convolutional Neural Network (CNN) training in resource-constrained mobile and edge environments is an open challenge. Backpropagation is the standard approach, but it is GPU memory intensive: its strong inter-layer dependencies require the intermediate activations of the entire CNN model to be retained in GPU memory. Training within the available GPU memory budget therefore necessitates smaller batch sizes, which in turn results in impractically long training times. We introduce NeuroFlux, a novel CNN training system tailored for memory-constrained scenarios. NeuroFlux exploits two novel opportunities: first, adaptive auxiliary networks that employ a variable number of filters to reduce GPU memory usage, and second, block-specific adaptive batch sizes, which not only cater to the GPU memory constraints but also accelerate training. NeuroFlux segments a CNN into blocks based on GPU memory usage and attaches an auxiliary network to each layer in these blocks. This breaks the typical inter-layer dependencies under a new training paradigm, 'adaptive local learning'. Moreover, NeuroFlux caches intermediate activations, eliminating redundant forward passes over previously trained blocks and further accelerating training. Compared with Backpropagation, the results are twofold: on various hardware platforms, NeuroFlux achieves training speed-ups of 2.3× to 6.1× under stringent GPU memory budgets, and it produces streamlined models with 10.9× to 29.4× fewer parameters.
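The idea of segmenting a CNN into blocks by GPU memory usage and giving each block its own batch size can be illustrated with a small sketch. It assumes a hypothetical, simplified memory model in which each layer has a fixed parameter cost and a per-sample activation cost, and in which only the block currently being trained holds activations in memory (which local learning permits). The `Layer`, `Block`, `block_memory` and `segment` names, the greedy grouping rule, and the `min_batch`/`max_batch` defaults are illustrative assumptions, not NeuroFlux's actual algorithm; the memory used by the auxiliary networks is ignored.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Layer:
    name: str
    param_mb: float  # weights, gradients and optimizer state (hypothetical cost)
    act_mb: float    # activation memory per training sample; must be > 0


@dataclass
class Block:
    layers: List[Layer]
    batch_size: int


def block_memory(layers: List[Layer], batch_size: int) -> float:
    """Simplified memory model: fixed parameter cost plus per-sample activations."""
    params = sum(layer.param_mb for layer in layers)
    acts = sum(layer.act_mb for layer in layers)
    return params + batch_size * acts


def segment(layers: List[Layer], budget_mb: float,
            min_batch: int = 8, max_batch: int = 256) -> List[Block]:
    """Greedily group consecutive layers into blocks that fit the budget at
    min_batch, then give each block the largest batch size that still fits."""
    groups: List[List[Layer]] = []
    current: List[Layer] = []
    for layer in layers:
        if block_memory([layer], min_batch) > budget_mb:
            raise ValueError(f"{layer.name} does not fit even at batch size {min_batch}")
        # Close the current block if adding this layer would exceed the budget.
        if current and block_memory(current + [layer], min_batch) > budget_mb:
            groups.append(current)
            current = []
        current.append(layer)
    if current:
        groups.append(current)

    blocks = []
    for group in groups:
        params = sum(layer.param_mb for layer in group)
        acts = sum(layer.act_mb for layer in group)
        # Largest batch that fits; never below min_batch since the group fits there.
        batch = int((budget_mb - params) // acts)
        blocks.append(Block(group, min(batch, max_batch)))
    return blocks


if __name__ == "__main__":
    # Illustrative per-layer figures in MB; not measurements from the paper.
    toy_cnn = [
        Layer("conv1", param_mb=1, act_mb=8),
        Layer("conv2", param_mb=2, act_mb=8),
        Layer("conv3", param_mb=4, act_mb=4),
        Layer("conv4", param_mb=8, act_mb=2),
    ]
    for i, block in enumerate(segment(toy_cnn, budget_mb=150)):
        names = [layer.name for layer in block.layers]
        print(f"block {i}: {names} batch_size={block.batch_size}")
```

With these toy figures and a 150 MB budget, the activation-heavy early layers (`conv1`, `conv2`) form one block trained at batch size 9, while the later layers (`conv3`, `conv4`) fit a batch size of 23 under the same budget. Because each block is trained before the next one starts, its output activations could be computed once, cached, and reused as the next block's inputs, which is the kind of redundant forward pass the caching described above avoids.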

Comment: Accepted to EuroSys 2024

Related Organizations

University of St Andrews
United Kingdom

Related to Research communities

UArctic