Scalable and Practical Natural Gradient for Large-Scale Deep Learning

Publication: Article, Preprint

Publication date: 01 Jan 2022

Embargo end date: 01 Jan 2020

Country: Japan

Publisher: Institute of Electrical and Electronics Engineers (IEEE)

Journal: IEEE Transactions on Pattern Analysis and Machine Intelligence, volume 44, pages 404-415 (issn: 0162-8828, eissn: 1939-3539)

Authors: Kazuki Osawa; Yohei Tsuji; Yuichiro Ueno; Akira Naruse; Chuan-Sheng Foo; Rio Yokota

doi: 10.1109/tpami.2020.3004354 , 10.48550/arxiv.2002.06015

pmid: 32750792

arXiv: http://arxiv.org/abs/2002.06015

Abstract

Large-scale distributed training of deep neural networks results in models with worse generalization performance as a result of the increase in the effective mini-batch size. Previous approaches attempt to address this problem by varying the learning rate and batch size over epochs and layers, or ad hoc modifications of batch normalization. We propose Scalable and Practical Natural Gradient Descent (SP-NGD), a principled approach for training models that allows them to attain similar generalization performance to models trained with first-order optimization methods, but with accelerated convergence. Furthermore, SP-NGD scales to large mini-batch sizes with a negligible computational overhead as compared to first-order methods. We evaluated SP-NGD on a benchmark task where highly optimized first-order methods are available as references: training a ResNet-50 model for image classification on ImageNet. We demonstrate convergence to a top-1 validation accuracy of 75.4% in 5.5 minutes using a mini-batch size of 32,768 with 1,024 GPUs, as well as an accuracy of 74.9% with an extremely large mini-batch size of 131,072 in 873 steps of SP-NGD.

arXiv admin note: text overlap with arXiv:1811.12019

Country

Japan

Related Organizations

Agency for Science, Technology and Research (Singapore)
Institute for Infocomm Research (Singapore)
Nvidia (United States)
Institute of Science Tokyo (Japan)

Keywords

FOS: Computer and information sciences, Computer Science - Machine Learning, Deep Learning, Statistics - Machine Learning, Machine Learning (stat.ML), Neural Networks, Computer, Algorithms, Machine Learning (cs.LG)

Impact by BIP!

citations: 19. This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
popularity: Top 10%. This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network.
influence: Top 10%. This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
impulse: Top 10%. This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network.

Green

hybrid

Fields of Science

natural sciences