Publication · Preprint · 2017

Shake-Shake regularization

Gastaldi, Xavier
Open Access · English
Published: 21 May 2017
Abstract
The method introduced in this paper aims at helping deep learning practitioners faced with an overfitting problem. The idea is to replace, in a multi-branch network, the standard summation of parallel branches with a stochastic affine combination. Applied to 3-branch residual networks, shake-shake regularization improves on the best published single-shot results on CIFAR-10 and CIFAR-100, reaching test errors of 2.86% and 15.85%. Experiments on architectures without skip connections or Batch Normalization show encouraging results and open the door to a large set of applications. Code is available at https://github.com/xgastaldi/shake-shake
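To make the abstract's "stochastic affine combination" concrete: during training, each block's two residual branches are blended with a random coefficient alpha in the forward pass and an independent random coefficient beta in the backward pass; at test time both branches are weighted by the expectation 0.5. The following is a minimal PyTorch sketch of this forward/backward decoupling, written for this record rather than taken from the author's released code; the class name ShakeShake and the per-image coefficient shape are illustrative assumptions.

import torch

class ShakeShake(torch.autograd.Function):
    # Blends two branch outputs with independent random coefficients
    # in the forward and backward passes (illustrative sketch).

    @staticmethod
    def forward(ctx, branch1, branch2, training):
        if training:
            # One coefficient per image, broadcast over channel and spatial dims.
            alpha = torch.rand(branch1.size(0), 1, 1, 1,
                               device=branch1.device, dtype=branch1.dtype)
        else:
            # At test time, use the expected coefficient 0.5.
            alpha = torch.full((branch1.size(0), 1, 1, 1), 0.5,
                               device=branch1.device, dtype=branch1.dtype)
        return alpha * branch1 + (1.0 - alpha) * branch2

    @staticmethod
    def backward(ctx, grad_output):
        # An independent coefficient "shakes" the gradients a second time.
        beta = torch.rand(grad_output.size(0), 1, 1, 1,
                          device=grad_output.device, dtype=grad_output.dtype)
        return beta * grad_output, (1.0 - beta) * grad_output, None

Inside a residual block this would be applied as y = x + ShakeShake.apply(self.branch1(x), self.branch2(x), self.training), leaving the identity skip connection untouched while only the two parallel residual branches are shaken.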
Subjects
Free-text keywords: Computer Science - Learning; Computer Science - Computer Vision and Pattern Recognition
References (page 1 of 2; 24 in total)

Masoud Abdi and Saeid Nahavandi. Multi-residual networks. arXiv preprint arXiv:1609.05672, 2016.

Guozhong An. The effects of adding noise during backpropagation training on a generalization performance. Neural Computation, 1996.

Léon Bottou. Online algorithms and stochastic approximations. In Online Learning and Neural Networks. Cambridge University Press, 1998.

Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In CVPR, 2016a.

Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Identity mappings in deep residual networks. In ECCV, 2016b.

Gao Huang, Zhuang Liu, and Kilian Q. Weinberger. Densely connected convolutional networks. arXiv preprint arXiv:1608.06993, 2016a.

Gao Huang, Yu Sun, Zhuang Liu, Daniel Sedra, and Kilian Q. Weinberger. Deep networks with stochastic depth. In ECCV, 2016b.

Sergey Ioffe and Christian Szegedy. Batch normalization: Accelerating deep network training by reducing internal covariate shift. In ICML, 2015.

Guoliang Kang, Jun Li, and Dacheng Tao. Shakeout: A new regularized deep neural network training scheme. In AAAI, 2016.

Nitish Shirish Keskar, Dheevatsa Mudigere, Jorge Nocedal, Mikhail Smelyanskiy, and Ping Tak Peter Tang. On large-batch training for deep learning: Generalization gap and sharp minima. In ICLR, 2017.

Alex Krizhevsky. Learning multiple layers of features from tiny images. Technical report, University of Toronto, 2009.

Gustav Larsson, Michael Maire, and Gregory Shakhnarovich. FractalNet: Ultra-deep neural networks without residuals. arXiv preprint arXiv:1605.07648, 2016.

Yixuan Li, Jason Yosinski, Jeff Clune, Hod Lipson, and John Hopcroft. Convergent learning: Do different neural networks learn the same representations? In ICLR, 2016.

Tsung-Yi Lin, Michael Maire, Serge J. Belongie, Lubomir D. Bourdev, Ross B. Girshick, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, and C. Lawrence Zitnick. Microsoft COCO: Common objects in context. In ECCV, 2014.

Ilya Loshchilov and Frank Hutter. SGDR: Stochastic gradient descent with restarts. arXiv preprint arXiv:1608.03983, 2016.
