One weird trick for parallelizing convolutional neural networks

Preprint
Alex Krizhevsky

I present a new way to parallelize the training of convolutional neural networks across multiple GPUs. The method scales significantly better than all alternatives when applied to modern convolutional neural networks.
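The abstract does not spell out the method here, but for context: the baseline the paper builds on is synchronous data-parallel SGD, where each GPU computes a gradient on its own shard of the batch and the gradients are averaged before a single weight update. Below is a minimal NumPy sketch of that baseline on a toy linear model, with the "GPUs" simulated as data shards; all function names are hypothetical, and this is deliberately not the paper's exact scheme (the paper's trick combines data parallelism in convolutional layers with model parallelism in fully-connected layers).

```python
import numpy as np

def worker_gradient(w, x, y):
    """Gradient of the mean squared loss 0.5*||x @ w - y||^2 on one shard
    (what each simulated GPU would compute locally)."""
    return x.T @ (x @ w - y) / len(x)

def data_parallel_step(w, shards, lr=0.1):
    """One synchronous step: every worker computes its shard gradient,
    the gradients are averaged (standing in for an all-reduce),
    and a single SGD update is applied to the shared weights."""
    grads = [worker_gradient(w, x, y) for x, y in shards]
    avg_grad = sum(grads) / len(grads)
    return w - lr * avg_grad

# Toy problem: recover true_w from noiseless linear observations.
rng = np.random.default_rng(0)
X = rng.normal(size=(64, 3))
true_w = np.array([[1.0], [-2.0], [0.5]])
Y = X @ true_w

# Split the batch evenly across 4 simulated GPUs.
shards = [(X[i::4], Y[i::4]) for i in range(4)]

w = np.zeros((3, 1))
for _ in range(200):
    w = data_parallel_step(w, shards)
```

Because the shards partition the batch evenly, the averaged shard gradients equal the full-batch gradient exactly, so this sketch converges to `true_w`; the communication cost of the averaging step is what limits scaling as the number of GPUs grows.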
  • References (7)

Adam Coates, Brody Huval, Tao Wang, David Wu, Bryan Catanzaro, and Andrew Ng. Deep learning with COTS HPC systems. In Proceedings of the 30th International Conference on Machine Learning, pages 1337-1345, 2013.

    Jeffrey Dean, Greg Corrado, Rajat Monga, Kai Chen, Matthieu Devin, Quoc V Le, Mark Z Mao, Marc'Aurelio Ranzato, Andrew W Senior, Paul A Tucker, et al. Large scale distributed deep networks. In NIPS, pages 1232-1240, 2012.

Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei-Fei. ImageNet: A large-scale hierarchical image database. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 248-255. IEEE, 2009.

Alex Krizhevsky, Ilya Sutskever, and Geoffrey E Hinton. ImageNet classification with deep convolutional neural networks. In NIPS, volume 1, page 4, 2012.

    Feng Niu, Benjamin Recht, Christopher Ré, and Stephen J Wright. Hogwild!: A lock-free approach to parallelizing stochastic gradient descent. Advances in Neural Information Processing Systems, 24:693-701, 2011.

Thomas Paine, Hailin Jin, Jianchao Yang, Zhe Lin, and Thomas Huang. GPU asynchronous stochastic gradient descent to speed up neural network training. arXiv preprint arXiv:1312.6186, 2013.

Omry Yadan, Keith Adams, Yaniv Taigman, and Marc'Aurelio Ranzato. Multi-GPU training of ConvNets. arXiv preprint arXiv:1312.5853, 2013.
