Publication · Preprint · 2016

DeepSpark: A Spark-Based Distributed Deep Learning Framework for Commodity Clusters

Kim, Hanjoo; Park, Jaehong; Jang, Jaehee; Yoon, Sungroh
Open Access · English
  • Published: 25 Feb 2016
Abstract
The increasing complexity of deep neural networks (DNNs) has made it challenging to exploit existing large-scale data processing pipelines for handling the massive data and parameters involved in DNN training. Distributed computing platforms and GPGPU-based acceleration provide a mainstream solution to this computational challenge. In this paper, we propose DeepSpark, a distributed and parallel deep learning framework that exploits Apache Spark on commodity clusters. To support parallel operations, DeepSpark automatically distributes workloads and parameters to Caffe/TensorFlow-running nodes using Spark, and iteratively aggregates training results by a novel lock-free ...
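
The abstract (truncated above) describes workers running Caffe/TensorFlow, with Spark distributing data and parameters and iteratively aggregating the trained results. As a rough illustration of that data-parallel pattern only, the following PySpark sketch broadcasts the current parameters to each partition, runs local SGD on a toy NumPy linear model, and averages the returned weights each round. This is not the DeepSpark implementation: the paper's aggregation is described as a novel lock-free asynchronous scheme, whereas this sketch uses simple synchronous averaging, and every function and variable name here is hypothetical.

    # Minimal sketch: iterative parameter averaging on Spark (illustrative only,
    # NOT DeepSpark's lock-free asynchronous aggregation; toy NumPy model
    # stands in for the Caffe/TensorFlow workers described in the abstract).
    import numpy as np
    from pyspark import SparkContext

    sc = SparkContext(appName="parameter-averaging-sketch")

    def local_sgd(partition, w_init, lr=0.01, epochs=1):
        """Run plain SGD on one data partition, starting from the broadcast weights."""
        w = w_init.copy()
        data = list(partition)
        for _ in range(epochs):
            for x, y in data:
                grad = (w.dot(x) - y) * x      # squared-error gradient for a linear model
                w -= lr * grad
        yield w

    # Toy dataset: (feature vector, target) pairs spread over 4 partitions.
    rng = np.random.default_rng(0)
    true_w = np.array([2.0, -3.0])
    samples = [(x, float(true_w.dot(x))) for x in rng.normal(size=(1000, 2))]
    data_rdd = sc.parallelize(samples, numSlices=4).cache()

    w = np.zeros(2)
    for it in range(10):
        w_bc = sc.broadcast(w)                 # ship current parameters to the workers
        local_weights = data_rdd.mapPartitions(
            lambda part: local_sgd(part, w_bc.value)
        ).collect()
        w = np.mean(local_weights, axis=0)     # synchronous averaging (simplification)
        print(f"iteration {it}: w = {w}")

    sc.stop()

The synchronous average at the end of each round is used purely for brevity; per the abstract, DeepSpark's contribution is to replace this blocking aggregation step with a lock-free scheme so that slow workers do not stall the whole cluster.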
Subjects
free text keywords: Computer Science - Learning
References (36 in total; page 1 of 3 shown below)

[1] K. He, et al. Deep residual learning for image recognition. arXiv preprint arXiv:1512.03385, 2015.

[2] J. K. Chorowski, et al. Attention-based models for speech recognition. In NIPS, pages 577-585, 2015.

[3] K. Simonyan, et al. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014.

[4] C. Szegedy, et al. Going deeper with convolutions. In CVPR, pages 1-9, 2015.

[5] A. Krizhevsky, et al. ImageNet classification with deep convolutional neural networks. In NIPS, pages 1097-1105, 2012.

[6] S. Chetlur, et al. cuDNN: Efficient primitives for deep learning. arXiv preprint arXiv:1410.0759, 2014.

[7] A. Krizhevsky. One weird trick for parallelizing convolutional neural networks. arXiv preprint arXiv:1404.5997, 2014.

[8] M. Li, et al. Scaling distributed machine learning with the parameter server. In OSDI, pages 583-598, 2014.

[9] J. Dean, et al. Large scale distributed deep networks. In NIPS, pages 1223-1231, 2012.

[10] Q. Ho, et al. More effective distributed ML via a stale synchronous parallel parameter server. In NIPS, pages 1223-1231, 2013.

[11] E. P. Xing, et al. Petuum: A new platform for distributed machine learning on big data. In SIGKDD, KDD '15, pages 1335-1344, New York, NY, USA, 2015. ACM.

[12] B. C. Ooi, et al. SINGA: A distributed deep learning platform. In Proceedings of the ACM International Conference on Multimedia, pages 685-688. ACM, 2015.

[13] J. T. Geiger, et al. Investigating NMF speech enhancement for neural network based acoustic models. In INTERSPEECH, pages 2405-2409, 2014.

[14] P. Moritz, et al. SparkNet: Training deep networks in Spark. arXiv preprint arXiv:1511.06051, 2015.

[15] M. Zaharia, et al. Spark: Cluster computing with working sets. In Proceedings of the 2nd USENIX Conference on Hot Topics in Cloud Computing, volume 10, page 10, 2010.
