Publication · Preprint · 2016

A Metaprogramming and Autotuning Framework for Deploying Deep Learning Applications

Moskewicz, Matthew W.; Jannesari, Ali; Keutzer, Kurt
Open Access · English
Published: 21 Nov 2016
Abstract
In recent years, deep neural networks (DNNs) have yielded strong results on a wide range of applications. Graphics Processing Units (GPUs) have been one key enabling factor leading to the current popularity of DNNs. However, despite increasing hardware flexibility and software programming toolchain maturity, high-efficiency GPU programming remains difficult: it suffers from high complexity, low productivity, and low portability. GPU vendors such as NVIDIA have spent enormous effort to write special-purpose DNN libraries. However, on other hardware targets, especially mobile GPUs, such vendor libraries are not generally available. Thus, the development of portab...
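The abstract describes an autotuning approach to portable GPU code generation. As a minimal, hypothetical sketch of that idea (not the paper's actual implementation), an autotuner enumerates candidate kernel tuning parameters, scores each variant, and keeps the fastest. Here the `cost_model` is a toy stand-in for a real on-device benchmark, and all names and parameter values are illustrative assumptions.

```python
import itertools

def cost_model(tile, unroll, problem_size=1024):
    # Toy stand-in for a measured kernel runtime: penalize tile sizes
    # that do not evenly divide the problem, and unroll factors larger
    # than the tile. A real autotuner would compile and time each
    # kernel variant on the target GPU instead.
    waste = problem_size % tile
    overflow = max(0, unroll - tile)
    return waste + overflow * 10 + problem_size / tile

def autotune(tiles, unrolls):
    # Exhaustive sweep over a small tuning space; return the
    # (tile, unroll) pair with the lowest modeled cost.
    return min(itertools.product(tiles, unrolls),
               key=lambda params: cost_model(*params))

best_tile, best_unroll = autotune(tiles=[8, 16, 32, 48], unrolls=[1, 2, 4])
print(best_tile, best_unroll)
```

In practice the tuning space is far larger than an exhaustive sweep can cover, which is why frameworks of this kind combine metaprogramming (to generate variants cheaply) with search heuristics over the parameter space.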
Subjects
free text keywords: Computer Science - Neural and Evolutionary Computing, Computer Science - Distributed, Parallel, and Cluster Computing, Computer Science - Mathematical Software
27 references, page 1 of 2

[1] S. Ji, W. Xu, M. Yang, and K. Yu, “3D convolutional neural networks for human action recognition,” IEEE Trans. on Pattern Analysis and Machine Intelligence (PAMI), 2013.

[2] A. Krizhevsky, I. Sutskever, and G. E. Hinton, “ImageNet Classification with Deep Convolutional Neural Networks,” in NIPS, 2012.

[3] J. Schmidhuber, “Multi-column deep neural networks for image classification,” in Proceedings of the 2012 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), ser. CVPR '12. Washington, DC, USA: IEEE Computer Society, 2012, pp. 3642-3649. [Online]. Available: http://dl.acm.org/citation.cfm?id=2354409.2354694 [OpenAIRE]

[4] NVIDIA, “cuBLAS,” https://developer.nvidia.com/cublas, 2016, [Online; accessed 27-May-2016].

[5] AMD et al., “clBLAS: a software library containing BLAS functions written in OpenCL,” https://github.com/clMathLibraries/clBLAS, 2016, [Online; accessed 31-May-2016].

[6] S. Chetlur, C. Woolley, P. Vandermersch, J. Cohen, J. Tran, B. Catanzaro, and E. Shelhamer, “cuDNN: efficient primitives for deep learning,” arXiv:1410.0759, 2014. [OpenAIRE]

[7] R. Girshick, F. Iandola, T. Darrell, and J. Malik, “Deformable part models are convolutional neural networks,” in Computer Vision and Pattern Recognition, 2015. [OpenAIRE]

[8] A. Karpathy, G. Toderici, S. Shetty, T. Leung, R. Sukthankar, and L. Fei-Fei, “Large-scale video classification with convolutional neural networks,” in Computer Vision and Pattern Recognition, 2014. [OpenAIRE]

[9] Y. Jia, E. Shelhamer, J. Donahue, S. Karayev, J. Long, R. Girshick, S. Guadarrama, and T. Darrell, “Caffe: Convolutional architecture for fast feature embedding,” arXiv:1408.5093, 2014.

[10] PUC-Rio, “Lua: embeddable scripting language,” http://www.lua.org/, 2016, [Online; accessed 01-Oct-2016].

[11] M. D. Zeiler and R. Fergus, “Visualizing and understanding convolutional networks,” in European Conference on Computer Vision. Springer, 2014, pp. 818-833.

[12] F. Yu and V. Koltun, “Multi-scale context aggregation by dilated convolutions,” arXiv preprint arXiv:1511.07122, 2015.

[13] B. Catanzaro, S. Kamil, Y. Lee, K. Asanovic, J. Demmel, K. Keutzer, J. Shalf, K. Yelick, and A. Fox, “SEJITS: Getting productivity and performance with selective embedded JIT specialization.”

[14] L. Truong, R. Barik, E. Totoni, H. Liu, C. Markley, A. Fox, and T. Shpeisman, “Latte: a language, compiler, and runtime for elegant and efficient deep neural networks,” in Proceedings of the 37th ACM SIGPLAN Conference on Programming Language Design and Implementation. ACM, 2016, pp. 209-223.

[15] M. Moskewicz, F. Iandola, and K. Keutzer, “Boda-rtc: Productive generation of portable, efficient code for convolutional neural networks on mobile computing platforms,” arXiv preprint arXiv:1606.00094, 2016. [OpenAIRE]
