
Faster CNNs with Direct Sparse Convolutions and Guided Pruning

Hai Li;
Open Access, English
Published: 03 Aug 2016
Abstract
Phenomenally successful in practical inference problems, convolutional neural networks (CNNs) are widely deployed in mobile devices, data centers, and even supercomputers. The number of parameters needed in CNNs, however, is often large and undesirable. Consequently, various methods have been developed to prune a CNN once it is trained. Nevertheless, the resulting CNNs offer limited benefits. While pruning the fully connected layers reduces a CNN's size considerably, it does not improve inference speed noticeably, as the compute-heavy parts lie in the convolutions. Pruning CNNs in a way that increases inference speed often imposes specific sparsity structures, thus li...
Subjects
Free-text keywords: Computer Science - Computer Vision and Pattern Recognition
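
The abstract's point is that speedups must come from the convolution layers themselves, which suggests computing convolutions directly with the pruned (sparse) filters rather than densifying them. The sketch below is a minimal illustration of that general idea in Python/NumPy, not the paper's implementation: the function name, the `threshold` parameter, and the stride-1/no-padding setting are all assumptions made for the example. Only nonzero filter weights are stored, and each one is accumulated against the input patch it touches, so multiplications by pruned (zero) weights are skipped entirely.

```python
import numpy as np

def direct_sparse_conv(weights, x, threshold=0.0):
    """Sketch of a direct sparse convolution (stride 1, no padding).

    weights: dense filter tensor of shape (C_out, C_in, kH, kW),
             assumed to be mostly zero after pruning.
    x:       input feature map of shape (C_in, H, W).
    Returns an output tensor of shape (C_out, H - kH + 1, W - kW + 1).
    """
    C_out, C_in, kH, kW = weights.shape
    C_in_x, H, W = x.shape
    assert C_in == C_in_x, "channel mismatch between filters and input"
    H_out, W_out = H - kH + 1, W - kW + 1

    # Keep only the nonzero weights, grouped per output channel
    # (value plus its (c_in, kh, kw) coordinates), the way a pruned
    # model would store them in a compressed sparse format.
    nonzeros = [[] for _ in range(C_out)]
    for oc in range(C_out):
        for ic in range(C_in):
            for kh in range(kH):
                for kw in range(kW):
                    w = weights[oc, ic, kh, kw]
                    if abs(w) > threshold:
                        nonzeros[oc].append((w, ic, kh, kw))

    # Accumulate each surviving weight against the input window it
    # touches; zero weights contribute nothing and are never visited.
    out = np.zeros((C_out, H_out, W_out),
                   dtype=np.result_type(weights.dtype, x.dtype))
    for oc in range(C_out):
        for w, ic, kh, kw in nonzeros[oc]:
            out[oc] += w * x[ic, kh:kh + H_out, kw:kw + W_out]
    return out


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    w = rng.standard_normal((8, 3, 3, 3))
    w[np.abs(w) < 1.2] = 0.0          # crude magnitude pruning, demo only
    x = rng.standard_normal((3, 32, 32))
    print(direct_sparse_conv(w, x).shape)   # (8, 30, 30)
```

In practice a pruned model would store the filters in a compressed sparse format up front rather than scanning a dense tensor as the demo does; the conversion loop above merely stands in for that step.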