Publication · Other literature type · Conference object · Preprint · 2014

Speeding up Convolutional Neural Networks with Low Rank Expansions

Max Jaderberg, Andrea Vedaldi, Andrew Zisserman
Open Access
  • Published: 15 May 2014
  • Publisher: British Machine Vision Association and Society for Pattern Recognition
Abstract
The focus of this paper is speeding up the application of convolutional neural networks. While delivering impressive results across a range of computer vision and machine learning tasks, these networks are computationally demanding, limiting their deployability. Convolutional layers generally consume the bulk of the processing time, and so in this work we present two simple schemes for drastically speeding up these layers. This is achieved by exploiting cross-channel or filter redundancy to construct a low rank basis of filters that are rank-1 in the spatial domain. Our methods are architecture agnostic, and can be easily applied to existing CPU and GPU convolutional frameworks for tuneable speedup performance. We demonstrate this with a real world network designed for scene text character recognition, showing a possible 2.5x speedup with no loss in accuracy, and 4.5x speedup with less than 1% drop in accuracy, still achieving state-of-the-art on standard benchmarks.
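
The core idea in the abstract, filters that are rank-1 in the spatial domain, can be sketched in a few lines. The snippet below is a minimal illustration only, not the authors' method: the paper optimises a shared separable filter basis by minimising filter or output reconstruction error and also exploits cross-channel redundancy, whereas this sketch simply takes the SVD of one d x d filter and keeps the leading rank-1 term, so a 2-D convolution becomes a vertical 1-D pass followed by a horizontal 1-D pass (roughly d*d multiplications per output pixel reduced to 2*d). The helper conv2d_valid and all variable names are illustrative assumptions.

import numpy as np
from numpy.linalg import svd

rng = np.random.default_rng(0)
d = 7
W = rng.standard_normal((d, d))            # dense 2-D filter (stand-in for a learned filter)

U, s, Vt = svd(W)                          # W = U @ diag(s) @ Vt
v = U[:, 0] * np.sqrt(s[0])                # column (vertical) 1-D filter
h = Vt[0, :] * np.sqrt(s[0])               # row (horizontal) 1-D filter
W_rank1 = np.outer(v, h)                   # best rank-1 approximation of W

def conv2d_valid(img, ker):
    # Plain 'valid' 2-D cross-correlation, written out for clarity only.
    H, Wd = img.shape
    kh, kw = ker.shape
    out = np.empty((H - kh + 1, Wd - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * ker)
    return out

img = rng.standard_normal((32, 32))

full = conv2d_valid(img, W_rank1)                                  # one 2-D pass with the rank-1 filter
sep = conv2d_valid(conv2d_valid(img, v[:, None]), h[None, :])      # vertical then horizontal 1-D passes

print(np.allclose(full, sep))                                      # True: the separable form is exact
print(np.linalg.norm(W - W_rank1) / np.linalg.norm(W))             # relative error of the rank-1 approximation

The separable passes reproduce the rank-1 convolution exactly; the accuracy question studied in the paper is how well a small set of such separable filters can approximate the original full-rank filter bank.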
Subjects
free text keywords: Computer Science - Computer Vision and Pattern Recognition, Computer science, Limiting, Theoretical computer science, Architecture, Character recognition, Convolutional neural network, Spatial domain, Computer engineering, Redundancy (engineering), Speedup
Funded by
EC| VISREC
Project
VISREC
Visual Recognition
  • Funder: European Commission (EC)
  • Project Code: 228180
  • Funding stream: FP7 | SP2 | ERC
References (42 total; page 1 of 3 shown)

[1] http://algoval.essex.ac.uk/icdar/datasets.html.

[2] http://www.iapr-tc11.org/mediawiki/index.php/kaist_scene_text_database.

[3] O. Alsharif and J. Pineau. End-to-End Text Recognition with Hybrid HMM Maxout Models. In International Conference on Learning Representations, 2014.

[4] A. Bissacco, M. Cummins, Y. Netzer, and H. Neven. PhotoOCR: Reading text in uncontrolled conditions. In International Conference on Computer Vision, 2013.

[5] T. de Campos, B. R. Babu, and M. Varma. Character recognition in natural images. 2009.

[6] M. Denil, B. Shakibi, L. Dinh, and N. de Freitas. Predicting parameters in deep learning. In Advances in Neural Information Processing Systems, pages 2148-2156, 2013.

[7] E. Denton, W. Zaremba, J. Bruna, Y. LeCun, and R. Fergus. Exploiting linear structure within convolutional networks for efficient evaluation. arXiv preprint arXiv:1404.0736, 2014.

[8] C. Farabet, Y. LeCun, K. Kavukcuoglu, E. Culurciello, B. Martini, P. Akselrod, and S. Talay. Large-scale FPGA-based convolutional networks. Machine Learning on Very Large Data Sets, 2011.

[9] C. Farabet, C. Couprie, L. Najman, and Y. LeCun. Scene parsing with multiscale feature learning, purity trees, and optimal covers. arXiv preprint arXiv:1202.2160, 2012.

[10] I. J. Goodfellow, Y. Bulatov, J. Ibarz, S. Arnoud, and V. Shet. Multi-digit number recognition from street view imagery using deep convolutional neural networks. In International Conference on Learning Representations, 2013.

[11] I. J. Goodfellow, D. Warde-Farley, M. Mirza, A. Courville, and Y. Bengio. Maxout networks. arXiv preprint arXiv:1302.4389, 2013.

[12] G. E. Hinton, N. Srivastava, A. Krizhevsky, I. Sutskever, and R. R. Salakhutdinov. Improving neural networks by preventing co-adaptation of feature detectors. arXiv preprint arXiv:1207.0580, 2012.

[13] F. Iandola, M. Moskewicz, S. Karayev, R. Girshick, T. Darrell, and K. Keutzer. DenseNet: Implementing efficient convnet descriptor pyramids. arXiv preprint arXiv:1404.1869, 2014.

[14] Y. Jia. Caffe: An open source convolutional architecture for fast feature embedding. http://caffe.berkeleyvision.org/, 2013.

[15] D. Karatzas, F. Shafait, S. Uchida, M. Iwamura, S. R. Mestre, J. Mas, D. F. Mota, J. Almazan, L. P. de las Heras, et al. ICDAR 2013 robust reading competition. In Proceedings of the 12th International Conference on Document Analysis and Recognition (ICDAR), pages 1484-1493. IEEE, 2013.
