
Exploring weight symmetry in deep neural networks

Hu, Xu Shell; Zagoruyko, Sergey; Komodakis, Nikos
Open Access
  • Published: 01 Aug 2019
  • Journal: Computer Vision and Image Understanding, volume 187, article 102786 (ISSN: 1077-3142)
  • Publisher: Elsevier BV
  • Country: France
Abstract
We propose to impose symmetry in neural network parameters to improve parameter usage and make use of dedicated convolution and matrix multiplication routines. Due to the significant reduction in the number of parameters as a result of the symmetry constraints, one would expect a dramatic drop in accuracy. Surprisingly, we show that this is not the case, and, depending on network size, symmetry can have little or no negative effect on network accuracy, especially in deep overparameterized networks. We propose several ways to impose local symmetry in recurrent and convolutional neural networks, and show that our symmetry parameterizations satisfy...
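To make the idea above concrete, here is a minimal sketch of one way to impose a hard symmetry constraint on a square fully connected layer in PyTorch. It is an illustration, not the authors' implementation; the class name SymmetricLinear and the initialization scale are assumptions. Only the upper triangle of the weight matrix is stored, giving n(n+1)/2 free parameters instead of n^2, and the full symmetric matrix W is rebuilt on each forward pass.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class SymmetricLinear(nn.Module):
        # Illustrative sketch only: a fully connected layer whose square
        # weight matrix is symmetric by construction. The free parameters
        # are the n*(n+1)/2 upper-triangular entries rather than all n^2.
        def __init__(self, n_features: int, bias: bool = True):
            super().__init__()
            self.n = n_features
            idx = torch.triu_indices(n_features, n_features)  # incl. diagonal
            self.register_buffer("idx", idx)
            self.weight = nn.Parameter(
                torch.randn(idx.shape[1]) * n_features ** -0.5
            )
            self.bias = nn.Parameter(torch.zeros(n_features)) if bias else None

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            W = x.new_zeros(self.n, self.n)
            W[self.idx[0], self.idx[1]] = self.weight  # fill upper triangle
            W = W + W.triu(1).t()                      # mirror it: W == W.T
            return F.linear(x, W, self.bias)

    layer = SymmetricLinear(64)      # 2,080 weight parameters vs. 4,096 dense
    out = layer(torch.randn(8, 64))
    print(out.shape)                 # torch.Size([8, 64])

Because W equals its transpose, inference could also dispatch to dedicated symmetric matrix multiplication kernels (e.g., the level-3 BLAS SYMM routine; see Goto and Van De Geijn, 2008, in the references below), which is the routine-level efficiency the abstract alludes to.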
Subjects
free text keywords: Signal Processing, Software, Computer Vision and Pattern Recognition, Machine learning, Language model, Deep neural networks, Mathematics, Approximation property, Artificial neural network, Algorithm, Matrix multiplication, Artificial intelligence, Convolution, Local symmetry, Convolutional neural network, [INFO.INFO-AI]Computer Science [cs]/Artificial Intelligence [cs.AI], [INFO.INFO-CV]Computer Science [cs]/Computer Vision and Pattern Recognition [cs.CV], [INFO.INFO-LG]Computer Science [cs]/Machine Learning [cs.LG], Computer Science - Machine Learning, Statistics - Machine Learning
References

Boulch, A. (2017). ShaResNet: reducing residual network parameter number by sharing weights. Proceedings of the International Conference on Learning Representations.

Bruna, J. and Mallat, S. (2013). Invariant scattering convolution networks. IEEE Transactions on Pattern Analysis and Machine Intelligence, 35(8):1872-1886.

Chen, W., Wilson, J., Tyree, S., Weinberger, K. Q., and Chen, Y. (2016). Compressing convolutional neural networks in the frequency domain. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD '16, pages 1475-1484.

Cho, K., van Merrienboer, B., Bahdanau, D., and Bengio, Y. (2014). On the properties of neural machine translation: Encoder-decoder approaches. In Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation (SSST-8).

Cybenko, G. (1989). Approximation by superpositions of a sigmoidal function. Mathematics of Control, Signals and Systems, 2(4):303-314.

Denil, M., Shakibi, B., Dinh, L., Ranzato, M. A., and de Freitas, N. (2013). Predicting parameters in deep learning. In Advances in Neural Information Processing Systems 26, pages 2148-2156. Curran Associates, Inc.

Denton, E. L., Zaremba, W., Bruna, J., LeCun, Y., and Fergus, R. (2014). Exploiting linear structure within convolutional networks for efficient evaluation. In Advances in Neural Information Processing Systems 27, pages 1269-1277.

Goto, K. and Van De Geijn, R. (2008). High-performance implementation of the level-3 BLAS. ACM Transactions on Mathematical Software, 35(1):4:1-4:14.

Goyal, P., Dollár, P., Girshick, R. B., Noordhuis, P., Wesolowski, L., Kyrola, A., Tulloch, A., Jia, Y., and He, K. (2017). Accurate, large minibatch SGD: Training ImageNet in 1 hour. CoRR, abs/1706.02677.

Greff, K., Srivastava, R. K., and Schmidhuber, J. (2017). Highway and residual networks learn unrolled iterative estimation. Proceedings of the International Conference on Learning Representations.

Ha, D., Dai, A., and Le, Q. V. (2017). Hypernetworks. Proceedings of the International Conference on Learning Representations.

Han, S., Mao, H., and Dally, W. J. (2016). Deep compression: Compressing deep neural networks with pruning, trained quantization and Huffman coding. In International Conference on Learning Representations (ICLR).

Han, S., Pool, J., Narang, S., Mao, H., Tang, S., Elsen, E., Catanzaro, B., Tran, J., and Dally, W. J. (2017). DSD: regularizing deep neural networks with dense-sparse-dense training flow. International Conference on Learning Representations.

He, K., Zhang, X., Ren, S., and Sun, J. (2016). Deep residual learning for image recognition. 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 770-778.

Hochreiter, S. and Schmidhuber, J. (1997). Long short-term memory. Neural Computation, 9(8):1735-1780.
