Preprint · 2015

Spatial Transformer Networks

Jaderberg, Max; Simonyan, Karen; Zisserman, Andrew; Kavukcuoglu, Koray
Open Access · English
  • Published: 05 Jun 2015
Abstract
Convolutional Neural Networks define an exceptionally powerful class of models, but are still limited by the lack of ability to be spatially invariant to the input data in a computationally and parameter efficient manner. In this work we introduce a new learnable module, the Spatial Transformer, which explicitly allows the spatial manipulation of data within the network. This differentiable module can be inserted into existing convolutional architectures, giving neural networks the ability to actively spatially transform feature maps, conditional on the feature map itself, without any extra training supervision or modification to the optimisation process. We show that the use of spatial transformers results in models which learn invariance to translation, scale, rotation and more generic warping, resulting in state-of-the-art performance on several benchmarks, and for a number of classes of transformations.
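As the abstract notes, the module is differentiable and warps a feature map conditioned on that same feature map, so it can sit inside an ordinary CNN and be trained by backpropagation with no extra supervision. The sketch below is one minimal way to realise that idea, assuming PyTorch and an affine transform; the class name, localisation-network layer sizes, and the use of F.affine_grid / F.grid_sample are this sketch's choices, not the authors' implementation.

```python
# Minimal sketch of a spatial transformer (illustrative only): a localisation
# network regresses a 2D affine transform from the input feature map, then a
# grid generator and a differentiable bilinear sampler warp that same map.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SpatialTransformer(nn.Module):
    def __init__(self, in_channels):
        super().__init__()
        # Localisation network: a small CNN conditioned on the feature map itself.
        # Layer sizes here are assumptions, not taken from the paper.
        self.loc_features = nn.Sequential(
            nn.Conv2d(in_channels, 8, kernel_size=7), nn.MaxPool2d(2), nn.ReLU(),
            nn.Conv2d(8, 10, kernel_size=5), nn.MaxPool2d(2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(10, 32), nn.ReLU(),
        )
        # Regress the 6 affine parameters; start at the identity transform so
        # the untrained module performs no warping.
        self.fc_theta = nn.Linear(32, 6)
        nn.init.zeros_(self.fc_theta.weight)
        with torch.no_grad():
            self.fc_theta.bias.copy_(torch.tensor([1., 0., 0., 0., 1., 0.]))

    def forward(self, x):
        theta = self.fc_theta(self.loc_features(x)).view(-1, 2, 3)
        # Grid generator + sampler: both are differentiable, so gradients flow
        # back into the localisation network during ordinary training.
        grid = F.affine_grid(theta, x.size(), align_corners=False)
        return F.grid_sample(x, grid, align_corners=False)

# Example: warp a batch of 28x28 single-channel feature maps.
stn = SpatialTransformer(in_channels=1)
out = stn(torch.randn(4, 1, 28, 28))   # output shape matches the input
```

Because the output has the same shape as the input, such a module can be placed in front of, or between, convolutional layers without modifying the surrounding network.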
Subjects
free text keywords: Computer Science - Computer Vision and Pattern Recognition
References (40 total, page 1 of 3)

[1] J. Ba, V. Mnih, and K. Kavukcuoglu. Multiple object recognition with visual attention. ICLR, 2015.

[2] F. Bookstein. Principal warps: Thin-plate splines and the decomposition of deformations. IEEE PAMI, 1989.

[3] S. Branson, G. Van Horn, S. Belongie, and P. Perona. Bird species categorization using pose normalized deep convolutional nets. In BMVC, 2014.

[4] J. Bruna and S. Mallat. Invariant scattering convolution networks. IEEE PAMI, 35(8):1872-1886, 2013.

[5] M. Cimpoi, S. Maji, and A. Vedaldi. Deep filter banks for texture recognition and segmentation. In CVPR, 2015.

[6] T. S. Cohen and M. Welling. Transformation properties of learned visual representations. ICLR, 2015.

[7] D. Erhan, C. Szegedy, A. Toshev, and D. Anguelov. Scalable object detection using deep neural networks. In CVPR, 2014.

[8] J. D. Foley, A. Van Dam, S. K. Feiner, J. F. Hughes, and R. L. Phillips. Introduction to computer graphics, volume 55. Addison-Wesley Reading, 1994.

[9] B. J. Frey and N. Jojic. Fast, large-scale transformation-invariant clustering. In NIPS, 2001.

[10] R. Gens and P. M. Domingos. Deep symmetry networks. In NIPS, 2014.

[11] R. B. Girshick, J. Donahue, T. Darrell, and J. Malik. Rich feature hierarchies for accurate object detection and semantic segmentation. In CVPR, 2014.

[12] G. Gkioxari, R. Girshick, and J. Malik. Contextual action recognition with R*CNN. arXiv:1505.01197, 2015.

[13] I. J. Goodfellow, Y. Bulatov, J. Ibarz, S. Arnoud, and V. Shet. Multi-digit number recognition from street view imagery using deep convolutional neural networks. arXiv:1312.6082, 2013.

[14] K. Gregor, I. Danihelka, A. Graves, and D. Wierstra. DRAW: A recurrent neural network for image generation. ICML, 2015.

[15] G. E. Hinton. A parallel computation that assigns canonical object-based frames of reference. In IJCAI, 1981.
