Publication · Preprint · 2014

Recurrent Models of Visual Attention

Mnih, Volodymyr; Heess, Nicolas; Graves, Alex; Kavukcuoglu, Koray
Open Access English
  • Published: 24 Jun 2014
Abstract
Applying convolutional neural networks to large images is computationally expensive because the amount of computation scales linearly with the number of image pixels. We present a novel recurrent neural network model that is capable of extracting information from an image or video by adaptively selecting a sequence of regions or locations and only processing the selected regions at high resolution. Like convolutional neural networks, the proposed model has a degree of translation invariance built-in, but the amount of computation it performs can be controlled independently of the input image size. While the model is non-differentiable, it can be trained using re...
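The abstract's central claim is that the model's cost depends on the size of the regions it attends to, not on the size of the input image. A fixed-size glimpse extractor makes this concrete. The code is a hypothetical illustration, not the authors' implementation. Several details are assumptions made here: the function names `extract_glimpse` and `_crop`, the multi-resolution scheme (each successive patch twice as wide and average-pooled back to the same size), zero-filling outside the image, and the defaults `size=8` and `scales=3`.

```python
import numpy as np


def _crop(image, top, left, side):
    """Copy a side x side window starting at (top, left), zero-filling outside the image.

    Only the window is touched, so the cost is O(side**2), not O(H * W).
    """
    h, w = image.shape
    out = np.zeros((side, side), dtype=float)
    r0, r1 = max(top, 0), min(top + side, h)
    c0, c1 = max(left, 0), min(left + side, w)
    if r0 < r1 and c0 < c1:
        out[r0 - top:r1 - top, c0 - left:c1 - left] = image[r0:r1, c0:c1]
    return out


def extract_glimpse(image, center, size=8, scales=3):
    """Hypothetical multi-resolution glimpse (illustrative, not the paper's code).

    Patch k covers a (size * 2**k)-pixel square around `center` and is
    average-pooled by 2**k, so every patch comes out size x size.
    Returns an array of shape (scales, size, size) for any image size.
    """
    row, col = center
    patches = []
    for k in range(scales):
        factor = 2 ** k
        side = size * factor
        window = _crop(image, row - side // 2, col - side // 2, side)
        # Average-pool factor x factor blocks: wider patches get coarser resolution.
        pooled = window.reshape(size, factor, size, factor).mean(axis=(1, 3))
        patches.append(pooled)
    return np.stack(patches)


if __name__ == "__main__":
    small = np.random.rand(64, 64)
    large = np.random.rand(1024, 1024)
    # Same output size, and the same cropping/pooling work, for both images.
    print(extract_glimpse(small, (32, 32)).shape, extract_glimpse(large, (512, 512)).shape)
    # (3, 8, 8) (3, 8, 8)
```

`_crop` reads only the requested window, so the work per glimpse depends on `size` and `scales` alone. A 64×64 image and a 1024×1024 image therefore cost the same to sample. This sketch does not cover how a recurrent network would choose the next `center` or how that choice would be trained.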
Subjects
arXiv: Computer Science::Computer Vision and Pattern Recognition
ACM Computing Classification System: Computing methodologies, image processing and computer vision
Free text keywords: Computer Science - Learning, Computer Science - Computer Vision and Pattern Recognition, Statistics - Machine Learning