publication . Conference object . Preprint . 2017

Paying more attention to attention: improving the performance of convolutional neural networks via attention transfer

Komodakis, Nikos; Zagoruyko, Sergey
Open Access English
  • Published: 01 Jun 2017
  • Publisher: HAL CCSD
  • Country: France
Attention plays a critical role in human visual experience. It has also recently been shown that attention can play an important role when artificial neural networks are applied to a variety of tasks in fields such as computer vision and NLP. In this work we show that, by properly defining attention for convolutional neural networks, we can use this type of information to significantly improve the performance of a student CNN by forcing it to mimic the attention maps of a powerful teacher network. To that end, we propose several novel methods of transferring attention, showing cons...
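The abstract is truncated and does not say how attention maps are defined or compared, so the following is only a minimal sketch under stated assumptions. It assumes an activation-based attention map (the sum of squared activations over the channel axis, L2-normalised) and a mean squared difference between the student and teacher maps. The function names `attention_map` and `attention_transfer_loss` are hypothetical, and the nested-list tensor layout (channels x height x width) was chosen only to keep the example dependency-free.

```python
import math


def attention_map(activations):
    """Collapse a C x H x W activation tensor (nested lists) into a flat,
    L2-normalised spatial attention map of length H * W.

    Assumption: attention at a location is the sum of squared activations
    over channels; this record does not state the definition.
    """
    channels = len(activations)
    height = len(activations[0])
    width = len(activations[0][0])
    # Sum squared activations over the channel axis at each spatial location.
    flat = [
        sum(activations[c][i][j] ** 2 for c in range(channels))
        for i in range(height)
        for j in range(width)
    ]
    norm = math.sqrt(sum(v * v for v in flat))
    # Leave an all-zero map unchanged to avoid dividing by zero.
    return [v / norm for v in flat] if norm > 0 else flat


def attention_transfer_loss(student_acts, teacher_acts):
    """Mean squared difference between student and teacher attention maps.

    The two tensors may have different channel counts but must share the
    same spatial size (H x W).
    """
    student_map = attention_map(student_acts)
    teacher_map = attention_map(teacher_acts)
    return sum((s - t) ** 2 for s, t in zip(student_map, teacher_map)) / len(student_map)


# Toy 2 x 2 feature maps: the teacher attends to the top-left location,
# the student to the top-right one.
teacher = [[[1.0, 0.0], [0.0, 0.0]]]
student = [[[0.0, 1.0], [0.0, 0.0]]]
loss = attention_transfer_loss(student, teacher)
```

In a training loop, a term like this would presumably be added to the student's ordinary classification loss with a weighting factor. That weight, and the choice of layers at which maps are compared, are not given in this record and would need to be taken from the full paper.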
free text keywords: [INFO.INFO-CV] Computer Science [cs] / Computer Vision and Pattern Recognition [cs.CV]; Computer Science - Computer Vision and Pattern Recognition