Y.-G. Jiang, G. Ye, S.-F. Chang, D. Ellis, and A. C. Loui, “Consumer video understanding: A benchmark database and an evaluation of human and machine performance,” in ICMR, 2011.
 J. M. Chaquet, E. J. Carmona, and A. Fernández-Caballero, “A survey of video datasets for human action and activity recognition,” Comput. Vis. Image Underst., vol. 117, no. 6, pp. 633-659, 2013.
 A. F. Smeaton, P. Over, and W. Kraaij, “Evaluation campaigns and trecvid,” in MIR, 2006.
 I. Laptev, M. Marszalek, C. Schmid, and B. Rozenfeld, “Learning realistic human actions from movies,” in CVPR, 2008.
 J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and F.-F. Li, “Imagenet: A large-scale hierarchical image database,” in CVPR, 2009.
 J. Xiao, J. Hays, K. Ehinger, A. Oliva, and A. Torralba, “Sun database: Large-scale scene recognition from abbey to zoo,” in CVPR, 2010.
 P. Scovanner, S. Ali, and M. Shah, “A 3-dimensional sift descriptor and its application to action recognition,” in Proceedings of the 15th International Conference on Multimedia, 2007.
 A. Kläser, M. Marszałek, and C. Schmid, “A spatio-temporal descriptor based on 3d-gradients,” in BMVC, 2008.
 T. Deselaers, S. Hasan, O. Bender, and H. Ney, “A deep learning approach to machine transliteration,” in Proceedings of the Fourth Workshop on Statistical Machine Translation, 2009.
 A. Mohamed, G. E. Dahl, and G. Hinton, “Acoustic modeling using deep belief networks,” Trans. Audio, Speech and Lang. Proc., vol. 20, no. 1, pp. 14-22, 2012.
 A. Krizhevsky, I. Sutskever, and G. E. Hinton, “Imagenet classification with deep convolutional neural networks,” in NIPS, 2012.
 Q. Le, M. Ranzato, R. Monga, M. Devin, K. Chen, G. Corrado, J. Dean, and A. Ng, “Building high-level features using large scale unsupervised learning,” in ICML, 2012.
 J. Sivic and A. Zisserman, “Video google: a text retrieval approach to object matching in videos,” in ICCV, 2003.
 F. Perronnin et al., “Improving the fisher kernel for large-scale image classification,” in ECCV, 2010.
 Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, “Gradient-based learning applied to document recognition,” Proceedings of the IEEE, vol. 86, no. 11, pp. 2278-2324, 1998.