publication . Preprint . 2014

Transfer Learning for Video Recognition with Scarce Training Data for Deep Convolutional Neural Network

Su, Yu-Chuan; Chiu, Tzu-Hsuan; Yeh, Chun-Yen; Huang, Hsin-Fu; Hsu, Winston H.
Open Access English
Published: 14 Sep 2014
Unconstrained video recognition and Deep Convolution Networks (DCNs) have recently been two active topics in computer vision. In this work, we apply DCNs as frame-based recognizers for video recognition. Our preliminary studies, however, show that video corpora with complete ground truth are usually not large and diverse enough to learn a robust model. Networks trained directly on the video data set suffer from significant overfitting and have a poor recognition rate on the test set. The same lack-of-training-sample problem limits the usage of deep models on a wide range of computer vision problems where obtaining training data is difficult. To overcome the problem, ...
ACM Computing Classification System: Computing Methodologies, Image Processing and Computer Vision
free text keywords: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Learning