Learning human actions by combining global dynamics and local appearance

Article · English · Open Access
Luo, G. ; Yang, S. ; Tian, G. ; Yuan, C. ; Hu, W. ; Maybank, Stephen J. (2014)
  • Publisher: IEEE Computer Society
  • Related identifiers: doi: 10.1109/TPAMI.2014.2329301
  • Subject: csis

In this paper, we address the problem of human action recognition by combining global temporal dynamics with local spatio-temporal appearance features. In the global temporal dimension, we model the motion dynamics with robust linear dynamical systems (LDSs) and use the model parameters as motion descriptors. Since LDSs live in a non-Euclidean space and their parameters are not in vector form, we propose a shift-invariant distance based on subspace angles to measure the similarity between LDSs. In the local visual dimension, we construct curved spatio-temporal cuboids along the trajectories of densely sampled feature points and describe them using histograms of oriented gradients (HOG). The distance between motion sequences is computed with the Chi-squared histogram distance in the bag-of-words framework. Finally, we perform classification using the maximum-margin distance learning method, combining the global dynamic distances with the local visual distances. We evaluate our approach for action recognition on five short-clip data sets (Weizmann, KTH, UCF Sports, Hollywood2, and UCF50) and on three long continuous data sets (VIRAT, ADL, and CRIM13), and show results competitive with current state-of-the-art methods.
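The Chi-squared histogram distance used for the local bag-of-words features has a standard closed form. The sketch below is a minimal illustration of that distance, not the paper's exact implementation; the small `eps` term, added here to avoid division by zero on empty bins, is an assumption of this sketch.

```python
import numpy as np

def chi2_distance(h1, h2, eps=1e-10):
    """Chi-squared distance between two histograms (e.g. bag-of-words codes).

    d(h1, h2) = 0.5 * sum_i (h1_i - h2_i)^2 / (h1_i + h2_i)
    """
    h1 = np.asarray(h1, dtype=float)
    h2 = np.asarray(h2, dtype=float)
    # eps guards against empty bins where both histograms are zero
    return 0.5 * np.sum((h1 - h2) ** 2 / (h1 + h2 + eps))
```

The distance is symmetric and zero exactly when the two histograms coincide, which makes it a convenient drop-in for histogram comparison within a bag-of-words pipeline.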
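Distances between LDSs are commonly built from the principal angles between the column spaces of their extended observability matrices. The sketch below shows a generic Martin-style subspace-angles distance under that standard construction; it is an assumption-laden illustration, not the shift-invariant variant proposed in the paper, and the helper names (`principal_angles`, `martin_distance`) are hypothetical.

```python
import numpy as np

def principal_angles(O1, O2):
    """Principal angles between the column spaces of two matrices.

    O1, O2: (m, n) extended observability matrices of two LDSs.
    Orthonormalize each, then the singular values of Q1^T Q2 are the
    cosines of the principal angles.
    """
    Q1, _ = np.linalg.qr(O1)
    Q2, _ = np.linalg.qr(O2)
    cosines = np.linalg.svd(Q1.T @ Q2, compute_uv=False)
    # numerical safety: keep cosines in the valid range for arccos
    return np.arccos(np.clip(cosines, -1.0, 1.0))

def martin_distance(O1, O2, eps=1e-12):
    """Martin-style distance: -log of the product of squared cosines."""
    theta = principal_angles(O1, O2)
    return -2.0 * np.sum(np.log(np.cos(theta) + eps))
```

Identical subspaces give all-zero angles and hence a distance of (numerically) zero, while dissimilar subspaces yield a strictly positive value; such a distance can then be fed into the kind of maximum-margin distance learning the abstract describes.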
