Learning from Multiple Sources for Video Summarisation

Preprint · English · Open Access
Zhu, Xiatian ; Loy, Chen Change ; Gong, Shaogang (2015)
  • Subject: Computer Science - Computer Vision and Pattern Recognition

Many visual surveillance tasks, e.g. video summarisation, are conventionally accomplished by analysing imagery-based features. Relying solely on visual cues for public surveillance video understanding is unreliable, since visual observations obtained from public-space CCTV video data are often not sufficiently trustworthy and events of interest can be subtle. On the other hand, non-visual data sources such as weather reports and traffic sensory signals are readily accessible but have not been explored jointly to complement visual data for video content analysis and summarisation. In this paper, we present a novel unsupervised framework that learns jointly from both visual and independently drawn non-visual data sources to discover meaningful latent structure in surveillance video data. In particular, we investigate ways to cope with discrepant dimensions and representations whilst associating these heterogeneous data sources, and derive an effective mechanism to tolerate missing and incomplete data from different sources. We show that the proposed multi-source learning framework not only achieves better video content clustering than state-of-the-art methods, but is also capable of accurately inferring missing non-visual semantics from previously unseen videos. In addition, a comprehensive user study is conducted to validate the quality of video summaries generated using the proposed multi-source model.
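The paper does not include code, but the core idea of the abstract — fusing visual features with lower-dimensional, possibly missing non-visual signals before clustering video content — can be illustrated with a minimal, hypothetical sketch. This is not the authors' method: the per-source z-scoring, mean-imputation of missing entries, and plain k-means below are stand-in choices, and all data here are synthetic.

```python
# Hypothetical sketch (not the paper's model): cluster video clips using
# visual features plus non-visual signals (e.g. weather/traffic readings)
# that differ in dimension and may be missing for some clips.
import numpy as np

rng = np.random.default_rng(0)

# Toy data: 60 clips drawn from two latent "content" groups.
visual = np.vstack([rng.normal(0, 1, (30, 8)),      # group A visual features
                    rng.normal(3, 1, (30, 8))])     # group B visual features
nonvisual = np.vstack([rng.normal(-2, 1, (30, 2)),  # group A sensor readings
                       rng.normal(2, 1, (30, 2))])  # group B sensor readings
# Simulate incomplete non-visual data for 10 random clips.
nonvisual[rng.choice(60, 10, replace=False)] = np.nan

def impute_and_zscore(X):
    """Fill missing entries with the column mean, then standardise per column
    so sources of different scale/dimension contribute comparably."""
    col_mean = np.nanmean(X, axis=0)
    X = np.where(np.isnan(X), col_mean, X)
    return (X - X.mean(axis=0)) / X.std(axis=0)

# Fuse the two heterogeneous sources into one representation.
fused = np.hstack([impute_and_zscore(visual), impute_and_zscore(nonvisual)])

def kmeans(X, k, iters=50):
    """Tiny k-means: random initial centres, alternate assign/update steps."""
    centres = X[rng.choice(len(X), k, replace=False)]
    for _ in range(iters):
        labels = np.argmin(((X[:, None] - centres) ** 2).sum(-1), axis=1)
        new_centres = []
        for j in range(k):
            pts = X[labels == j]
            # Keep the old centre if a cluster empties out.
            new_centres.append(pts.mean(axis=0) if len(pts) else centres[j])
        centres = np.array(new_centres)
    return labels

labels = kmeans(fused, k=2)
```

The sketch shows only the fusion pattern (normalise each source, tolerate missing entries, cluster the joint representation); the paper's actual framework learns the latent structure jointly rather than by naive concatenation.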
