Localizing spatially and temporally objects and actions in videos

Name: Localizing spatially and temporally objects and actions in videos
Creator: Kalogeiton, Vicky
Keywords: machine learning, action recognition, [INFO.INFO-CV] Computer Science [cs]/Computer Vision and Pattern Recognition [cs.CV], video analysis, deep learning, object detection, action localization, Localisation, computer vision

Publication: Doctoral thesis, 01 Jan 2017, English

Authors: Kalogeiton, Vicky

Abstract

The rise of deep learning has facilitated remarkable progress in video understanding. This thesis addresses three important tasks of video understanding: video object detection, joint object and action detection, and spatio-temporal action localization.

Object class detection is one of the most important challenges in computer vision. Object detectors are usually trained on bounding boxes from still images. Recently, video has been used as an alternative source of data. Yet training an object detector on one domain (either still images or videos) and testing on the other results in a significant performance gap compared to training and testing on the same domain. In the first part of this thesis, we examine the reasons behind this performance gap. We define and evaluate several domain shift factors: spatial location accuracy, appearance diversity, image quality, aspect distribution, and object size and camera framing. We examine the impact of these factors by comparing the detection performance before and after cancelling them out. The results show that all five factors affect the performance of the detectors and that their combined effect explains the performance gap.

While most existing approaches for detection in videos focus on objects or human actions separately, in the second part of this thesis we aim at detecting non-human-centric actions, i.e., objects performing actions, such as cat eating or dog jumping. We introduce an end-to-end multitask objective that jointly learns object-action relationships. We compare it with different training objectives, validate its effectiveness for detecting object-action pairs in videos, and show that both tasks of object and action detection benefit from this joint learning. In experiments on the A2D dataset, we obtain state-of-the-art results on segmentation of object-action pairs.

In the third part, we are the first to propose an action tubelet detector that leverages the temporal continuity of videos instead of operating at the frame level, as state-of-the-art approaches do. In the same way that modern detectors rely on anchor boxes, our tubelet detector is based on anchor cuboids: it takes as input a sequence of frames and outputs tubelets, i.e., sequences of bounding boxes with associated scores. Our tubelet detector outperforms all state-of-the-art methods on the UCF-Sports, J-HMDB, and UCF-101 action localization datasets, especially at high overlap thresholds. The improvement in detection performance is explained by both more accurate scores and more precise localization.

French title: Localiser spatio-temporallement des objets et des actions dans des vidéos

Related Organizations

Grenoble Alpes University, France
French National Centre for Scientific Research, France
French Institute for Research in Computer Science and Automation, France
Laboratoire Jean Kuntzmann, France
Grenoble INP - UGA, France

Related research (6 research products)

Convolution Encoders for End-to-End Action Tracking With Space-Time Cubic Kernels (2020)
P3D-CTN: Pseudo-3D Convolutional Tube Network for Spatio-Temporal Action Detection in Videos (2019)
Three-Stream Action Tubelet Detector for Spatiotemporal Action Detection in Videos (2018)
Detecting action tubes via spatial action estimation and temporal path inference (2018)
Recurrent Tubelet Proposal and Recognition Networks for Action Detection (2018)
Learning to Track for Spatio-Temporal Action Localization (2015)

Impact by BIP!

selected citations: 0. These citations are derived from selected sources. This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
popularity: Average. This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network.
influence: Average. This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
impulse: Average. This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network.

Related to Research communities

INRIA

UArctic

University Network for Innovation, Technology and Engineering
