publication . Article . Other literature type . 2018

Segment-Tube: Spatio-Temporal Action Localization in Untrimmed Videos With Per-Frame Segmentation

Nanning Zheng; Le Wang; Zhenxing Niu; Gang Hua; Qilin Zhang; Xuhuan Duan
Open Access English
  • Published: 22 May 2018. Journal: Sensors (Basel, Switzerland), volume 18, issue 5 (eISSN: 1424-8220)
  • Publisher: MDPI
Abstract
Inspired by recent spatio-temporal action localization work based on tubelets (sequences of bounding boxes), we present Segment-tube, a new spatio-temporal action localization detector whose output is a sequence of per-frame segmentation masks. The Segment-tube detector can temporally pinpoint the starting and ending frames of each action category in untrimmed videos, even when the action is preceded or followed by interfering actions. At the same time, it produces per-frame segmentation masks instead of bounding boxes, offering higher spatial accuracy than tubelets. This is achieved by alternating iterative optimization between temporal act...
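The abstract describes two outputs: the start and end frames of each action, and a per-frame segmentation mask in place of a bounding box. The Python sketch below only illustrates those two representations and is not the authors' method. It assumes that per-frame action labels (with 0 as background) and binary per-frame masks have already been produced by some model, and the helper names (`temporal_segments`, `mask_to_box`, `box_fill_ratio`, `segment_tube_to_tubelet`) are hypothetical. It shows how start and end frames can be read off per-frame labels, and how collapsing a mask into a tubelet-style bounding box admits background pixels.

```python
from itertools import groupby

import numpy as np


def temporal_segments(frame_labels, background=0):
    """Group runs of identical non-background labels into
    (label, start_frame, end_frame) triples, with end_frame inclusive."""
    segments = []
    t = 0
    for label, run in groupby(frame_labels):
        n = len(list(run))  # length of this run of identical labels
        if label != background:
            segments.append((label, t, t + n - 1))
        t += n
    return segments


def mask_to_box(mask):
    """Tightest (x_min, y_min, x_max, y_max) box around a binary mask,
    or None if the mask is empty."""
    ys, xs = np.nonzero(mask)
    if ys.size == 0:
        return None
    return (int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max()))


def box_fill_ratio(mask):
    """Fraction of the bounding-box pixels that actually belong to the mask.
    Values below 1.0 mean the box also covers background pixels."""
    box = mask_to_box(mask)
    if box is None:
        return 0.0
    x0, y0, x1, y1 = box
    box_area = (x1 - x0 + 1) * (y1 - y0 + 1)
    return float(mask.sum()) / box_area


def segment_tube_to_tubelet(masks):
    """Collapse a sequence of per-frame masks (a hypothetical 'segment tube')
    into the corresponding sequence of bounding boxes (a tubelet)."""
    return [mask_to_box(m) for m in masks]
```

For instance, `temporal_segments([0, 0, 2, 2, 2, 0, 1, 1])` yields `[(2, 2, 4), (1, 6, 7)]`: action 2 spans frames 2 to 4 and action 1 spans frames 6 to 7. For an L-shaped 4×4 mask covering the left column and bottom row (7 pixels), `mask_to_box` returns `(0, 0, 3, 3)` and `box_fill_ratio` returns `0.4375`, so more than half of the box is background. That is the kind of spatial imprecision a per-frame mask avoids.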
Subjects
ACM Computing Classification System: Computing methodologies: Image processing and computer vision
free text keywords: Article, action localization, action segmentation, 3D ConvNets, LSTM, Chemical technology, TP1-1185, Electrical and Electronic Engineering, Analytical Chemistry, Atomic and Molecular Physics, and Optics, Biochemistry, Interference (wave propagation), Segmentation, Pattern recognition, Detector, Artificial intelligence, Computer science