publication . Preprint . 2019

Siam R-CNN: Visual Tracking by Re-Detection

Voigtlaender, Paul; Luiten, Jonathon; Torr, Philip H. S.; Leibe, Bastian
Open Access English
  • Published: 28 Nov 2019
Abstract
We present Siam R-CNN, a Siamese re-detection architecture which unleashes the full power of two-stage object detection approaches for visual object tracking. We combine this with a novel tracklet-based dynamic programming algorithm, which takes advantage of re-detections of both the first-frame template and previous-frame predictions, to model the full history of both the object to be tracked and potential distractor objects. This enables our approach to make better tracking decisions, as well as to re-detect tracked objects after long occlusion. Finally, we propose a novel hard example mining strategy to improve Siam R-CNN's robustness to similar looking objec...
Subjects
free text keywords: Computer Science - Computer Vision and Pattern Recognition
Funded by
EC| DeeViSe
Project
DeeViSe
Deep Learning for Dynamic 3D Visual Scene Understanding
  • Funder: European Commission (EC)
  • Project Code: 773161
  • Funding stream: H2020 | ERC | ERC-COG
RCUK| Understanding scenes and events through joint parsing, cognitive reasoning and lifelong learning
Project
  • Funder: Research Council UK (RCUK)
  • Project Code: EP/N019474/1
  • Funding stream: EPSRC