publication . Preprint . 2016

Deep Cuboid Detection: Beyond 2D Bounding Boxes

Dwibedi, Debidatta; Malisiewicz, Tomasz; Badrinarayanan, Vijay; Rabinovich, Andrew
Open Access English
Published: 30 Nov 2016
We present a Deep Cuboid Detector which takes a consumer-quality RGB image of a cluttered scene and localizes all 3D cuboids (box-like objects). Contrary to classical approaches which fit a 3D model from low-level cues like corners, edges, and vanishing points, we propose an end-to-end deep learning system to detect cuboids across many semantic categories (e.g., ovens, shipping boxes, and furniture). We localize cuboids with a 2D bounding box, and simultaneously localize the cuboid's corners, effectively producing a 3D interpretation of box-like objects. We refine keypoints by pooling convolutional features iteratively, improving the baseline method significantl...
free text keywords: Computer Science - Computer Vision and Pattern Recognition