publication . Preprint . Conference object . 2018

SurfConv: Bridging 3D and 2D Convolution for RGBD Images

Hang Chu; Wei-Chiu Ma; Kaustav Kundu; Raquel Urtasun; Sanja Fidler
Open Access English
  • Published: 04 Dec 2018
Abstract
We tackle the problem of using 3D information in convolutional neural networks for downstream recognition tasks. Using depth as an additional channel alongside the RGB input suffers from the scale-variance problem present in image-convolution-based approaches. On the other hand, 3D convolution wastes a large amount of memory on mostly unoccupied 3D space, which consists of only the surface visible to the sensor. Instead, we propose SurfConv, which "slides" compact 2D filters along the visible 3D surface. SurfConv is formulated as a simple depth-aware multi-scale 2D convolution, through a new Data-Driven Depth Discretization (D4) scheme. We demonstrate the effectiveness ...
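The abstract does not spell out how D4 chooses its depth levels, so the following is only a rough sketch of the general idea of data-driven depth discretization, not the authors' scheme. It assumes, purely for illustration, that level boundaries are placed at depth quantiles so that each level holds a similar number of valid pixels; the function names `depth_level_edges` and `assign_levels` are hypothetical.

```python
import numpy as np

def depth_level_edges(depths, num_levels):
    """Place level boundaries at depth quantiles (an illustrative assumption,
    not the D4 criterion from the paper)."""
    depths = np.asarray(depths, dtype=float)
    valid = depths[np.isfinite(depths) & (depths > 0)]
    quantiles = np.linspace(0.0, 1.0, num_levels + 1)
    return np.quantile(valid, quantiles)

def assign_levels(depth_map, edges):
    """Map each pixel to a level index in [0, num_levels - 1]; invalid depth gets -1."""
    depth_map = np.asarray(depth_map, dtype=float)
    # Only the interior edges are needed to split the range into num_levels bins.
    levels = np.digitize(depth_map, edges[1:-1])
    invalid = ~(np.isfinite(depth_map) & (depth_map > 0))
    levels[invalid] = -1
    return levels

# Toy depth map in metres; 0.0 marks a pixel with no depth reading.
depth = np.array([[1.0, 2.0, 3.0, 4.0],
                  [5.0, 6.0, 7.0, 0.0]])
edges = depth_level_edges(depth, num_levels=2)
levels = assign_levels(depth, edges)
```

In a depth-aware multi-scale pipeline, each level's pixels could then be processed by the same 2D filters at a scale chosen for that level; how SurfConv actually does this is described in the paper itself, not in this record.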
Subjects
ACM Computing Classification System: ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION
free text keywords: Computer Science - Computer Vision and Pattern Recognition, RGB color model, Image sensor, Segmentation, Artificial intelligence, 2D Filters, Convolutional neural network, Kernel (image processing), Computer science, Convolution, Pattern recognition, Computer vision, Discretization