
Humans perceive and construct the world as an arrangement of simple parametric models. In particular, we can often describe man-made environments using volumetric primitives such as cuboids or cylinders. Inferring these primitives is important for attaining high-level, abstract scene descriptions. Previous approaches for primitive-based abstraction estimate shape parameters directly and are only able to reproduce simple objects. In contrast, we propose a robust estimator for primitive fitting, which meaningfully abstracts complex real-world environments using cuboids. A RANSAC estimator guided by a neural network fits these primitives to a depth map. We condition the network on previously detected parts of the scene, parsing the scene one cuboid at a time. To obtain cuboids from single RGB images, we additionally optimise a depth estimation CNN end-to-end. Naively minimising point-to-primitive distances leads to large or spurious cuboids occluding parts of the scene. We thus propose an improved occlusion-aware distance metric that correctly handles opaque scenes. Furthermore, we present a neural-network-based cuboid solver which provides more parsimonious scene abstractions while also reducing inference time. The proposed algorithm does not require labour-intensive labels, such as cuboid annotations, for training. Results on the NYU Depth v2 dataset demonstrate that the proposed algorithm successfully abstracts cluttered real-world 3D scene layouts.
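To make the "point-to-primitive distance" mentioned above concrete, the following is a minimal sketch of a plain, unsigned point-to-cuboid surface distance and an inlier count of the kind a RANSAC-style estimator could score hypotheses with. It is an illustration under assumptions, not the paper's method: the function names, the cuboid parameterisation (centre, half-sizes, rotation matrix) and the inlier threshold are all hypothetical, and the distance is the naive one, not the occlusion-aware metric the abstract proposes.

```python
import numpy as np


def cuboid_surface_distance(points, centre, half_sizes, rotation=None):
    """Unsigned distance from each 3D point to the surface of an oriented cuboid.

    Hypothetical parameterisation (an assumption, not taken from the paper):
    points: (N, 3) array; centre: (3,); half_sizes: (3,) positive half-extents;
    rotation: (3, 3) matrix whose columns are the cuboid axes in world coordinates.
    """
    if rotation is None:
        rotation = np.eye(3)
    points = np.asarray(points, dtype=float)
    # Express the points in the cuboid's local, axis-aligned frame.
    local = (points - np.asarray(centre, dtype=float)) @ np.asarray(rotation, dtype=float)
    # Per-axis offset from the faces: positive outside, negative inside.
    q = np.abs(local) - np.asarray(half_sizes, dtype=float)
    # Distance for points outside the cuboid (zero for points inside).
    outside = np.linalg.norm(np.maximum(q, 0.0), axis=1)
    # Negative depth to the nearest face for points inside (zero for points outside).
    inside = np.minimum(q.max(axis=1), 0.0)
    return np.abs(outside + inside)


def count_inliers(points, centre, half_sizes, rotation=None, threshold=0.05):
    """Number of points within `threshold` of the cuboid surface (threshold is an assumed value)."""
    distances = cuboid_surface_distance(points, centre, half_sizes, rotation)
    return int(np.sum(distances < threshold))


if __name__ == "__main__":
    pts = np.array([[2.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.5, 0.0, 0.0]])
    print(cuboid_surface_distance(pts, centre=np.zeros(3), half_sizes=np.ones(3)))
    print(count_inliers(pts, centre=np.zeros(3), half_sizes=np.ones(3)))
```

Because this distance treats every cuboid face alike, it does not account for which faces are visible from the camera, which is the shortcoming the abstract attributes to naive distance minimisation.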
Accepted for publication in Transactions on Pattern Analysis and Machine Intelligence (PAMI). arXiv admin note: substantial text overlap with arXiv:2105.02047
Artificial Intelligence, FOS: Computer and information sciences, Computational Theory and Mathematics, Computer Vision and Pattern Recognition (cs.CV), cuboid fitting, Shape, shape decomposition, Applied Mathematics, multi-model fitting, Image reconstruction, Solid modeling, Three-dimensional displays, Training, Scene abstraction, Surface reconstruction, Estimation, minimal solver, Software
