
Monocular metric depth estimation (MMDE) aims to generate depth maps with an absolute metric scale from a single RGB image, which enables accurate spatial understanding, 3D reconstruction, and autonomous navigation. Unlike conventional monocular depth estimation that predicts only relative depth, MMDE maintains geometric consistency across frames and supports reliable integration with visual SLAM, high-precision 3D modeling, and novel view synthesis. This survey provides a comprehensive review of MMDE, tracing its evolution from geometry-based formulations to modern learning-based frameworks. The discussion emphasizes the importance of datasets, distinguishing metric datasets that supply absolute ground-truth depth from relative datasets that facilitate ordinal or normalized depth learning. Representative datasets, including KITTI, NYU-Depth, ApolloScape, and TartanAir, are analyzed with respect to scene composition, sensor modality, and intended application domain. Methodological progress is examined across several dimensions, including model architecture design, domain generalization, structural detail preservation, and the integration of synthetic data that complements real-world captures. Recent advances in patch-based inference, generative modeling, and loss design are compared to reveal their respective advantages and limitations. By summarizing the current landscape and outlining open research challenges, this work establishes a clear reference framework that supports future studies and facilitates the deployment of MMDE in real-world vision systems requiring precise and robust metric depth estimation.
| selected citations These citations are derived from selected sources. This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically). | 4 | |
| popularity This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network. | Top 10% | |
| influence This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically). | Average | |
| impulse This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network. | Top 10% |
