Monocular depth estimation (MDE) provides a useful tool for robotic perception, but its predictions are often uncertain and inaccurate in challenging environments such as surgical scenes where textureless surfaces, specular reflections, and occlusions are common. To address this, we propose ProbeMDE, a cost-aware active sensing framework that combines RGB images with sparse proprioceptive measurements for MDE.
Our approach uses an ensemble of MDE models to predict dense depth maps conditioned on both RGB images and a sparse set of known depth measurements obtained via proprioception, i.e., points where the robot has touched the environment in a known configuration. We quantify predictive uncertainty via the ensemble's variance and compute the gradient of this uncertainty with respect to candidate measurement locations. The most negative values in the resulting gradient map indicate the pixel locations where additional probes will yield the greatest reduction in ensemble variance.
We validate our method in both simulated and physical experiments on central airway obstruction surgical phantoms. Our results demonstrate that our approach outperforms baseline methods across standard depth estimation metrics, achieving higher accuracy while minimizing the number of required proprioceptive measurements.
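To make the selection step concrete, here is a minimal sketch of gradient-based probe selection using automatic differentiation. This is an illustration under our own assumptions, not the paper's released code: the function name `select_probe_locations`, the model call signature `m(rgb, sparse_depth, mask)`, and the relaxation of the binary measurement mask to a continuous tensor are all hypothetical, and PyTorch is used here only for its autograd machinery.

```python
import torch

def select_probe_locations(models, rgb, sparse_depth, sparse_mask, k=1):
    """Rank candidate probe pixels by the gradient of the ensemble variance."""
    # Relax the binary measurement mask to a continuous tensor so autograd
    # can differentiate the ensemble variance with respect to it.
    mask = sparse_mask.detach().clone().requires_grad_(True)

    # Each ensemble member predicts a dense depth map conditioned on the
    # RGB image and the sparse proprioceptive depth measurements.
    preds = torch.stack([m(rgb, sparse_depth, mask) for m in models])

    # Total predictive uncertainty: per-pixel variance across the ensemble.
    variance = preds.var(dim=0).sum()
    variance.backward()

    grad = mask.grad.squeeze()  # (H, W) sensitivity of uncertainty to probing

    # The most negative gradient entries mark pixels where an additional
    # probe is expected to reduce the ensemble variance the most.
    idx = torch.topk(-grad.flatten(), k).indices
    rows, cols = idx // grad.shape[-1], idx % grad.shape[-1]
    return torch.stack([rows, cols], dim=1)  # (k, 2) pixel coordinates
```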
Recent advances in deep learning have produced large-scale foundation models for monocular depth estimation that are highly performant on in-distribution data. However, these models perform poorly on endoscopic images, where textureless surfaces, specular reflections, and overall dissimilarity from the training datasets make prediction difficult. Here we show Depth Anything V2's predictions for both simulated and real endoscopic scenes.
| Method | AbsRel | SqRel | RMSE | Log RMSE | ScInv | δ < 1.25 | δ < 1.25^2 | δ < 1.25^3 |
|---|---|---|---|---|---|---|---|---|
| No Sparse Ground-Truth | 0.352 | 2.657 | 6.559 | 0.376 | 0.328 | 0.580 | 0.832 | 0.920 |
| Random Selection | 0.167 | 0.739 | 3.991 | 0.214 | 0.188 | 0.785 | 0.946 | 0.978 |
| Greedy Variance | 0.171 | 0.789 | 4.122 | 0.221 | 0.195 | 0.777 | 0.944 | 0.977 |
| Stein Variance | 0.170 | 0.789 | 4.182 | 0.221 | 0.195 | 0.776 | 0.942 | 0.976 |
| Greedy Gradient | 0.164 | 0.720 | **3.878** | 0.209 | 0.183 | 0.782 | 0.950 | **0.981** |
| ProbeMDE (Ours) | **0.162** | **0.708** | 3.920 | **0.206** | **0.180** | **0.791** | **0.952** | **0.981** |
Performance comparison of active proprioception strategies on simulated data. Lower is better for AbsRel, SqRel, RMSE, Log RMSE, and ScInv. Higher is better for δ<1.25^t. Best results in each column are bolded.
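For reference, the metrics in these tables are the standard monocular-depth error measures. The sketch below gives their conventional NumPy definitions; the paper's exact evaluation code is not reproduced here, and the scale-invariant (ScInv) form in particular varies across the literature, so treat this as a reference under those assumptions rather than the authors' implementation.

```python
import numpy as np

def depth_metrics(pred, gt):
    """Conventional depth-estimation metrics over valid ground-truth pixels."""
    valid = gt > 0
    pred, gt = pred[valid], gt[valid]

    abs_rel  = np.mean(np.abs(pred - gt) / gt)
    sq_rel   = np.mean((pred - gt) ** 2 / gt)
    rmse     = np.sqrt(np.mean((pred - gt) ** 2))
    log_rmse = np.sqrt(np.mean((np.log(pred) - np.log(gt)) ** 2))

    # Scale-invariant log error (one common form, following Eigen et al.).
    d = np.log(pred) - np.log(gt)
    sc_inv = np.sqrt(np.mean(d ** 2) - np.mean(d) ** 2)

    # Threshold accuracy: fraction of pixels with max(pred/gt, gt/pred) < 1.25^t.
    ratio = np.maximum(pred / gt, gt / pred)
    deltas = [np.mean(ratio < 1.25 ** t) for t in (1, 2, 3)]

    return abs_rel, sq_rel, rmse, log_rmse, sc_inv, deltas
```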
| Method | AbsRel | SqRel | RMSE | Log RMSE | ScInv | δ < 1.25 | δ < 1.25^2 | δ < 1.25^3 |
|---|---|---|---|---|---|---|---|---|
| Random Selection | **0.242** | 3.745 | 14.254 | 0.391 | 0.331 | **0.452** | **0.809** | 0.939 |
| Greedy Variance | 0.261 | 3.695 | 13.797 | 0.385 | 0.316 | 0.377 | 0.774 | **0.944** |
| Stein Variance | 0.260 | 3.812 | 13.588 | 0.383 | 0.316 | 0.406 | 0.785 | 0.943 |
| Greedy Gradient | 0.264 | 4.021 | 14.503 | 0.410 | 0.341 | 0.392 | 0.767 | 0.936 |
| ProbeMDE (Ours) | 0.250 | **3.682** | **13.582** | **0.380** | **0.315** | 0.432 | 0.791 | 0.940 |
Performance comparison of active proprioception strategies on physical data. Lower is better for AbsRel, SqRel, RMSE, Log RMSE, and ScInv. Higher is better for δ<1.25^t. Best results in each column are bolded.
```bibtex
@article{jordan2025probemde,
  author = {Jordan, Britton and Thompson, Jordan and d'Almeida, Jesse F. and Li, Hao and Kumar, Nithesh and Sharma, Susheela and Oguz, Ipek and Webster III, Robert J. and Brown, Daniel and Kuntz, Alan and Ferguson, James},
  title = {ProbeMDE: Uncertainty-Guided Active Proprioception for Monocular Depth Estimation in Surgical Robotics},
  journal = {arXiv preprint arXiv:2512.11773},
  year = {2025}
}
```