ProbeMDE: Uncertainty-Guided Active Proprioception for Monocular Depth Estimation in Surgical Robotics

Britton Jordan*1, Jordan Thompson*1, Jesse F. d’Almeida2, Hao Li2, Nithesh Kumar2, Susheela Sharma Stern2, Ipek Oguz2, Robert J. Webster III2, Daniel Brown1, Alan Kuntz1, James Ferguson1
1University of Utah, 2Vanderbilt University
*These authors contributed equally.

ProbeMDE uses cost-aware robot proprioception to refine a depth estimate of endoscopic imagery.

Abstract

Monocular depth estimation (MDE) provides a useful tool for robotic perception, but its predictions are often uncertain and inaccurate in challenging environments such as surgical scenes where textureless surfaces, specular reflections, and occlusions are common. To address this, we propose ProbeMDE, a cost-aware active sensing framework that combines RGB images with sparse proprioceptive measurements for MDE.

Our approach uses an ensemble of MDE models to predict dense depth maps conditioned on both an RGB image and a sparse set of known depth measurements obtained via proprioception, i.e., places where the robot has touched the environment in a known configuration. We quantify predictive uncertainty as the ensemble’s per-pixel variance and compute the gradient of this uncertainty with respect to candidate measurement locations. The pixels with the most negative values in the resulting gradient map are those where additional probes will yield the greatest reduction in ensemble variance.
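As a concrete illustration, the sketch below implements this selection step in PyTorch, assuming an ensemble of models that each map an RGB image and a sparse propriocepted depth map to a dense depth prediction; the function name, tensor shapes, and ensemble interface are illustrative assumptions rather than our exact implementation.

import torch

def select_probe_pixels(models, rgb, sparse_depth, num_probes=5):
    """Sketch of uncertainty-gradient probe selection (names/shapes illustrative).

    models:       iterable of ensemble MDE networks, each mapping
                  (rgb, sparse_depth) -> dense depth of shape (1, 1, H, W)
    rgb:          (1, 3, H, W) endoscopic image
    sparse_depth: (1, 1, H, W) propriocepted depths, 0 where unmeasured
    Returns a (num_probes, 2) tensor of (row, col) pixel locations to probe next.
    """
    # Treat the sparse depth map as a differentiable input so we can ask how
    # the ensemble's uncertainty would respond to a measurement at each pixel.
    sparse_depth = sparse_depth.detach().clone().requires_grad_(True)

    # Dense prediction from every ensemble member.
    preds = torch.stack([m(rgb, sparse_depth) for m in models], dim=0)

    # Per-pixel ensemble variance quantifies predictive uncertainty.
    variance = preds.var(dim=0, unbiased=False)            # (1, 1, H, W)

    # Gradient of the total uncertainty w.r.t. candidate measurement locations.
    grad_map, = torch.autograd.grad(variance.sum(), sparse_depth)

    # The most negative gradient values mark pixels where an additional probe
    # is expected to reduce the ensemble variance the most.
    flat = grad_map.flatten()
    idx = torch.topk(-flat, k=num_probes).indices
    width = grad_map.shape[-1]
    return torch.stack([idx // width, idx % width], dim=1)

In the full cost-aware setting, candidate pixels would additionally be restricted or weighted by how expensive they are for the robot to reach; that filtering is omitted from this sketch.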

We validate our method in both simulated and physical experiments on central airway obstruction surgical phantoms. Our results demonstrate that our approach outperforms baseline methods across standard depth estimation metrics, achieving higher accuracy while minimizing the number of required proprioceptive measurements.

Motivation

Recent advancements in deep learning have led to large-scale foundation models for monocular depth estimation that are highly performant on in-distribution cases. However, these models perform poorly on endoscopic images, where textureless surfaces, specular reflections, and overall dissimilarity from training datasets make prediction difficult. Below we show Depth Anything V2’s predictions on both simulated and real endoscopic scenes.

[Motivation figures: Depth Anything V2 depth predictions on simulated and real endoscopic scenes]
Foundation models for MDE perform poorly on endoscopic images.

Architecture

ProbeMDE architecture
ProbeMDE takes as input an endoscopic RGB image and a sparse propriocepted depth map. It produces a dense depth map and the best locations for future probes.
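For intuition, a toy stand-in for a single ensemble member is sketched below; the backbone, layer widths, and input fusion (channel-wise concatenation of RGB, sparse depth, and a probe mask) are assumptions for illustration, not the ProbeMDE architecture itself.

import torch
import torch.nn as nn

class SparseConditionedMDE(nn.Module):
    """Toy stand-in for a single ensemble member (assumed architecture)."""

    def __init__(self, width=32):
        super().__init__()
        # RGB (3 ch) + sparse depth (1 ch) + probe-validity mask (1 ch),
        # fused by channel-wise concatenation at the input.
        self.net = nn.Sequential(
            nn.Conv2d(5, width, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(width, width, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(width, 1, kernel_size=3, padding=1),
        )

    def forward(self, rgb, sparse_depth):
        mask = (sparse_depth > 0).float()          # 1 where the robot has probed
        x = torch.cat([rgb, sparse_depth, mask], dim=1)
        return self.net(x)                         # dense depth map (B, 1, H, W)

An ensemble of such members, for example trained from different random seeds, would supply the variance and gradient map used for probe selection above.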

Recorded Demo

Probing sequence

Quantitative Results

Method | AbsRel | SqRel | RMSE | Log RMSE | ScInv | δ < 1.25 | δ < 1.25^2 | δ < 1.25^3
No Sparse Ground-Truth | 0.352 | 2.657 | 6.559 | 0.376 | 0.328 | 0.580 | 0.832 | 0.920
Random Selection | 0.167 | 0.739 | 3.991 | 0.214 | 0.188 | 0.785 | 0.946 | 0.978
Greedy Variance | 0.171 | 0.789 | 4.122 | 0.221 | 0.195 | 0.777 | 0.944 | 0.977
Stein Variance | 0.170 | 0.789 | 4.182 | 0.221 | 0.195 | 0.776 | 0.942 | 0.976
Greedy Gradient | 0.164 | 0.720 | *3.878* | 0.209 | 0.183 | 0.782 | 0.950 | *0.981*
ProbeMDE (Ours) | *0.162* | *0.708* | 3.920 | *0.206* | *0.180* | *0.791* | *0.952* | *0.981*

Performance comparison of active proprioception strategies on simulated data. Lower is better for AbsRel, SqRel, RMSE, Log RMSE, and ScInv; higher is better for δ<1.25^t. The best result in each column is marked with asterisks.

Method | AbsRel | SqRel | RMSE | Log RMSE | ScInv | δ < 1.25 | δ < 1.25^2 | δ < 1.25^3
Random Selection | *0.242* | 3.745 | 14.254 | 0.391 | 0.331 | *0.452* | *0.809* | 0.939
Greedy Variance | 0.261 | 3.695 | 13.797 | 0.385 | 0.316 | 0.377 | 0.774 | *0.944*
Stein Variance | 0.260 | 3.812 | 13.588 | 0.383 | 0.316 | 0.406 | 0.785 | 0.943
Greedy Gradient | 0.264 | 4.021 | 14.503 | 0.410 | 0.341 | 0.392 | 0.767 | 0.936
ProbeMDE (Ours) | 0.250 | *3.682* | *13.582* | *0.380* | *0.315* | 0.432 | 0.791 | 0.940

Performance comparison of active proprioception strategies on physical data. Lower is better for AbsRel, SqRel, RMSE, Log RMSE, and ScInv; higher is better for δ<1.25^t. The best result in each column is marked with asterisks.
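For reference, the error metrics reported in both tables follow standard monocular depth estimation definitions; the snippet below is an illustrative implementation (the exact ScInv variant used in the paper may differ).

import numpy as np

def depth_metrics(pred, gt):
    """Standard MDE metrics over valid pixels (gt > 0); assumes pred > 0."""
    valid = gt > 0
    pred, gt = pred[valid], gt[valid]

    abs_rel  = np.mean(np.abs(pred - gt) / gt)                     # AbsRel
    sq_rel   = np.mean((pred - gt) ** 2 / gt)                      # SqRel
    rmse     = np.sqrt(np.mean((pred - gt) ** 2))                  # RMSE
    log_rmse = np.sqrt(np.mean((np.log(pred) - np.log(gt)) ** 2))  # Log RMSE

    # Scale-invariant log error (Eigen et al. style).
    d = np.log(pred) - np.log(gt)
    sc_inv = np.sqrt(np.mean(d ** 2) - np.mean(d) ** 2)

    # Threshold accuracy: fraction of pixels with max(pred/gt, gt/pred) < 1.25^t.
    ratio = np.maximum(pred / gt, gt / pred)
    deltas = {t: np.mean(ratio < 1.25 ** t) for t in (1, 2, 3)}

    return abs_rel, sq_rel, rmse, log_rmse, sc_inv, deltas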

BibTeX

@article{jordan2025probemde,
  author    = {Jordan, Britton and Thompson, Jordan and d'Almeida, Jesse F. and Li, Hao and Kumar, Nithesh and Stern, Susheela Sharma and Oguz, Ipek and Webster III, Robert J. and Brown, Daniel and Kuntz, Alan and Ferguson, James},
  title     = {ProbeMDE: Uncertainty-Guided Active Proprioception for Monocular Depth Estimation in Surgical Robotics},
  journal   = {arXiv:2512.11773},
  year      = {2025}
}