A Confidence-based Iterative Solver of Depths and
Surface Normals for Deep Multi-view Stereo

1Tsinghua Univ.

2ETH Zurich

3ByteDance Inc.





In this paper, we introduce a deep multi-view stereo (MVS) system that jointly predicts depths, surface normals and per-view confidence maps. The key to our approach is a novel solver that iteratively solves for the per-view depth and normal maps by optimizing an energy potential based on the locally planar assumption. Specifically, the algorithm updates the depth map by propagating depths from neighboring pixels along slanted planes, and updates the normal map with local probabilistic plane fitting. Both steps are monitored by a customized confidence map. This solver is not only effective as a post-processing tool for plane-based depth refinement and completion, but also differentiable, so it can be efficiently integrated into deep learning pipelines. Our multi-view stereo system applies multiple optimization steps of the solver to the initial prediction of depths and surface normals. The whole system can be trained end-to-end, decoupling the challenging problem of matching pixels within poorly textured regions from the cost-volume based neural network. Experimental results on ScanNet and RGB-D Scenes V2 demonstrate state-of-the-art performance of the proposed deep MVS system on multi-view depth estimation, with our proposed solver consistently improving the depth quality over both conventional and deep learning based MVS pipelines.
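To make the depth-propagation step more concrete, here is a minimal NumPy sketch. It is not the paper's implementation. It assumes a pinhole camera with intrinsics `K`, and the helper names `propagate_depth` and `blend_candidates` are hypothetical. Under the locally planar assumption, a neighbor q with depth d_q and normal n_q defines a plane, and intersecting the viewing ray of pixel p with that plane gives d_p = d_q (n_q . K^-1 q) / (n_q . K^-1 p), with q and p in homogeneous pixel coordinates. The confidence-weighted average at the end is only a simple stand-in for the energy-based update described above.

```python
import numpy as np

def propagate_depth(K, q, d_q, n_q, p):
    """Depth at pixel p implied by the slanted plane through neighbor q.

    K   : 3x3 pinhole intrinsics (assumed camera model)
    q, p: (u, v) pixel coordinates of the neighbor and the target pixel
    d_q : depth at q;  n_q : surface normal at q in camera coordinates
    Returns None when the ray through p is (nearly) parallel to the plane.
    """
    K_inv = np.linalg.inv(K)
    ray_q = K_inv @ np.array([q[0], q[1], 1.0])
    ray_p = K_inv @ np.array([p[0], p[1], 1.0])
    n_q = np.asarray(n_q, dtype=float)
    denom = n_q @ ray_p
    if abs(denom) < 1e-8:
        return None
    # Plane constraint: n_q . (d_p * ray_p) = n_q . (d_q * ray_q)
    return d_q * (n_q @ ray_q) / denom

def blend_candidates(depths, confidences):
    """Confidence-weighted mean of propagated depth candidates.

    Illustrative assumption only: the paper optimizes an energy potential
    rather than taking this plain weighted average.
    """
    d = np.asarray(depths, dtype=float)
    c = np.asarray(confidences, dtype=float)
    return float((c * d).sum() / c.sum())

if __name__ == "__main__":
    K = np.eye(3)  # toy intrinsics: unit focal length, principal point at 0
    # The plane z = 2 + 0.5 x has a normal proportional to (-0.5, 0, 1).
    n = np.array([-0.5, 0.0, 1.0])
    n /= np.linalg.norm(n)
    # Neighbor at pixel (0, 0) with depth 2 lies on that plane; the depth
    # propagated to pixel (1, 0) should be close to 4.
    print(propagate_depth(K, (0, 0), 2.0, n, (1, 0)))
```

A fronto-parallel plane (normal along the optical axis) reduces this to copying the neighbor's depth, which is why slanted planes matter on walls and floors seen at an angle.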

Multi-view Depth Estimation

This figure shows qualitative results of depth estimation. Our method produces visually more appealing multi-view depth maps and surpasses strong baselines on both the ScanNet and RGB-D Scenes V2 datasets.

Surface Normal Estimation

We test our surface normal estimation on the ScanNet dataset. Our method achieves competitive results both visually and quantitatively.
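As a rough illustration of how a normal can be recovered from a local patch of back-projected depths, here is a minimal NumPy sketch of confidence-weighted plane fitting. It is a generic weighted least-squares fit, not the paper's probabilistic formulation. The function name `fit_normal` and the convention that normals point toward a camera at the origin are assumptions made for this example.

```python
import numpy as np

def fit_normal(points, weights):
    """Weighted least-squares plane normal for a local 3D patch.

    points : (N, 3) back-projected 3D points in camera coordinates
    weights: (N,) non-negative per-point confidences (not all zero)
    Returns a unit normal oriented toward the camera at the origin.
    """
    pts = np.asarray(points, dtype=float)
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    centroid = w @ pts
    centered = pts - centroid
    # Weighted 3x3 covariance; its smallest-eigenvalue eigenvector is the
    # direction of least variance, i.e. the plane normal.
    cov = (centered * w[:, None]).T @ centered
    _, eigvecs = np.linalg.eigh(cov)  # eigenvalues in ascending order
    normal = eigvecs[:, 0]
    # Resolve the sign ambiguity: make the normal face the camera.
    if normal @ centroid > 0:
        normal = -normal
    return normal

if __name__ == "__main__":
    # 3x3 patch sampled from the plane z = 2 + 0.5 x, plus one outlier
    # that is given zero confidence and so does not affect the fit.
    xs, ys = np.meshgrid([-1.0, 0.0, 1.0], [-1.0, 0.0, 1.0])
    patch = np.stack([xs.ravel(), ys.ravel(), 2.0 + 0.5 * xs.ravel()], axis=1)
    patch = np.vstack([patch, [0.0, 0.0, 10.0]])
    conf = np.append(np.ones(9), 0.0)
    print(fit_normal(patch, conf))
```

Down-weighting low-confidence points in this way keeps unreliable depths, for example those from poorly textured regions, from dominating the fitted plane.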


Comparison of 3D reconstruction results from direct TSDF fusion. Our method reconstructs more visual detail and smoother geometry.


  author    = {Zhao, Wang and Liu, Shaohui and Wei, Yi and Guo, Hengkai and Liu, Yong-Jin},
  title     = {A Confidence-based Iterative Solver of Depths and Surface Normals for Deep Multi-view Stereo},
  booktitle = {ICCV},
  year = {2021}