SceneRF

Self-Supervised Monocular 3D Scene Reconstruction
with Radiance Fields

Novel depth synthesis from a single image
3D reconstruction from a single image

Abstract

In the literature, 3D reconstruction from 2D images has been extensively addressed but often still requires geometric supervision. In this paper, we propose a self-supervised monocular scene reconstruction method using neural radiance fields (NeRF) learned from multiple image sequences with pose. To improve geometry prediction, we introduce new geometry constraints and a novel probabilistic sampling strategy that efficiently update the radiance field. As the latter is conditioned on a single frame, scene reconstruction is achieved from the fusion of multiple synthesized novel depth views. This is enabled by our spherical decoder, which allows hallucination beyond the input frame's field of view. Thorough experiments demonstrate that we outperform all baselines on all metrics for novel depth view synthesis and scene reconstruction.
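To make the self-supervised geometry signal concrete, below is a minimal PyTorch sketch of the general technique: rendering an expected depth per ray from a radiance field, then supervising it with a photometric reprojection loss against a posed source frame. This is an illustration under stated assumptions, not the paper's implementation; the function names, tensor layouts, and exact loss form are assumptions.

    import torch
    import torch.nn.functional as F

    def render_depth(sigma, delta, t):
        # sigma: (R, N) densities; delta: (R, N) inter-sample distances;
        # t: (R, N) sample depths along each of R rays.
        alpha = 1.0 - torch.exp(-sigma * delta)            # per-sample opacity
        trans = torch.cumprod(torch.cat([torch.ones_like(alpha[:, :1]),
                                         1.0 - alpha + 1e-10], -1), -1)[:, :-1]
        w = trans * alpha                                  # volume-rendering weights
        return (w * t).sum(-1)                             # expected ray depth

    def reprojection_loss(depth, uv, K, T_tgt_to_src, img_src, tgt_colors):
        # Back-project target pixels uv (R, 2) with the rendered depth, move the
        # 3D points into a source view, re-project, sample source colors, and
        # compare to the target colors (L1 photometric loss).
        R = uv.shape[0]
        ones = torch.ones(R, 1)
        cam = (torch.inverse(K) @ torch.cat([uv, ones], -1).T).T * depth[:, None]
        src = (T_tgt_to_src @ torch.cat([cam, ones], -1).T).T[:, :3]
        pix = (K @ src.T).T
        pix = pix[:, :2] / pix[:, 2:3].clamp(min=1e-6)
        H, W = img_src.shape[-2:]
        grid = torch.stack([pix[:, 0] / (W - 1) * 2 - 1,   # grid_sample wants [-1, 1]
                            pix[:, 1] / (H - 1) * 2 - 1], -1).view(1, R, 1, 2)
        sampled = F.grid_sample(img_src[None], grid, align_corners=True).view(3, R).T
        return (sampled - tgt_colors).abs().mean()

Because no ground-truth depth is used, the only supervision is photometric consistency between posed frames, which is what makes the pipeline self-supervised.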

Overview of our method

Our method leverages a generalizable neural radiance field (NeRF) to generate novel depth views conditioned on a single input frame. During training, for each ray we explicitly optimize depth, in addition to color, with a reprojection loss (Sec. 3.1), and introduce a Probabilistic Ray Sampling strategy (PrSamp, Sec. 3.2) to sample points along each ray more efficiently. To hallucinate features outside the input field of view (FOV), we propose a spherical U-Net (Sec. 3.3). Finally, the synthesized depths are fused for scene reconstruction (Sec. 3.4).
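For intuition, here is a minimal sketch of a probabilistic ray sampler in the spirit of PrSamp, assuming each ray's density along depth is summarized by a 1D Gaussian mixture. The function name, the mixture parameterization, and the uniform-sample fallback are assumptions for illustration, not the paper's exact algorithm.

    import torch

    def probabilistic_ray_sampling(mu, sigma, near, far, n_pts=64, n_uniform=8):
        # mu, sigma: (R, K) means/stds of a per-ray Gaussian mixture over depth.
        # Most samples are drawn from the mixture (concentrating near likely
        # surfaces); a few uniform samples keep the full [near, far] range explored.
        R, K = mu.shape
        comp = torch.randint(K, (R, n_pts - n_uniform))        # component per sample
        m = torch.gather(mu, 1, comp)
        s = torch.gather(sigma, 1, comp)
        gauss = m + s * torch.randn_like(m)                    # mixture draws
        unif = near + (far - near) * torch.rand(R, n_uniform)
        t = torch.cat([gauss, unif], -1).clamp(near, far)
        return t.sort(-1).values                               # sorted depths per ray

Compared with uniform sampling, concentrating samples where the surface is likely lets the same point budget resolve geometry more sharply.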

Qualitative results

Given a single input frame, novel depths (and views) are synthesized along a trajectory and fused into a 3D reconstruction.
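Since the radiance field is conditioned on a single frame, the final scene is obtained by fusing the depth maps synthesized along a trajectory (Sec. 3.4). Below is a minimal sketch of such a fusion step using Open3D's TSDF integration; the voxel size, truncation values, and the choice of Open3D itself are assumptions rather than the paper's exact pipeline.

    import open3d as o3d

    def fuse_depths(depths, colors, world_to_cam, fx, fy, cx, cy):
        # depths: list of (H, W) float32 depth maps in meters (model outputs);
        # colors: list of (H, W, 3) uint8 images; world_to_cam: list of 4x4 poses.
        H, W = depths[0].shape
        intr = o3d.camera.PinholeCameraIntrinsic(W, H, fx, fy, cx, cy)
        vol = o3d.pipelines.integration.ScalableTSDFVolume(
            voxel_length=0.1, sdf_trunc=0.4,
            color_type=o3d.pipelines.integration.TSDFVolumeColorType.RGB8)
        for d, c, T in zip(depths, colors, world_to_cam):
            rgbd = o3d.geometry.RGBDImage.create_from_color_and_depth(
                o3d.geometry.Image(c), o3d.geometry.Image(d),
                depth_scale=1.0, depth_trunc=80.0, convert_rgb_to_intensity=False)
            vol.integrate(rgbd, intr, T)       # extrinsic is world-to-camera
        return vol.extract_triangle_mesh()     # mesh of the fused scene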

Citation

If you find this project useful for your research, please cite:
@misc{cao2022scenerf,
    author    = {Cao, Anh-Quan and de Charette, Raoul},
    title     = {SceneRF: Self-Supervised Monocular 3D Scene Reconstruction with Radiance Fields},
    publisher = {arXiv},
    year      = {2022},
}

Acknowledgements

The work was partly funded by the French project SIGHT (ANR-20-CE23-0016) and conducted in the SAMBA collaborative project, co-funded by BpiFrance in the Investissement d’Avenir Program. It was performed using HPC resources from GENCI–IDRIS (Grant 2021-AD011012808 and 2022-AD011012808R1). We thank Fabio Pizzati and Ivan Lopes for their kind proofreading.