PaSCo: Urban 3D Panoptic Scene Completion with
Uncertainty Awareness

CVPR 2024


We propose the task of Panoptic Scene Completion (PSC), which extends the recently popular Semantic Scene Completion (SSC) task with instance-level information to produce a richer understanding of the 3D scene. Our PSC method applies a hybrid mask-based technique to the non-empty voxels of sparse multi-scale completions. Whereas the SSC literature overlooks uncertainty, which is critical for robotics applications, we propose an efficient ensembling scheme that estimates both voxel-wise and instance-wise uncertainties alongside PSC. It builds on a multi-input multi-output (MIMO) strategy, improving performance and yielding better uncertainty estimates for little additional compute. Additionally, we introduce a technique to aggregate permutation-invariant mask predictions. Our experiments demonstrate that our method surpasses all baselines in both Panoptic Scene Completion and uncertainty estimation on three large-scale autonomous driving datasets.
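The voxel-wise uncertainty described above can be illustrated with a minimal sketch: each MIMO subnet produces class logits for every voxel, the subnets' predictive distributions are averaged, and the entropy of the average serves as the uncertainty score. This is a generic ensembling sketch, not the paper's exact implementation; the function name and shapes are assumptions.

```python
import numpy as np

def voxel_uncertainty(subnet_logits):
    """Voxel-wise uncertainty from an ensemble of subnet predictions (sketch).

    subnet_logits: array of shape (S, V, C) -- S subnets, V voxels, C classes.
    Returns the entropy (in nats) of the averaged class distribution per voxel.
    """
    # Numerically stable softmax over classes, per subnet.
    z = subnet_logits - subnet_logits.max(axis=-1, keepdims=True)
    probs = np.exp(z) / np.exp(z).sum(axis=-1, keepdims=True)
    # MIMO-style ensembling: average the subnets' predictive distributions.
    mean_probs = probs.mean(axis=0)
    # Predictive entropy: low when subnets agree, high when they disagree.
    return -(mean_probs * np.log(mean_probs + 1e-12)).sum(axis=-1)
```

When confident subnets disagree, the averaged distribution flattens and the entropy rises, which is exactly the behavior wanted from an ensemble-based uncertainty score.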


Overview of our method
Our method predicts multiple variations of Panoptic Scene Completion (PSC) from an incomplete 3D point cloud, enabling uncertainty estimation through mask ensembling. For PSC we employ a sparse 3D generative U-Net with a transformer decoder. Uncertainty awareness is enabled by multiple subnets, each operating on a differently augmented version of the input data. PaSCo is the first method to perform Panoptic Scene Completion while providing robust uncertainty estimation. Instance-wise uncertainty shows only "things" classes for clarity.
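Because mask predictions are permutation-invariant, the subnets' mask sets cannot be averaged index-by-index; they must first be aligned. The paper's exact aggregation procedure is not detailed here; a common way to align two unordered mask sets, sketched below under that assumption, is Hungarian matching on pairwise mask IoU (function name hypothetical):

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def match_masks(ref_masks, other_masks):
    """Align two unordered sets of binary voxel masks by maximum IoU (sketch).

    ref_masks: (M, V) binary array, other_masks: (N, V) binary array.
    Returns (i, j) index pairs matching ref_masks[i] to other_masks[j],
    so that matched masks can then be ensembled.
    """
    inter = ref_masks.astype(float) @ other_masks.T.astype(float)  # (M, N)
    union = ref_masks.sum(1)[:, None] + other_masks.sum(1)[None, :] - inter
    iou = inter / np.maximum(union, 1e-9)
    # Hungarian matching maximizes total IoU (minimize negative IoU).
    rows, cols = linear_sum_assignment(-iou)
    return list(zip(rows, cols))
```

After matching, per-instance scores or mask probabilities from the different subnets can be averaged over the matched pairs, analogous to the voxel-wise case.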

Panoptic Scene Completion Comparison

Uncertainty Estimation Comparison

Robustness Evaluation on Robo3D



If you find this project useful for your research, please cite:

@InProceedings{cao2024pasco,
      title={PaSCo: Urban 3D Panoptic Scene Completion with Uncertainty Awareness},
      author={Anh-Quan Cao and Angela Dai and Raoul de Charette},
      booktitle={CVPR},
      year={2024}
}


The research was supported by the French project SIGHT (ANR-20-CE23-0016), the ERC Starting Grant SpatialSem (101076253), and the SAMBA collaborative project co-funded by BpiFrance in the Investissement d'Avenir Program. Computation was performed using HPC resources from GENCI–IDRIS (2023-AD011014102, AD011012808R2). We thank all Astra-Vision members for their valuable feedback, including Andrei Bursuc for his excellent suggestions and Tetiana Martyniuk for her kind proofreading.