Material Palette: Extraction of Materials from a Single Image

Ivan Lopes¹, Fabio Pizzati², Raoul de Charette¹
¹ Inria  ² University of Oxford

Accepted to CVPR 2024

Meet us Wednesday, 19 June at CVPR in Seattle during the first poster session
Material Palette extracts a palette of PBR materials (albedo \(A\), normals \(N\), and roughness \(R\)) from a single real-world image.

Video

Abstract

In this paper, we propose a method to extract Physically-Based-Rendering (PBR) materials from a single real-world image. We do so in two steps: first, we map regions of the image to material concepts using a diffusion model, which allows the sampling of texture images resembling each material in the scene. Second, we benefit from a separate network to decompose the generated textures into Spatially Varying BRDFs (SVBRDFs), providing us with materials ready to be used in rendering applications. Our approach builds on existing synthetic material libraries with SVBRDF ground truth, but also exploits a diffusion-generated RGB texture dataset to allow generalization to new samples using unsupervised domain adaptation (UDA). Our contributions are thoroughly evaluated on synthetic and real-world datasets. We further demonstrate the applicability of our method for editing 3D scenes with materials estimated from real photographs.

Pipeline

Method pipeline. From a single image \(\mathcal{I}\) (left), our method extracts the SVBRDF of the dominant materials (right). Given a set of regions \(\{\mathcal{R}_1,\cdots,\mathcal{R}_N\}\) provided by a user or a segmenter, we process each region \(\mathcal{R}_i\) separately in two steps. In the first step, we finetune Stable Diffusion on crops \(P_{\text{C}}\) of the region to learn a concept \(S^*\), which is later used to generate a texture image \(P_\text{SD}\) resembling \(P_{\text{C}}\). In the second step, these patches are decomposed into SVBRDF intrinsic maps (albedo \(A\), normals \(N\), and roughness \(R\)) using a multi-task network. Finally, the output of the method is a palette of extracted materials \(\{M_1,\cdots,M_N\}\) corresponding to the input regions \(\{\mathcal{R}_1,\cdots,\mathcal{R}_N\}\).
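For readers who prefer pseudocode, below is a minimal structural sketch of these two steps, assuming concept learning, texture generation, and decomposition are available as callables; the function and parameter names are illustrative and do not correspond to the released code.

```python
# Structural sketch of the two-step extraction (illustrative only).
# The callables passed in (learn_concept, generate_texture, decompose) stand in for
# Stable Diffusion finetuning, texture sampling, and the multi-task SVBRDF network.
from dataclasses import dataclass
from typing import Any, Callable, List, Tuple

@dataclass
class Material:
    albedo: Any     # A
    normals: Any    # N
    roughness: Any  # R

def extract_palette(
    image: Any,
    regions: List[Any],
    sample_crops: Callable[[Any, Any], List[Any]],
    learn_concept: Callable[[List[Any]], Any],
    generate_texture: Callable[[Any], Any],
    decompose: Callable[[Any], Tuple[Any, Any, Any]],
) -> List[Material]:
    palette = []
    for region in regions:                       # R_1 ... R_N
        crops = sample_crops(image, region)      # P_C: square crops inside the region mask
        concept = learn_concept(crops)           # step 1a: finetune SD, learn token S*
        texture = generate_texture(concept)      # step 1b: sample P_SD resembling P_C
        A, N, R = decompose(texture)             # step 2: SVBRDF intrinsic maps
        palette.append(Material(A, N, R))        # M_i
    return palette
```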

Texture extraction

Texture extraction. We compare textures extracted from natural images against four baselines, either patch-based or region-based. Unlike the baselines, our method relies on a learned concept \(S^*\) which, when used to generate samples, corrects artifacts, is not limited to a fixed resolution, and produces fully tileable outputs, resulting in homogeneous textures. We show the outputs of our method at a resolution of \(2048\times2048\); click on our textures to enlarge them!
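For context, one common recipe for obtaining tileable Stable Diffusion outputs is to switch the 2D convolutions of the UNet and VAE to circular padding before sampling with the learned concept token. The sketch below uses the diffusers library; the prompt, the <S*> placeholder token, and the padding trick itself are assumptions rather than the paper's exact procedure, and it assumes the concept embedding has already been loaded into the pipeline (e.g. via textual inversion).

```python
# Minimal sketch: sampling a tileable texture from a learned concept token <S*>.
# The circular-padding trick is a common community recipe, not necessarily the
# implementation used in Material Palette.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# Make every 2D convolution wrap around so the decoded image tiles seamlessly.
for module in list(pipe.unet.modules()) + list(pipe.vae.modules()):
    if isinstance(module, torch.nn.Conv2d):
        module.padding_mode = "circular"

texture = pipe(
    prompt="top view realistic texture of <S*>",  # hypothetical prompt template
    height=512, width=512,                        # tile or upscale afterwards for 2048x2048
    num_inference_steps=50,
).images[0]
texture.save("texture_tileable.png")
```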

Decomposition

SVBRDF Unsupervised Domain Adaptation. We train a decomposition network \(f\) on labeled SVBRDF materials \(\mathcal{S}\) and on unlabeled target data \(\mathcal{T}\) from our novel \(\mathsf{TexSD}\) dataset. Ultimately, \(\mathcal{T}\) acts as a domain bridge between the SVBRDF dataset and the real domain, i.e., the patches generated by our extraction method.
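As a rough illustration of this setup, one generic UDA recipe combines a supervised decomposition loss on \(\mathcal{S}\) with an adversarial feature-alignment loss on the unlabeled \(\mathcal{T}\). The sketch below assumes the network exposes intermediate features via a `return_features` flag and uses an illustrative loss weighting; neither is taken from the paper.

```python
# Hedged sketch of one UDA training step: supervised SVBRDF regression on labeled
# synthetic materials (S) plus adversarial feature alignment on unlabeled TexSD
# textures (T). Losses, weighting, and the return_features interface are assumptions.
import torch
import torch.nn.functional as F

def uda_step(f, discriminator, opt_f, opt_d, labeled_batch, unlabeled_batch, lam=0.1):
    x_s, (A_gt, N_gt, R_gt) = labeled_batch      # synthetic texture + ground-truth maps
    x_t = unlabeled_batch                        # TexSD texture, no labels

    # 1) Supervised decomposition loss on the source domain S.
    A, N, R, feat_s = f(x_s, return_features=True)
    loss_sup = F.l1_loss(A, A_gt) + F.l1_loss(N, N_gt) + F.l1_loss(R, R_gt)

    # 2) Adversarial alignment: push target features to look like source features.
    _, _, _, feat_t = f(x_t, return_features=True)
    logit_t = discriminator(feat_t)
    loss_adv = F.binary_cross_entropy_with_logits(logit_t, torch.ones_like(logit_t))

    opt_f.zero_grad()
    (loss_sup + lam * loss_adv).backward()
    opt_f.step()

    # 3) Train the discriminator to separate source from target features.
    d_s = discriminator(feat_s.detach())
    d_t = discriminator(feat_t.detach())
    loss_d = (F.binary_cross_entropy_with_logits(d_s, torch.ones_like(d_s))
              + F.binary_cross_entropy_with_logits(d_t, torch.zeros_like(d_t)))
    opt_d.zero_grad()
    loss_d.backward()
    opt_d.step()
    return loss_sup.item(), loss_adv.item(), loss_d.item()
```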

Extracted palettes

We show palette extraction results using three sources of region proposals: (a) user-defined regions, (b) Materialistic, and (c) the Segment Anything Model (SAM).
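For the automatic case, region proposals can be obtained with the segment_anything package as sketched below; the checkpoint path is a placeholder and the area-based filtering is an assumption, not the paper's exact post-processing.

```python
# Minimal sketch: automatic region proposals with the Segment Anything Model (SAM).
# Checkpoint path and filtering threshold are placeholders.
import cv2
from segment_anything import sam_model_registry, SamAutomaticMaskGenerator

sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h_4b8939.pth")
mask_generator = SamAutomaticMaskGenerator(sam)

image = cv2.cvtColor(cv2.imread("photo.jpg"), cv2.COLOR_BGR2RGB)
masks = mask_generator.generate(image)  # list of dicts with "segmentation", "area", ...

# Keep only large regions, which are more likely to cover a dominant material.
min_area = 0.01 * image.shape[0] * image.shape[1]
regions = [m["segmentation"] for m in masks if m["area"] > min_area]
```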

Here are additional sphere renderings of materials extracted from in-the-wild images.

End-to-end evaluation

To evaluate our method end-to-end, we first generate images by rendering 3D scenes with edited materials. After applying Material Palette to extract the materials (lower row), we assess performance by comparing them against the ground-truth decompositions (upper row). For all examples below, we show the albedo, normals, and roughness maps (left to right).
Click on the decomposition maps to enlarge them!
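As one illustration of such a comparison, a simple per-map error (here RMSE) can be computed between each extracted map and its ground truth; the metric choice below is an assumption and may differ from the metrics reported in the paper.

```python
# Hedged sketch of a per-map reconstruction metric (RMSE) between extracted maps
# and the ground-truth decomposition; the paper may report different metrics.
import numpy as np

def rmse(pred: np.ndarray, gt: np.ndarray) -> float:
    """Root-mean-square error between two maps normalized to [0, 1]."""
    diff = pred.astype(np.float64) - gt.astype(np.float64)
    return float(np.sqrt(np.mean(diff ** 2)))

def evaluate_material(pred: dict, gt: dict) -> dict:
    # pred/gt hold "albedo", "normals", and "roughness" arrays in [0, 1].
    return {name: rmse(pred[name], gt[name]) for name in ("albedo", "normals", "roughness")}
```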

Our texture dataset: \(\mathsf{TexSD}\)

We generate a high-resolution texture dataset by prompting a pre-trained Stable Diffusion model. You can find the 10,000 generated samples on Hugging Face Datasets.
More details can be found in our preprint.
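As an illustration of this kind of prompting, the loop below generates texture images from a list of material descriptions with diffusers; the material names and prompt template are placeholders, not the actual TexSD generation script.

```python
# Hedged sketch of a prompting loop for building a texture dataset with a
# pre-trained Stable Diffusion model; material names and prompt are illustrative.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

materials = ["red bricks", "oak wood planks", "rusty metal", "mossy stone"]  # placeholder list
for i, name in enumerate(materials):
    sample = pipe(
        prompt=f"top-down photograph of a {name} texture, seamless, uniform lighting",
        num_inference_steps=50,
    ).images[0]
    sample.save(f"texsd_{i:05d}.png")
```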

Acknowledgements

This research project was mainly funded by the French Agence Nationale de la Recherche (ANR) as part of project SIGHT (ANR-20-CE23-0016). Results were obtained using HPC resources from GENCI-IDRIS (Grant 2023-AD011014389). Fabio Pizzati was partially funded by KAUST (Grant DFR07910).

The repository contains code adapted from PEFT, SVBRDF-Estimation, and DenseMTL. For visualization, we used DeepBump and Blender. Credit to Runway for providing the Stable Diffusion v1.5 model weights. All images and 3D scenes used in this work have permissive licenses. Special credit to AmbientCG for their extensive material library.

We would also like to thank all members of Astra-Vision for their valuable feedback.

Consider citing us!

@inproceedings{lopes2023material,
    author = {Lopes, Ivan and Pizzati, Fabio and de Charette, Raoul},
    title = {Material Palette: Extraction of Materials from a Single Image},
    booktitle = {CVPR},
    year = {2024},
    project = {https://astra-vision.github.io/MaterialPalette/}
}