RealMat: Realistic Materials with Diffusion and Reinforcement Learning

1Texas A&M University, 2Max Planck Institute for Informatics, 3Adobe Research

RealMat generates diverse realistic materials.

Abstract

Generative models for high-quality materials are particularly desirable to make 3D content authoring more accessible. However, the majority of material generation methods are trained on synthetic data. Synthetic data provides precise supervision for material maps, which is convenient, but it also tends to create a significant visual gap with real-world materials. Alternatively, recent work has used a small dataset of real flash photographs to guarantee realism; however, such data is limited in scale and diversity. To address these limitations, we propose RealMat, a diffusion-based material generator that leverages realistic priors, including a text-to-image model and a dataset of realistic material photos under natural lighting. In RealMat, we first finetune a pretrained Stable Diffusion XL (SDXL) model on synthetic material maps arranged in 2 × 2 grids. This way, our model inherits some of the realism of SDXL while learning the data distribution of the synthetic material grids. Still, this leaves a realism gap, with some generated materials appearing synthetic. We therefore further finetune our model through reinforcement learning (RL), encouraging the generation of realistic materials. We develop a realism reward function for any material image under natural lighting by collecting a large-scale dataset of realistic material images. We show that this approach increases the realism of generated materials compared to our base model and related work.

Method

Idea

RealMat Idea Overview

We first finetune a Stable Diffusion XL (SDXL) model, pretrained on natural images, to generate detailed material maps using synthetic training data. Although this produces high-quality materials in many cases, the finetuning shifts the distribution of SDXL toward a more synthetic appearance.

RealMat Idea Overview

To address this, we further finetune our model using reinforcement learning (RL) with a realism reward function. We show samples and maps (in clockwise order from top left: albedo, height, metallicity, and roughness) with fixed text prompts and seeds. With RL, the distribution of generated materials shifts toward a more realistic appearance overall.
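To make the grid layout concrete, below is a minimal sketch of how four SVBRDF maps could be packed into a single 2 × 2 training image, following the clockwise order named above (albedo, height, metallicity, roughness). The function and tensor conventions are illustrative, not the paper's actual data pipeline; grayscale maps are assumed to be replicated to three channels beforehand.

import torch

def pack_material_grid(albedo, height, metallicity, roughness):
    """Pack four SVBRDF maps (each a 3xHxW tensor) into one 2x2 grid image.

    Clockwise from top left: albedo, height, metallicity, roughness,
    matching the layout shown in the figures above.
    """
    top = torch.cat([albedo, height], dim=2)          # concatenate along width
    bottom = torch.cat([roughness, metallicity], dim=2)
    return torch.cat([top, bottom], dim=1)            # stack rows along height

# Example: four 3x512x512 maps become one 3x1024x1024 grid image.
grid = pack_material_grid(*[torch.rand(3, 512, 512) for _ in range(4)])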

Overview

RealMat Overview

(a) In the first stage, we finetune SDXL for text-to-material generation using synthetic SVBRDF maps arranged in 2 × 2 grids; (b) next, we train a realism reward (score) on a mixture of real photographs and synthetic data; (c) in the second finetuning stage, we use this reward in a reinforcement learning (RL) strategy to push the generated distribution further towards realistic materials.

SDXL Finetuning

SDXL Finetuning

Sampled material maps and renderings from RealMat after the first finetuning stage on the synthetic dataset. Here, we partition examples into synthetic-looking (top row) and realistic (bottom row) to motivate our reinforcement learning realism finetuning stage. In all cases, the materials are high quality, show consistent features across the different material maps, and preserve good text alignment.
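For intuition, here is a minimal sketch of what one step of this first finetuning stage could look like in a diffusers-style setup, training the UNet with the standard epsilon-prediction loss on VAE latents of the 2 × 2 grids. SDXL's additional pooled-text and size conditioning is omitted for brevity, and all names are illustrative rather than the paper's actual training code.

import torch
import torch.nn.functional as F

def grid_finetune_step(unet, vae, scheduler, text_embeds, grid_images, optimizer):
    # Encode the 2x2 material grids into the VAE latent space.
    with torch.no_grad():
        latents = vae.encode(grid_images).latent_dist.sample()
        latents = latents * vae.config.scaling_factor

    # Standard denoising objective: add noise at a random timestep ...
    noise = torch.randn_like(latents)
    t = torch.randint(0, scheduler.config.num_train_timesteps,
                      (latents.shape[0],), device=latents.device)
    noisy_latents = scheduler.add_noise(latents, noise, t)

    # ... and train the UNet to predict it, conditioned on the text prompt.
    pred = unet(noisy_latents, t, encoder_hidden_states=text_embeds).sample
    loss = F.mse_loss(pred, noise)

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()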

Realism Reward Function

Realism Reward Function

Normalized realism scores estimated by our realism reward function. The left shows scores for real materials and the right shows scores for rendered synthetic materials, demonstrating the effectiveness of the realism reward function.
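A realism reward of this kind can be read as a binary real-vs-synthetic classifier whose normalized output serves as the score. The sketch below illustrates one plausible instantiation with an off-the-shelf ResNet-50 backbone; the paper does not specify this architecture, so treat the backbone and helper names as assumptions.

import torch
import torch.nn as nn
from torchvision import models

class RealismReward(nn.Module):
    """Scores an image's realism in [0, 1]; higher means more realistic."""

    def __init__(self):
        super().__init__()
        # Backbone choice is an assumption; any image classifier would do.
        net = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V2)
        net.fc = nn.Linear(net.fc.in_features, 1)
        self.net = net

    def forward(self, images):
        return torch.sigmoid(self.net(images)).squeeze(-1)

# Training: binary cross-entropy with real photos labeled 1 and
# rendered synthetic materials labeled 0.
def reward_training_step(model, images, is_real, optimizer):
    loss = nn.functional.binary_cross_entropy(model(images), is_real.float())
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()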

RL Finetuning

RL finetuning progressively improves synthetic-looking materials, while remaining consistent for already-realistic materials (bottom right). This demonstrates the effectiveness and robustness of the second stage of RealMat.
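This page does not spell out the exact RL algorithm; a common choice for reward finetuning of diffusion models is a DDPO-style REINFORCE update over the denoising trajectory, sketched below under that assumption. Here logprobs is assumed to hold the per-step log-probabilities of the sampler's Gaussian denoising actions, one row per generated image.

import torch

def rl_finetune_step(logprobs, images, reward_model, optimizer):
    """One policy-gradient step: push up trajectories whose final
    images score high under the frozen realism reward."""
    with torch.no_grad():
        rewards = reward_model(images)                       # shape (batch,)
    # Normalize rewards into advantages to reduce gradient variance.
    adv = (rewards - rewards.mean()) / (rewards.std() + 1e-8)
    # REINFORCE: weight each trajectory's log-likelihood by its advantage.
    loss = -(adv.unsqueeze(1) * logprobs).sum(dim=1).mean()  # logprobs: (batch, steps)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return rewards.mean().item()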

BibTeX

@article{zhou2025realmat,
  title={RealMat: Realistic Materials with Diffusion and Reinforcement Learning},
  author={Zhou, Xilong and Figueiredo, Pedro and Ha{\v{s}}an, Milo{\v{s}} and Deschaintre, Valentin and Guerrero, Paul and Hu, Yiwei and Kalantari, Nima Khademi},
  journal={arXiv preprint arXiv:2509.01134},
  year={2025}
}