Earthbender: An Interactive System for Stylistic Heightmap Generation using a Guided Diffusion Model

MIG '25

Danial Barazandeh & Gabriel Zachmann

University of Bremen

Earthbender Workflow Teaser

Abstract

Games, 3D simulations, and cinematic pipelines depend on realistic 3D terrain for immersion. However, creating detailed 3D terrain is labour-intensive: artists sculpt elevation, iterate on mountains, rivers, and lakes, and must often repeat the entire workflow when the design changes. Recent generative approaches attempt to address this challenge, but they primarily focus on a single landform (typically mountains) and overlook structural features such as river networks, roads, and lakes. We propose a sketch-conditioned diffusion framework that generates depth maps representing complete landscapes, including mountains, river networks, and lakes. Our method extends Stable Diffusion with a ControlNet branch that takes multiple channel inputs: Canny edges for overall structure, red for mountains, green for lakes, and blue as a carving tool for painting roads and rivers onto the heightmap.

Method & Interface

Our system uses a multi-channel semantic sketch to guide a pre-trained Stable Diffusion model via a custom ControlNet. The artist interacts with a complete GUI that includes drawing tools and real-time post-processing controls for fine-tuning the output. This creates an artist-centric workflow where the generative AI acts as a powerful assistant.
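The paper does not ship code on this page, but the multi-channel conditioning described above can be illustrated with a small sketch. The function below is an assumption of ours, not the authors' implementation: it packs the semantic masks into the red, green, and blue channels and overlays the Canny edge map onto all channels so the structural outline is visible to the ControlNet.

```python
import numpy as np

def compose_control_image(edges, mountains, lakes, rivers):
    """Stack semantic masks into an RGB control image (illustrative only).

    All inputs are 2D uint8 arrays in [0, 255] of the same shape.
    Channel assignment follows the paper's convention: red encodes
    mountains, green encodes lakes, blue encodes rivers/roads. The
    Canny edge map is overlaid on every channel with a saturating add.
    """
    rgb = np.stack([mountains, lakes, rivers], axis=-1)
    # Saturating add of the edge map onto all three channels.
    rgb = np.clip(rgb.astype(np.int32) + edges[..., None], 0, 255)
    return rgb.astype(np.uint8)
```

The resulting image can then be passed as the conditioning input of a ControlNet pipeline in place of a plain edge map; how Earthbender actually fuses the channels internally may differ.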

Earthbender System Interface

Qualitative Results

Comparison of our ControlNet-based method against the Pix2PixHD baseline. Our approach generates significantly more detailed and structurally coherent heightmaps that faithfully follow the input sketch.

Qualitative results comparing Earthbender (ControlNet) with Pix2PixHD

Final 3D Render

A final, high-quality 3D terrain rendered from a generated heightmap, demonstrating the applicability of our system in a standard 3D workflow.

Final 3D render of a terrain generated with Earthbender
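Turning a generated heightmap into renderable 3D geometry is a standard step; as a rough illustration of what "a standard 3D workflow" involves (this helper is our own sketch, not part of Earthbender), each pixel becomes a vertex and each grid cell is split into two triangles:

```python
import numpy as np

def heightmap_to_mesh(height, xy_scale=1.0, z_scale=1.0):
    """Convert a 2D heightmap into a triangle mesh (vertices, faces).

    height: 2D float array of elevations.
    Returns vertices of shape (H*W, 3) and faces of shape
    (2*(H-1)*(W-1), 3) indexing into the vertex array.
    """
    h, w = height.shape
    ys, xs = np.mgrid[0:h, 0:w]
    vertices = np.stack(
        [xs * xy_scale, ys * xy_scale, height * z_scale], axis=-1
    ).reshape(-1, 3)
    # Vertex index grid; each cell yields two triangles.
    idx = np.arange(h * w).reshape(h, w)
    tl = idx[:-1, :-1].ravel()  # top-left corners
    tr = idx[:-1, 1:].ravel()   # top-right
    bl = idx[1:, :-1].ravel()   # bottom-left
    br = idx[1:, 1:].ravel()    # bottom-right
    faces = np.concatenate([
        np.stack([tl, bl, tr], axis=1),
        np.stack([tr, bl, br], axis=1),
    ])
    return vertices, faces
```

A mesh built this way can be exported (e.g. as OBJ) and imported into any game engine or DCC tool for shading and rendering.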

Acknowledgments

This work was partially supported by a stipend from the University of Bremen.

Citation

If you find our work useful, please consider citing our paper:

@inproceedings{10.1145/3769047.3769053,
author = {Barazandeh, Danial and Zachmann, Gabriel},
title = {Earthbender: An Interactive System for Stylistic Heightmap Generation using a Guided Diffusion Model},
year = {2025},
isbn = {9798400722363},
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
url = {https://doi.org/10.1145/3769047.3769053},
doi = {10.1145/3769047.3769053},
abstract = {Games, 3D simulations, and cinematic pipelines depend on realistic 3D terrain for immersion. However, creating detailed 3D terrain is labour-intensive: artists sculpt elevation, iterate on mountains, rivers, lakes, and must often repeat the entire workflow when the design changes. Recent generative approaches are attempting to address this challenge, but they primarily focus on a single landform (typically mountains) and overlook structural features, such as river networks, roads, or lakes. We propose a sketch-conditioned diffusion framework that generates depth maps representing complete landscapes, including mountains, river networks, and lakes. Our method extends Stable Diffusion with a ControlNet branch that takes multiple channel inputs: Canny edges for overall structure, red for mountains, green for lakes, and blue as a carving tool for painting roads and rivers onto the heightmap. This approach addresses the technical challenges while prioritizing the artist's creative control. Our interactive system, Earthbender, gives the artist fine-grained control over every detail in the heightmap, demonstrating a collaborative model where the generative AI acts as a powerful assistant to achieve an artistic vision, rather than replacing the artist's creativity. Our experiments show that our ControlNet-based approach significantly outperforms traditional GANs in both data efficiency and output quality. Furthermore, we present an analysis demonstrating that the choice of loss function acts as a powerful artistic control, allowing the user to select between a sharp, detailed style and a softer, more organic output better suited for downstream game engine workflows.},
booktitle = {Proceedings of the 2025 18th ACM SIGGRAPH Conference on Motion, Interaction, and Games},
articleno = {6},
numpages = {11},
keywords = {Interactive Systems, Sketch-Based Modeling, Terrain Generation, ControlNet, Generative Models, Diffusion Models},
series = {MIG '25}
}