How GANs Turn Sketches into Realistic Landscapes: Inside the “TuYa” Algorithm

This article explains the GAN‑based “TuYa” sketch‑to‑landscape algorithm presented at the Yidian News Hackathon, detailing its semantic image synthesis approach, the encoder, generator with SPADE, and PatchGAN discriminator, and discusses potential applications for designers and architects.

Cyber Elephant Tech Team

1. Introduction

The second Yidian News Hackathon has concluded, and the team “Shenbi Malian” impressed the audience with their project “TuYa”. The system combines AI capabilities such as sketch‑to‑landscape, image eraser, and hand‑drawn face generation, allowing users without advanced drawing or Photoshop skills to create desired images through simple sketches.

Figure 1‑1 Sketch‑to‑landscape demo

Figure 1‑2 TuYa project overview

Figure 1‑3 Hand‑drawn face generation

2. Sketch‑to‑Landscape Algorithm

The algorithm is a GAN‑based image generator that turns a few hand‑drawn contour lines into photorealistic scenes such as mountains, lakes, and blue skies. The sketch is treated as a semantic segmentation map, in which each region is labeled with a class, and the network synthesizes a corresponding realistic image; this task is known as semantic image synthesis. Because such maps need only rough outlines, they are easy to create, which inspired the team’s “magic brush” nickname.

Figure 2 Effect showcase

2.1 Overall Network Structure

The network follows a conditional GAN architecture and consists of an Encoder, a Generator, and a Discriminator. During training, the Generator and the Discriminator compete with each other.

Figure 2‑1 Overall network diagram

The three modules are:

Encoder

Generator

Discriminator

2.2 Encoder Module

The Encoder extracts mean and variance from a real image using a stack of convolutional layers followed by two fully‑connected layers. These statistics define a distribution; sampling from it yields a latent vector that encodes the style of the input image.

Figure 2‑2‑1 Encoder network structure

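The latent vector is obtained by drawing a sample from a standard Gaussian and denormalizing it, that is, scaling it by the standard deviation and shifting it by the mean. The resulting random vector carries the real‑image information and serves as input to the Generator, enabling style‑controlled image synthesis.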
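The sampling step can be sketched in a few lines. This is a minimal illustration rather than the team's code: the function name `sample_style_vector` is hypothetical, and it assumes the Encoder outputs the log‑variance instead of the raw variance, a common convention that the article does not confirm.

```python
import math
import random

def sample_style_vector(mean, log_var, rng=random):
    """Draw z = mean + std * eps with eps ~ N(0, 1), element by element.

    Assumption: the encoder predicts log-variance, not raw variance.
    """
    z = []
    for mu, lv in zip(mean, log_var):
        std = math.exp(0.5 * lv)     # log-variance -> standard deviation
        eps = rng.gauss(0.0, 1.0)    # standard Gaussian noise
        z.append(mu + std * eps)     # scale and shift ("denormalize") the noise
    return z

# Hypothetical 4-dimensional style statistics from an encoder.
style = sample_style_vector([0.0, 1.0, -1.0, 0.5], [0.0, -2.0, -2.0, 0.0])
```

Because the randomness lives entirely in `eps`, the same mean and variance yield slightly different vectors on every call, while different reference images shift the whole distribution and therefore the style.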

Figure 2‑2‑2 Generated images with different styles

During the Hackathon the Encoder was simplified, so style selection was omitted; this can be added in future development.

2.3 Generator Module

The Generator learns a mapping from the input semantic mask to a photorealistic image. It receives the random vector from the Encoder and incorporates the semantic map at multiple scales to provide contextual information, progressively refining the image from coarse to fine.
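One way to obtain the multi‑scale semantic maps is to resize the label map with nearest‑neighbor sampling, sketched below. The helper names `resize_nearest` and `semantic_pyramid` and the tiny 4x4 map are illustrative assumptions; the article does not say how TuYa resizes its maps.

```python
def resize_nearest(label_map, out_h, out_w):
    """Resize a 2-D grid of class ids by picking the nearest source pixel."""
    in_h, in_w = len(label_map), len(label_map[0])
    return [
        [label_map[(y * in_h) // out_h][(x * in_w) // out_w] for x in range(out_w)]
        for y in range(out_h)
    ]

def semantic_pyramid(label_map, sizes):
    """Return one resized copy of the label map per (height, width) in sizes."""
    return [resize_nearest(label_map, h, w) for h, w in sizes]

# Hypothetical class ids: 0 = sky, 1 = mountain, 2 = lake.
label_map = [
    [0, 0, 0, 0],
    [0, 0, 0, 0],
    [1, 1, 2, 2],
    [1, 1, 2, 2],
]
pyramid = semantic_pyramid(label_map, [(1, 1), (2, 2), (4, 4)])
```

Nearest‑neighbor sampling is used here because class ids are categorical: averaging a "sky" pixel with a "lake" pixel would produce an id that corresponds to neither class.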

Figure 2‑3‑1 Generator structure

Ordinary normalization layers such as Batch Normalization tend to wash out semantic information, so the Generator uses Spatially‑Adaptive Normalization (SPADE) instead. A SPADE block takes the previous layer’s output and the semantic map at the matching scale. It normalizes the features, passes the semantic map through convolutions to produce a per‑pixel scale and shift, and then applies them to the normalized features via element‑wise multiplication and addition, restoring semantic details.
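The modulation itself can be sketched with NumPy. This is a simplified illustration under stated assumptions, not the TuYa implementation: it handles a single image, so statistics are taken over spatial positions only, and it replaces the convolutions on the semantic map with one 1x1 convolution (a per‑pixel linear map). The names `one_hot`, `spade`, `w_gamma`, and `w_beta` are hypothetical.

```python
import numpy as np

def one_hot(label_map, num_classes):
    """(H, W) integer labels -> (num_classes, H, W) one-hot planes."""
    return np.eye(num_classes, dtype=np.float64)[label_map].transpose(2, 0, 1)

def spade(features, label_map, w_gamma, w_beta, eps=1e-5):
    """features: (C, H, W); label_map: (H, W) ints; weights: (C, num_classes)."""
    # 1. Parameter-free normalization: zero mean, unit variance per channel.
    mean = features.mean(axis=(1, 2), keepdims=True)
    var = features.var(axis=(1, 2), keepdims=True)
    normalized = (features - mean) / np.sqrt(var + eps)
    # 2. A 1x1 convolution over the one-hot map yields per-pixel gamma and beta.
    seg = one_hot(label_map, w_gamma.shape[1])
    gamma = np.einsum("ck,khw->chw", w_gamma, seg)
    beta = np.einsum("ck,khw->chw", w_beta, seg)
    # 3. Element-wise multiplication and addition re-inject the semantics.
    return gamma * normalized + beta
```

Since `gamma` and `beta` vary per pixel according to the class at that location, two regions with identical normalized features but different labels end up with different activations, which is what lets the layout survive normalization.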

Figure 2‑3‑2 SPADE module structure

The SPADE blocks are stacked with up‑sampling to form the full Generator architecture.

2.4 Discriminator

The Discriminator receives the semantic map concatenated with an image, processes them through a series of convolutional layers, and outputs a realism judgment. It follows the PatchGAN design used in pix2pixHD, producing an N×N map of real/fake scores, one per image patch, rather than a single global score, which better captures high‑resolution details.
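To make the patch‑score idea concrete, the sketch below computes how large the score map is and how much of the input each score sees. The layer list `PATCHGAN_LAYERS` is an assumption borrowed from the widely used pix2pix‑style configuration (4x4 kernels, three stride‑2 layers followed by two stride‑1 layers, padding 1); the article does not give TuYa's actual layer settings.

```python
def conv_output_size(size, kernel, stride, padding):
    """Spatial output size of one convolution along a single dimension."""
    return (size + 2 * padding - kernel) // stride + 1

def patch_map_size(input_size, layers):
    """Side length N of the score map after all layers."""
    size = input_size
    for kernel, stride, padding in layers:
        size = conv_output_size(size, kernel, stride, padding)
    return size

def receptive_field(layers):
    """Side length of the input patch that influences one output score."""
    field = 1
    for kernel, stride, _ in reversed(layers):
        field = (field - 1) * stride + kernel
    return field

# Assumed (kernel, stride, padding) per layer; not taken from the TuYa code.
PATCHGAN_LAYERS = [(4, 2, 1), (4, 2, 1), (4, 2, 1), (4, 1, 1), (4, 1, 1)]

n = patch_map_size(256, PATCHGAN_LAYERS)   # 30
patch = receptive_field(PATCHGAN_LAYERS)   # 70
```

Under these assumptions a 256x256 input yields a 30x30 score map, and each score judges a 70x70 patch of the input, so the Discriminator penalizes unrealistic local texture rather than only the overall composition.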

Figure 2‑4 Discriminator PatchGAN

Through the adversarial training of Generator and Discriminator, the system can generate realistic images from semantic sketches.

3. Future Outlook

The algorithm can empower architects, urban planners, landscape designers, game developers, advertising creators, and other image‑centric professions by providing a powerful tool for rapid virtual world creation. By leveraging AI to infer realistic appearances, designers can prototype high‑fidelity concepts directly during brainstorming.

Figure 3‑1 Future prospects illustration

Article sourced from the Yidian News Algorithm Team.
