How Baidu’s ERNIE‑ViLG 2.0 and PaddlePaddle Boost AI Painting Performance

This article analyzes Baidu’s ERNIE‑ViLG 2.0 and PaddlePaddle‑optimized Stable Diffusion models, presenting benchmark comparisons, hardware‑specific speed and memory gains, and the underlying inference optimizations that enable low‑cost, high‑throughput AI‑generated image creation.

Baidu Geek Talk

Mar 9, 2023

Background

AIGC (AI‑Generated Content) has become a major research direction in deep learning, with AI‑driven painting being a prominent application. Diffusion‑based text‑to‑image models such as Stable Diffusion have generated strong demand for efficient deployment.

Model Performance Highlights

Baidu’s knowledge‑enhanced multimodal model ERNIE‑ViLG 2.0 surpasses Stable Diffusion and DALL‑E 2 both on the MS‑COCO benchmark and in blind human evaluation, showing stronger semantic controllability, sharper images, and a better grasp of Chinese cultural concepts.

Benchmark Results

On a single NVIDIA A100 (80 GB) GPU, PaddlePaddle inference of Stable Diffusion reaches 68.2 iters/s, or 0.76 s per 512×512 image. That is 4× faster than Diffusers (PyTorch) and 7.9 % faster than the best TensorRT configuration, while using only 43 % of TensorRT’s memory.

On Baidu’s Kunlun R200 (32 GB) accelerator, ERNIE‑ViLG 2.0 inference is 20 % faster than comparable mainstream inference cards and consumes less memory, enabling higher‑resolution generation.

[Figure: performance comparison of different hardware for ERNIE‑ViLG]

Key Inference Optimizations

Flash Attention

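PaddlePaddle integrates a high‑performance Flash Attention kernel. Instead of computing softmax over the full attention score matrix at once, the kernel computes it block by block, so the full score matrix never has to be written to and read back from GPU memory. This reduces memory accesses for both self‑attention and cross‑attention, which speeds up inference and lowers memory usage.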
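
The block-wise softmax idea can be illustrated in plain NumPy. This is only a sketch of the general tiling technique, not PaddlePaddle's kernel: the function names, the block size, and the single-head 2-D shapes are assumptions chosen for readability, and a real kernel runs on the GPU with fused loads and stores.

```python
import numpy as np

def naive_attention(q, k, v):
    """Reference version: materializes the full (n_q, n_k) score matrix."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores -= scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

def tiled_attention(q, k, v, block=16):
    """Processes keys/values block by block with running softmax statistics."""
    n_q, d = q.shape
    scale = 1.0 / np.sqrt(d)
    running_max = np.full((n_q, 1), -np.inf)   # row-wise max seen so far
    running_sum = np.zeros((n_q, 1))           # row-wise softmax denominator
    acc = np.zeros((n_q, v.shape[-1]))         # unnormalized output
    for start in range(0, k.shape[0], block):
        k_blk = k[start:start + block]
        v_blk = v[start:start + block]
        s = (q @ k_blk.T) * scale              # only an (n_q, block) tile
        new_max = np.maximum(running_max, s.max(axis=-1, keepdims=True))
        correction = np.exp(running_max - new_max)  # rescale earlier blocks
        p = np.exp(s - new_max)
        running_sum = running_sum * correction + p.sum(axis=-1, keepdims=True)
        acc = acc * correction + p @ v_blk
        running_max = new_max
    return acc / running_sum

rng = np.random.default_rng(0)
q = rng.standard_normal((8, 32))
k = rng.standard_normal((40, 32))
v = rng.standard_normal((40, 32))
print(np.allclose(tiled_attention(q, k, v), naive_attention(q, k, v)))
```

Both functions return the same result; the tiled version simply never holds more than one score tile at a time, which is where the memory saving comes from.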

Norm Fusion

LayerNorm and GroupNorm operators in the U‑Net are fused with surrounding element‑wise and activation ops. PaddlePaddle merges 93 distinct norm patterns, yielding a 3 % inference speed improvement.
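
As a rough illustration of what a fusion pass does, the sketch below rewrites a linear list of operator names by replacing known neighbor pairs with a single fused operator. The pattern table, the operator names, and the `fuse_ops` helper are hypothetical; PaddlePaddle's actual passes work on a data-flow graph and cover far more patterns than the two shown here.

```python
# Hypothetical fusion patterns: (first op, second op) -> fused op name.
FUSION_PATTERNS = {
    ("group_norm", "silu"): "fused_group_norm_silu",
    ("add", "layer_norm"): "fused_add_layer_norm",
}

def fuse_ops(ops):
    """Scans left to right and merges any adjacent pair found in the table."""
    fused = []
    i = 0
    while i < len(ops):
        pair = tuple(ops[i:i + 2])
        if pair in FUSION_PATTERNS:
            fused.append(FUSION_PATTERNS[pair])
            i += 2          # both ops are consumed by the fused kernel
        else:
            fused.append(ops[i])
            i += 1
    return fused

# A simplified residual block, written as a flat op sequence (an assumption).
resnet_block = ["group_norm", "silu", "conv2d",
                "group_norm", "silu", "conv2d", "add"]
fused_block = fuse_ops(resnet_block)
print(len(resnet_block), "->", len(fused_block), fused_block)
```

Each merged pair saves one kernel launch and one round trip of the intermediate tensor through GPU memory, which is the source of the speedup described above.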

Mixed Layout Computation

Tensor layout matching eliminates redundant transposes in the U‑Net, removing 32 transpose operations and delivering a 3‑4 % speed boost while also cutting memory consumption.
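
One way to picture how redundant transposes disappear: if every convolution is wrapped in an NCHW to NHWC conversion and back, the "back" of one convolution and the "to" of the next cancel out. The sketch below, which assumes a flat op list and a hypothetical `cancel_transposes` helper, removes such adjacent inverse pairs. It illustrates the idea only and is not the PaddlePaddle pass.

```python
TO_NHWC = (0, 2, 3, 1)   # NCHW -> NHWC
TO_NCHW = (0, 3, 1, 2)   # NHWC -> NCHW
IDENTITY = (0, 1, 2, 3)

def compose(p1, p2):
    """Permutation equal to applying transpose(p1) and then transpose(p2)."""
    return tuple(p1[i] for i in p2)

def cancel_transposes(ops):
    """Drops adjacent transpose pairs whose composition is the identity."""
    out = []
    for op in ops:
        if (out and op[0] == "transpose" and out[-1][0] == "transpose"
                and compose(out[-1][1], op[1]) == IDENTITY):
            out.pop()        # the two layout changes undo each other
        else:
            out.append(op)
    return out

graph = [
    ("transpose", TO_NHWC), ("conv2d",), ("transpose", TO_NCHW),
    ("transpose", TO_NHWC), ("conv2d",), ("transpose", TO_NCHW),
]
optimized = cancel_transposes(graph)
n_before = sum(op[0] == "transpose" for op in graph)
n_after = sum(op[0] == "transpose" for op in optimized)
print(n_before, "->", n_after)
```

In this toy graph the two convolutions end up sharing a single layout conversion on each side, so the intermediate tensor stays in the layout the convolution kernel prefers.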

Scheduler Optimization

The scheduler logic in the PPDiffusers library is streamlined: GPU kernel launches per scheduler.step drop from ~12 to 7, and pre‑computed parameters remove CPU work and GPU‑CPU synchronization during sampling loops.
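
The pre-computation idea can be sketched with a deterministic DDIM-style update, chosen here only because its algebra is compact; the article does not say which scheduler was optimized, and the function names and noise schedule below are assumptions. All per-step coefficients are computed once before the loop, so each step reduces to two multiplies and an add.

```python
import numpy as np

def make_tables(alphas_cumprod, timesteps):
    """Pre-computes, for every step, the coefficients of x_t and eps."""
    a_t = alphas_cumprod[timesteps]
    a_prev = np.append(alphas_cumprod[timesteps[1:]], 1.0)  # last step -> clean image
    coef_x = np.sqrt(a_prev / a_t)
    coef_eps = np.sqrt(1.0 - a_prev) - np.sqrt(a_prev * (1.0 - a_t) / a_t)
    return coef_x, coef_eps

def step_precomputed(x, eps, i, coef_x, coef_eps):
    """One sampling step using only table lookups."""
    return coef_x[i] * x + coef_eps[i] * eps

def step_reference(x, eps, a_t, a_prev):
    """Same update written out the long way, for comparison."""
    x0 = (x - np.sqrt(1.0 - a_t) * eps) / np.sqrt(a_t)
    return np.sqrt(a_prev) * x0 + np.sqrt(1.0 - a_prev) * eps

betas = np.linspace(1e-4, 0.02, 1000)          # assumed linear schedule
alphas_cumprod = np.cumprod(1.0 - betas)
timesteps = np.arange(999, -1, -20)            # 50 sampling steps
coef_x, coef_eps = make_tables(alphas_cumprod, timesteps)

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 64, 64))
for i in range(len(timesteps)):
    eps = rng.standard_normal(x.shape)         # stand-in for the U-Net output
    x = step_precomputed(x, eps, i, coef_x, coef_eps)
```

On a GPU, the long-form version launches a separate kernel for each arithmetic operation, while the table-driven version needs far fewer and never has to read a scalar back to the CPU inside the loop.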

Inference Memory Optimization

Operator fusion reduces the number of independent U‑Net operators by 60 %, cutting memory usage by 27 %. Layout‑aware optimizations further lower overall memory by ~19 %. For ERNIE‑ViLG 2.0, workspace reuse reduces memory consumption by 37 %.
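
Why reuse lowers peak memory can be shown with a small lifetime model. Each temporary buffer is described by its size and the first and last operator that touches it; without reuse every buffer gets its own allocation, while with reuse only the buffers alive at the same time must coexist. The buffer sizes and lifetimes below are made up for illustration, and the model ignores fragmentation and alignment.

```python
def peak_without_reuse(buffers):
    """Every buffer keeps its own allocation for the whole run."""
    return sum(size for size, _, _ in buffers)

def peak_with_reuse(buffers):
    """Ideal peak when memory is recycled as soon as a buffer is dead."""
    last_step = max(end for _, _, end in buffers)
    return max(
        sum(size for size, start, end in buffers if start <= t <= end)
        for t in range(last_step + 1)
    )

# (size in MB, first op index, last op index) -- illustrative values only.
buffers = [(64, 0, 1), (64, 1, 2), (32, 2, 3), (64, 3, 4)]
print(peak_without_reuse(buffers), "MB ->", peak_with_reuse(buffers), "MB")
```

The same reasoning explains why operator fusion saves memory: a fused operator never allocates the intermediate tensor that used to sit between its parts.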

Combined, these techniques let Stable Diffusion run on a single A100 (80 GB) at 68.2 iters/s, with 0.76 s latency per image and only 4.6 GB of memory, a best‑in‑class result in Baidu’s benchmarks.
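
The throughput and latency figures can be related with a quick back-of-the-envelope check. The step count is not stated in the article; the 50 denoising steps used below are an assumption, so the split is indicative only.

```python
ITERS_PER_SECOND = 68.2      # reported U-Net denoising throughput
END_TO_END_SECONDS = 0.76    # reported latency per 512x512 image
ASSUMED_STEPS = 50           # assumption: not stated in the article

denoise_seconds = ASSUMED_STEPS / ITERS_PER_SECOND
other_seconds = END_TO_END_SECONDS - denoise_seconds

print(f"denoising loop: {denoise_seconds:.3f} s")
print(f"everything else: {other_seconds:.3f} s")
```

Under that assumption the denoising loop accounts for roughly 0.73 s, leaving under 0.03 s for everything outside the loop, such as text encoding and image decoding. This is consistent with the optimizations above concentrating on the U‑Net and the scheduler.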

Deployment Tools and References

The open‑source PaddlePaddle diffusion toolbox (PPDiffusers) provides end‑to‑end training and inference pipelines. Repository: https://github.com/PaddlePaddle/PaddleNLP/tree/develop/ppdiffusers

FastDeploy offers ready‑to‑use deployment packages for Stable Diffusion on GPU and Kunlun R200. Repository: https://github.com/PaddlePaddle/FastDeploy/tree/develop/examples/multimodal/stable_diffusion

Future Work

PaddlePaddle will continue to optimize large‑scale generative models, expanding end‑to‑end training, compression, and inference pipelines to further reduce deployment costs and accelerate industry adoption of AIGC technologies.
