Unlock High‑Quality Chinese Image Generation with PAI‑Diffusion: New Features & Fine‑Tuning Guide

This article introduces the upgraded PAI‑Diffusion Chinese models, highlighting major improvements in image quality and style diversity, detailing lightweight fine‑tuning methods such as LoRA and Textual Inversion, showcasing controllable editing, scenario‑specific customization, and providing step‑by‑step usage instructions on popular platforms.

Alibaba Cloud Big Data AI Platform

May 26, 2023

Authors: Duan Zhongjie, Liu Bingyan, Wang Chengyu, Huang Jun

Background

AI‑generated content (AIGC) models, exemplified by Stable Diffusion, have exploded in popularity. Alibaba Cloud's PAI‑Diffusion series, built on a Chinese CLIP cross‑modal alignment model, can generate high‑resolution images from Chinese text. Recent optimizations with PAI‑Blade enable sub‑second generation on A10 hardware.

Key New Features

Significant quality boost and richer style diversity through extensive data cleaning and training optimizations.

Support for lightweight fine‑tuning techniques (LoRA, Textual Inversion, DreamBooth, ControlNet) enabling domain‑specific adaptation with minimal compute.

Scenario‑oriented customization pipelines (Diffusers API, WebUI) for easy integration into products.

Art Gallery (Sample Outputs)

Sample images generated by the upgraded model appeared here in the original article.

Model Zoo

Model Name | Use Case
pai-diffusion-artist-large-zh | Chinese text‑to‑image artistic model, default resolution 512×512
pai-diffusion-artist-xlarge-zh | Higher‑resolution artistic model, default resolution 768×768

Training data were filtered from large Chinese‑English multimodal corpora (WuKong, LAION‑5B) using NSFW filtering, watermark removal, CLIP scoring, and aesthetic scoring. The Chinese CLIP encoder (EasyNLP) is kept frozen during diffusion training so that its semantic alignment between text and images is preserved.
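The filtering stages above can be pictured as a chain of per-sample checks. The following is a minimal sketch only: the field names, the threshold values, and the `Sample` type are hypothetical, and the article does not state how the scores were computed or which cut-offs the team used.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    caption: str
    nsfw_score: float       # hypothetical: probability the image is unsafe
    watermark_score: float  # hypothetical: probability the image has a watermark
    clip_score: float       # hypothetical: image-text similarity from a CLIP model
    aesthetic_score: float  # hypothetical: predicted aesthetic rating


# Illustrative thresholds, not values from the article.
MAX_NSFW = 0.1
MAX_WATERMARK = 0.5
MIN_CLIP = 0.25
MIN_AESTHETIC = 5.0


def keep(sample: Sample) -> bool:
    """Return True only if the sample passes every filter stage."""
    return (
        sample.nsfw_score <= MAX_NSFW
        and sample.watermark_score <= MAX_WATERMARK
        and sample.clip_score >= MIN_CLIP
        and sample.aesthetic_score >= MIN_AESTHETIC
    )


def filter_corpus(samples):
    """Keep the samples that pass all checks, preserving order."""
    return [s for s in samples if keep(s)]


corpus = [
    Sample("雾蒙蒙的日出在湖面上", 0.01, 0.05, 0.31, 6.2),
    Sample("低质量截图", 0.02, 0.90, 0.28, 4.1),  # fails watermark and aesthetic checks
]
kept = filter_corpus(corpus)
print([s.caption for s in kept])
```

Chaining the checks with `and` means a sample is dropped as soon as any one stage rejects it, which matches the idea of successive cleaning passes.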

Lightweight Fine‑Tuning with LoRA

LoRA trains small low‑rank update matrices while the base weights stay frozen, which cuts compute and memory compared with full‑parameter fine‑tuning. Example command:

export MODEL_NAME="model_name"
export TRAIN_DIR="path_to_your_dataset"
export OUTPUT_DIR="path_to_save_model"
accelerate launch train_text_to_image_lora.py \
  --pretrained_model_name_or_path=$MODEL_NAME \
  --train_data_dir=$TRAIN_DIR \
  --resolution=512 --center_crop --random_flip \
  --train_batch_size=1 \
  --gradient_accumulation_steps=4 \
  --max_train_steps=15000 \
  --learning_rate=1e-04 \
  --max_grad_norm=1 \
  --lr_scheduler="cosine" --lr_warmup_steps=0 \
  --output_dir=$OUTPUT_DIR
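To see what the batch settings in this command imply, the arithmetic can be sketched as follows. It assumes a single GPU process and assumes that `--max_train_steps` counts optimizer updates, as in the Diffusers example scripts; both are assumptions rather than statements from the article.

```python
# Values taken from the command above.
train_batch_size = 1
gradient_accumulation_steps = 4
max_train_steps = 15000

# Assumption: one process (one GPU); multiply by the process count otherwise.
num_processes = 1

# Gradients from several small batches are summed before each optimizer update.
effective_batch_size = train_batch_size * gradient_accumulation_steps * num_processes

# Total number of training samples consumed over the whole run.
samples_seen = effective_batch_size * max_train_steps

print(effective_batch_size)  # 4
print(samples_seen)          # 60000
```

Gradient accumulation trades wall-clock time for memory: each update behaves like a batch of 4 while only one sample is held on the GPU at a time.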

Inference after LoRA fine‑tuning:

import torch
from diffusers import StableDiffusionPipeline

model_id = "model_name"  # the same base model used for training
lora_path = "model_path/checkpoint-xxx/pytorch_model.bin"
pipe = StableDiffusionPipeline.from_pretrained(model_id)
# Load the LoRA attention weights on CPU first, then attach them to the UNet
pipe.unet.load_attn_procs(torch.load(lora_path, map_location="cpu"))
pipe.to("cuda")
image = pipe("input text").images[0]
image.save("result.png")
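Why LoRA is lightweight can be shown with a small, dependency-free sketch. The layer size (768×768) and rank (4) below are illustrative assumptions, not values reported in the article, and the helper names are hypothetical. The sketch counts trainable parameters and applies a low-rank update W' = W + scale · B·A to a toy matrix.

```python
def lora_param_counts(d_out, d_in, rank):
    """Trainable parameters: full weight matrix vs. the two LoRA factors."""
    full = d_out * d_in              # W has shape (d_out, d_in)
    lora = rank * (d_in + d_out)     # B is (d_out, rank), A is (rank, d_in)
    return full, lora


def matmul(a, b):
    """Plain-Python matrix product of two nested lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def apply_lora(W, B, A, scale=1.0):
    """Return W + scale * (B @ A); W itself is left unchanged (frozen)."""
    delta = matmul(B, A)
    return [[w + scale * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(W, delta)]


full, lora = lora_param_counts(d_out=768, d_in=768, rank=4)
print(full, lora)  # 589824 6144

# Toy example with rank 1 on a 2x2 weight.
W = [[1.0, 0.0], [0.0, 1.0]]
B = [[1.0], [2.0]]
A = [[0.5, 0.0]]
print(apply_lora(W, B, A))  # [[1.5, 0.0], [1.0, 1.0]]
```

With these illustrative sizes the LoRA factors hold roughly 1% of the parameters of the full matrix, which is where the compute and storage savings come from.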

Fine‑Tuning with Textual Inversion

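Textual Inversion teaches the model a new concept by learning an embedding for a new placeholder token while the rest of the model stays frozen. Example command: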

export MODEL_NAME="model_name"
export TRAIN_DIR="path_to_your_dataset"
export OUTPUT_DIR="path_to_save_model"
accelerate launch textual_inversion.py \
  --pretrained_model_name_or_path=$MODEL_NAME \
  --train_data_dir=$TRAIN_DIR \
  --learnable_property="object" \
  --placeholder_token="<小奶猫>" --initializer_token="猫" \
  --resolution=512 \
  --train_batch_size=1 \
  --gradient_accumulation_steps=4 \
  --max_train_steps=100 \
  --learning_rate=5.0e-04 --scale_lr \
  --lr_scheduler="constant" \
  --lr_warmup_steps=0 \
  --output_dir=$OUTPUT_DIR

Generate images with the new token:

from diffusers import StableDiffusionPipeline
model_path = "path_to_save_model"
pipe = StableDiffusionPipeline.from_pretrained(model_path).to("cuda")
image = pipe("一只<小奶猫>在草地上").images[0]
image.save("result.png")
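Conceptually, the `--placeholder_token` and `--initializer_token` options correspond to adding one row to the text encoder's embedding table and training only that row. The toy sketch below illustrates the idea with plain lists; the vocabulary, the 2-dimensional embeddings, the gradients, and the helper names are all made up for illustration and do not reflect the real tokenizer or training script.

```python
# Toy vocabulary and embedding table (made-up values).
vocab = {"一只": 0, "猫": 1, "草地": 2}
embeddings = [[0.1, 0.2], [0.7, -0.3], [0.0, 0.5]]


def add_placeholder_token(vocab, embeddings, placeholder, initializer):
    """Append a new token whose embedding starts as a copy of the initializer's."""
    if placeholder in vocab:
        raise ValueError(f"{placeholder} is already in the vocabulary")
    new_id = len(embeddings)
    vocab[placeholder] = new_id
    embeddings.append(list(embeddings[vocab[initializer]]))
    return new_id


def sgd_step(embeddings, grads, trainable_id, lr):
    """Update only the placeholder row; every other row stays frozen."""
    row, grad = embeddings[trainable_id], grads[trainable_id]
    embeddings[trainable_id] = [w - lr * g for w, g in zip(row, grad)]


new_id = add_placeholder_token(vocab, embeddings, "<小奶猫>", "猫")

# Made-up gradients for every row; only the new row is actually applied.
grads = [[1.0, 1.0] for _ in embeddings]
sgd_step(embeddings, grads, new_id, lr=0.1)

print(embeddings[vocab["猫"]])        # unchanged initializer embedding
print(embeddings[vocab["<小奶猫>"]])  # moved away from its starting point
```

Starting from the embedding of a related word gives the new token a sensible initial meaning, so only a small number of update steps may be needed to specialize it.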

Controllable Image Editing

The model fully supports the StableDiffusionImg2ImgPipeline for text‑guided editing:

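from PIL import Image
from diffusers import StableDiffusionImg2ImgPipeline

pipe = StableDiffusionImg2ImgPipeline.from_pretrained("model_name").to("cuda")
# Placeholder path: replace with the image you want to edit
init_image = Image.open("init_image.png").convert("RGB").resize((512, 512))
image = pipe(prompt="input text", image=init_image, strength=0.75, guidance_scale=7.5).images[0]
image.save("result.png")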
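The `strength` argument controls how much of the denoising schedule is actually run on the input image. The sketch below approximates how the Diffusers img2img pipeline derives the step count; the function name is hypothetical, the exact logic may differ between library versions, and the default of 50 inference steps is an assumption.

```python
def img2img_steps(num_inference_steps, strength):
    """Return (skipped_steps, executed_steps) for a given strength."""
    if not 0.0 <= strength <= 1.0:
        raise ValueError("strength must be between 0 and 1")
    # Number of steps over which noise is added and then removed.
    init_timestep = min(int(num_inference_steps * strength), num_inference_steps)
    # Steps at the start of the schedule that are skipped.
    t_start = max(num_inference_steps - init_timestep, 0)
    return t_start, num_inference_steps - t_start


print(img2img_steps(50, 0.75))  # (13, 37)
print(img2img_steps(50, 1.0))   # (0, 50)
```

A lower strength keeps the output closer to the original image because fewer denoising steps are applied; a strength of 1.0 runs the full schedule and largely ignores the input.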

Scenario‑Specific Pre‑Training

Continuing pre‑training on domain data (e.g., food) yields specialized models that can still be combined with LoRA or ControlNet for fine‑grained control.

Model Access & Deployment

Both models are published on HuggingFace and ModelScope. Example usage on HuggingFace:

from diffusers import StableDiffusionPipeline
model_id = "alibaba-pai/pai-diffusion-artist-large-zh"
pipe = StableDiffusionPipeline.from_pretrained(model_id).to("cuda")
image = pipe("雾蒙蒙的日出在湖面上").images[0]
image.save("result.png")

Example usage on ModelScope:

import cv2
from modelscope.pipelines import pipeline

p = pipeline('text-to-image-synthesis', 'PAI/pai-diffusion-artist-large-zh', model_revision='v1.0.0')
result = p({'text': '雾蒙蒙的日出在湖面上'})
# Write the first generated image to disk with OpenCV
cv2.imwrite("image.png", result["output_imgs"][0])

Future Outlook

The team plans to further expand scenario‑specific capabilities, improve generation quality, and integrate more lightweight fine‑tuning techniques.
