Unlock High‑Quality Chinese Image Generation with PAI‑Diffusion: New Features & Fine‑Tuning Guide
This article introduces the upgraded PAI‑Diffusion Chinese models and their major improvements in image quality and style diversity. It details lightweight fine‑tuning methods such as LoRA and Textual Inversion, demonstrates controllable editing and scenario‑specific customization, and gives step‑by‑step usage instructions for HuggingFace and ModelScope.
Authors: Duan Zhongjie, Liu Bingyan, Wang Chengyu, Huang Jun
Background
AI‑generated content (AIGC) models, exemplified by Stable Diffusion, have surged in popularity. Alibaba Cloud's PAI‑Diffusion series builds on a Chinese CLIP cross‑modal alignment model and generates high‑resolution images from Chinese text. With recent PAI‑Blade optimizations, generation takes under a second on an NVIDIA A10 GPU.
Key New Features
Significant quality boost and richer style diversity through extensive data cleaning and training optimizations.
Support for lightweight fine‑tuning techniques (LoRA, Textual Inversion, DreamBooth, ControlNet) enabling domain‑specific adaptation with minimal compute.
Scenario‑oriented customization pipelines (Diffusers API, WebUI) that simplify integration into products.
Art Gallery (Sample Outputs)
[Figure: sample images generated by the upgraded model]
Model Zoo
| Model name | Use case | Default resolution |
|---|---|---|
| pai-diffusion-artist-large-zh | Chinese text‑to‑image, artistic style | 512×512 |
| pai-diffusion-artist-xlarge-zh | Chinese text‑to‑image, artistic style, higher resolution | 768×768 |
Training data were filtered from large Chinese‑English multimodal corpora (WuKong, LAION‑5B) through NSFW filtering, removal of watermarked images, CLIP‑score filtering, and aesthetic scoring. The Chinese CLIP text encoder (from EasyNLP) is kept frozen during diffusion training to preserve precise text‑image semantic alignment.
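The filtering described above can be sketched as a chain of threshold checks. This is a minimal illustration, assuming each image-text pair already carries scores from an NSFW classifier, a watermark detector, a CLIP model, and an aesthetic predictor; the field names and thresholds are hypothetical, not the values used to build the PAI-Diffusion training set.

```python
from dataclasses import dataclass

# Hypothetical thresholds for illustration only; not PAI-Diffusion's actual values.
MAX_NSFW_PROB = 0.5
MAX_WATERMARK_PROB = 0.5
MIN_CLIP_SCORE = 0.25
MIN_AESTHETIC_SCORE = 5.0


@dataclass
class Sample:
    """An image-text pair whose scores are assumed to be precomputed upstream."""
    caption: str
    nsfw_prob: float        # from an NSFW classifier (assumed)
    watermark_prob: float   # from a watermark detector (assumed)
    clip_score: float       # image-text similarity from a CLIP model
    aesthetic_score: float  # from an aesthetic predictor (assumed)


def keep(sample: Sample) -> bool:
    """True if the pair passes all four filter stages."""
    return (
        sample.nsfw_prob < MAX_NSFW_PROB
        and sample.watermark_prob < MAX_WATERMARK_PROB
        and sample.clip_score >= MIN_CLIP_SCORE
        and sample.aesthetic_score >= MIN_AESTHETIC_SCORE
    )


def filter_corpus(samples: list[Sample]) -> list[Sample]:
    """Keep only the pairs that pass every stage, preserving order."""
    return [s for s in samples if keep(s)]


if __name__ == "__main__":
    corpus = [
        Sample("sunrise over a lake", 0.01, 0.02, 0.31, 6.2),
        Sample("poster with a watermark", 0.01, 0.90, 0.30, 5.8),  # dropped: watermark
        Sample("blurry snapshot", 0.02, 0.05, 0.28, 3.1),          # dropped: aesthetics
    ]
    print([s.caption for s in filter_corpus(corpus)])  # ['sunrise over a lake']
```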
Lightweight Fine‑Tuning with LoRA
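LoRA, as formulated in the original paper by Hu et al., keeps a pretrained weight matrix W (d×k) frozen and learns a low-rank update (α/r)·B·A, where B is d×r, A is r×k, and B starts at zero so training begins from the unchanged model. The plain-Python sketch below illustrates the forward pass and the parameter savings; the 320×320 layer and rank r = 4 are illustrative assumptions, not settings taken from train_text_to_image_lora.py.

```python
def matvec(M, x):
    """Multiply matrix M (list of rows) by vector x."""
    return [sum(m * v for m, v in zip(row, x)) for row in M]


def lora_forward(W, B, A, x, alpha, r):
    """Compute y = W x + (alpha / r) * B (A x).

    W (d x k) is frozen; only A (r x k) and B (d x r) would be trained.
    """
    base = matvec(W, x)
    update = matvec(B, matvec(A, x))  # A projects x down to r dims, B projects back up to d
    return [b + (alpha / r) * u for b, u in zip(base, update)]


def param_counts(d, k, r):
    """Trainable parameters: full fine-tuning of W vs. the LoRA factors B and A."""
    return d * k, r * (d + k)


if __name__ == "__main__":
    full, lora = param_counts(d=320, k=320, r=4)
    print(full, lora, f"{lora / full:.1%}")  # 102400 2560 2.5%

    # With B initialized to zero, the adapted layer reproduces the base layer exactly.
    W = [[1.0, 2.0], [3.0, 4.0]]
    B = [[0.0], [0.0]]
    A = [[0.3, -0.7]]
    print(lora_forward(W, B, A, [1.0, 1.0], alpha=4, r=1) == matvec(W, [1.0, 1.0]))  # True
```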
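LoRA (Low‑Rank Adaptation) avoids the compute and storage cost of full‑parameter fine‑tuning by training only small adapter weights while the base model stays frozen. Example training command: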
```bash
export MODEL_NAME="model_name"
export TRAIN_DIR="path_to_your_dataset"
export OUTPUT_DIR="path_to_save_model"

accelerate launch train_text_to_image_lora.py \
  --pretrained_model_name_or_path=$MODEL_NAME \
  --train_data_dir=$TRAIN_DIR \
  --resolution=512 --center_crop --random_flip \
  --train_batch_size=1 \
  --gradient_accumulation_steps=4 \
  --max_train_steps=15000 \
  --learning_rate=1e-04 \
  --max_grad_norm=1 \
  --lr_scheduler="cosine" --lr_warmup_steps=0 \
  --output_dir=$OUTPUT_DIR
```

Inference after LoRA fine‑tuning:
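```python
import torch
from diffusers import StableDiffusionPipeline

model_id = "model_name"  # the base model used for LoRA training
# LoRA weights written by the training script; replace "xxx" with your checkpoint step
lora_path = "model_path/checkpoint-xxx/pytorch_model.bin"

pipe = StableDiffusionPipeline.from_pretrained(model_id)
# Load the trained LoRA attention processors into the UNet
pipe.unet.load_attn_procs(torch.load(lora_path))
pipe.to("cuda")

image = pipe("input text").images[0]
image.save("result.png")
```

Fine‑Tuning with Textual Inversion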
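Conceptually, Textual Inversion extends the text encoder's vocabulary with one placeholder token, initializes its embedding from an existing token, and then learns only that embedding while every other weight stays frozen. The toy sketch below shows those two steps on a hypothetical three-word vocabulary with 2-dimensional embeddings; the function names and dummy gradients are illustrative, not part of textual_inversion.py.

```python
def add_placeholder_token(vocab, embeddings, placeholder, initializer):
    """Register `placeholder` and initialize its embedding as a copy of `initializer`'s.

    vocab maps token -> row index; embeddings is a list of rows (lists of floats).
    Returns the row index assigned to the new token.
    """
    if placeholder in vocab:
        raise ValueError(f"{placeholder!r} is already in the vocabulary")
    new_id = len(embeddings)
    vocab[placeholder] = new_id
    embeddings.append(list(embeddings[vocab[initializer]]))  # copy, not an alias
    return new_id


def sgd_step(embeddings, grads, lr, trainable_ids):
    """Apply one SGD update to the rows in trainable_ids; all other rows stay frozen."""
    for i in trainable_ids:
        embeddings[i] = [w - lr * g for w, g in zip(embeddings[i], grads[i])]


if __name__ == "__main__":
    # Toy vocabulary: "一只" (a), "猫" (cat), "草地" (grass)
    vocab = {"一只": 0, "猫": 1, "草地": 2}
    emb = [[0.1, 0.2], [0.5, -0.3], [0.0, 0.4]]
    new_id = add_placeholder_token(vocab, emb, "<小奶猫>", "猫")
    grads = [[1.0, 1.0] for _ in emb]  # dummy gradients for every row
    sgd_step(emb, grads, lr=0.1, trainable_ids=[new_id])
    print(new_id, emb[1], emb[new_id])  # the "猫" row is unchanged; the new row has moved
```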
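Textual Inversion adds a new concept to the model through a new placeholder token. In this example, the token <小奶猫> ("little kitten") is initialized from the existing token 猫 ("cat"). Example command: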
```bash
export MODEL_NAME="model_name"
export TRAIN_DIR="path_to_your_dataset"
export OUTPUT_DIR="path_to_save_model"

accelerate launch textual_inversion.py \
  --pretrained_model_name_or_path=$MODEL_NAME \
  --train_data_dir=$TRAIN_DIR \
  --learnable_property="object" \
  --placeholder_token="<小奶猫>" --initializer_token="猫" \
  --resolution=512 \
  --train_batch_size=1 \
  --gradient_accumulation_steps=4 \
  --max_train_steps=100 \
  --learning_rate=5.0e-04 --scale_lr \
  --lr_scheduler="constant" \
  --lr_warmup_steps=0 \
  --output_dir=$OUTPUT_DIR
```

Generate images with the new token:
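```python
from diffusers import StableDiffusionPipeline

# Directory written by textual_inversion.py (OUTPUT_DIR in the training command)
model_path = "path_to_save_model"
pipe = StableDiffusionPipeline.from_pretrained(model_path).to("cuda")
# Prompt: "a <小奶猫> on the grass"
image = pipe("一只<小奶猫>在草地上").images[0]
image.save("result.png")
```

Controllable Image Editing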
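In text-guided editing, the `strength` argument decides how much of the input image survives. As we understand diffusers' StableDiffusionImg2ImgPipeline, the input image is noised to an intermediate point of the schedule and only the remaining steps are denoised. The helper below reproduces that step arithmetic in plain Python; its name is hypothetical and it does not call diffusers.

```python
def img2img_steps(num_inference_steps: int, strength: float) -> int:
    """Return how many denoising steps img2img actually runs for a given strength.

    The input image is noised to an intermediate point of the schedule and only
    the remaining steps are denoised, so low strength keeps more of the input.
    """
    if not 0.0 <= strength <= 1.0:
        raise ValueError("strength must be between 0 and 1")
    init_timestep = min(int(num_inference_steps * strength), num_inference_steps)
    t_start = max(num_inference_steps - init_timestep, 0)  # steps skipped at the start
    return num_inference_steps - t_start


if __name__ == "__main__":
    print(img2img_steps(50, 0.75))  # 37
```

With 50 inference steps, strength=0.75 runs 37 of them; strength=1.0 runs all 50, so the output is driven almost entirely by the prompt rather than the input image.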
The models work with the diffusers StableDiffusionImg2ImgPipeline for text‑guided image editing:
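```python
from PIL import Image
from diffusers import StableDiffusionImg2ImgPipeline

pipe = StableDiffusionImg2ImgPipeline.from_pretrained("model_name").to("cuda")
# Image to edit ("input.png" is a placeholder path)
init_image = Image.open("input.png").convert("RGB").resize((512, 512))
# strength: how far to move away from init_image (0 = keep it, 1 = ignore it)
image = pipe(prompt="input text", image=init_image, strength=0.75, guidance_scale=7.5).images[0]
image.save("result.png")
```

Scenario‑Specific Pre‑Training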
Continued pre‑training on domain data (e.g., food images) produces specialized models that can still be combined with LoRA or ControlNet for fine‑grained control.
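Combining a domain model with a LoRA adapter amounts to adding the adapter's low-rank update to the domain model's weights, optionally scaled to control how strongly the adapter applies. The minimal plain-Python sketch below assumes the standard LoRA form W + scale·(B·A); the 2×2 "food model" weight and the factors are made-up toy values, and it shows the general technique rather than the PAI team's implementation.

```python
def merge_lora(W, B, A, scale):
    """Return W + scale * (B @ A) without modifying W.

    W: d x k base weight (e.g. from a domain-specific checkpoint)
    B: d x r and A: r x k LoRA factors; scale: adapter strength (0 disables it).
    """
    d, k, r = len(W), len(W[0]), len(A)
    if len(B) != d or len(B[0]) != r or len(A[0]) != k:
        raise ValueError("shape mismatch between W, B and A")
    return [
        [W[i][j] + scale * sum(B[i][t] * A[t][j] for t in range(r)) for j in range(k)]
        for i in range(d)
    ]


if __name__ == "__main__":
    W_food = [[1.0, 0.0], [0.0, 1.0]]  # hypothetical 2x2 weight of a domain model
    B = [[1.0], [0.5]]                 # d x r with r = 1
    A = [[2.0, -2.0]]                  # r x k
    print(merge_lora(W_food, B, A, scale=0.5))  # [[2.0, -1.0], [0.5, 0.5]]
```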
Model Access & Deployment
Both models are published on HuggingFace and ModelScope. Example usage on HuggingFace:
```python
from diffusers import StableDiffusionPipeline

model_id = "alibaba-pai/pai-diffusion-artist-large-zh"
pipe = StableDiffusionPipeline.from_pretrained(model_id).to("cuda")
# Prompt: "a misty sunrise over the lake"
image = pipe("雾蒙蒙的日出在湖面上").images[0]
image.save("result.png")
```

Example usage on ModelScope:
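```python
import cv2
from modelscope.pipelines import pipeline

p = pipeline('text-to-image-synthesis', 'PAI/pai-diffusion-artist-large-zh', model_revision='v1.0.0')
# Prompt: "a misty sunrise over the lake"
result = p({'text': '雾蒙蒙的日出在湖面上'})
# Write the first generated image to disk with OpenCV
cv2.imwrite("image.png", result["output_imgs"][0])
```

Future Outlook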
The team plans to further expand scenario‑specific capabilities, improve generation quality, and integrate more lightweight fine‑tuning techniques.
Alibaba Cloud Big Data AI Platform
The Alibaba Cloud Big Data AI Platform builds on Alibaba’s leading cloud infrastructure, big‑data and AI engineering capabilities, scenario algorithms, and extensive industry experience to offer enterprises and developers a one‑stop, cloud‑native big‑data and AI capability suite. It boosts AI development efficiency, enables large‑scale AI deployment across industries, and drives business value.