How Alibaba’s PAI Prompt Beautifier Supercharges Stable Diffusion Image Generation
This article introduces Alibaba Cloud's PAI Prompt Beautifier, a model that automatically refines simple text prompts into detailed descriptions for Stable Diffusion. It covers the BLOOM‑based architecture, annotation‑free SFT training, RLHF optimization, usage code, and future development plans.
Background
Stable Diffusion (SD) is a popular AI‑generated content (AIGC) model that creates diverse images from text prompts, but its output quality heavily depends on the quality of the prompt. To lower the barrier for users, Alibaba Cloud’s PAI team developed an automatic Prompt Beautifier that expands a simple prompt into a detailed, high‑quality prompt, enabling easier generation of aesthetically pleasing images.
One‑Click Prompt Generation Demo
The article shows side‑by‑side comparisons of original prompts and the beautified prompts generated by the model, illustrating the visual improvement on Stable Diffusion v1.5. Several example images are displayed to demonstrate the effect.
Technology Behind the Prompt Beautifier
The system is built on a BLOOM‑based language model. BLOOM, an open‑source multilingual decoder‑only model family from BigScience, scales up to 176 billion parameters; the PAI Prompt model fine‑tunes the 1.1 billion‑parameter version for fast inference and cost‑effective deployment.
No‑Annotation Supervised Fine‑Tuning (SFT)
Because high‑quality and low‑quality prompt pairs are hard to label, the team automatically constructs training data through three steps:
Summary Generation: High‑quality prompts are collected as targets; low‑quality prompts are synthesized by large models (e.g., ChatGPT) that generate concise summaries.
Prompt Expansion: The low‑quality prompts are fed to ChatGPT to produce richer, detailed prompts.
Image Captioning: High‑quality image‑text pairs are used to generate additional prompts via captioning.
After filtering for aesthetic quality and consistency, the curated data is used for SFT.
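The pairing and filtering steps above can be sketched roughly as follows. This is a minimal illustration, not the team's pipeline: the article does not say how SFT examples are formatted, so reusing the instruction template from the inference example is an assumption, and `filter_pairs`, the two toy scorers, and both thresholds are hypothetical stand-ins for the real aesthetic and consistency filters.

```python
from typing import Callable, Iterable, List, Tuple

# Template copied from the article's inference example.
# Using it to format SFT training data is an assumption.
TEMPLATE = (
    "Instruction: Give a simple description of the image to generate a drawing prompt.\n"
    "Input: {raw_prompt}\nOutput:"
)


def build_sft_example(raw_prompt: str, detailed_prompt: str) -> dict:
    """Pair a short prompt (model input) with a detailed prompt (training target)."""
    return {
        "input": TEMPLATE.format(raw_prompt=raw_prompt.strip()),
        "target": detailed_prompt.strip(),
    }


def filter_pairs(
    pairs: Iterable[Tuple[str, str]],
    aesthetic_score: Callable[[str], float],
    consistency_score: Callable[[str, str], float],
    min_aesthetic: float = 0.5,    # hypothetical threshold
    min_consistency: float = 0.5,  # hypothetical threshold
) -> List[dict]:
    """Keep only pairs that pass both the aesthetic and the consistency filter."""
    kept = []
    for raw, detailed in pairs:
        if aesthetic_score(detailed) < min_aesthetic:
            continue
        if consistency_score(raw, detailed) < min_consistency:
            continue
        kept.append(build_sft_example(raw, detailed))
    return kept


# Toy scorers for demonstration only; real filters would be learned models.
def toy_aesthetic(prompt: str) -> float:
    return min(len(prompt.split()) / 10.0, 1.0)


def toy_consistency(raw: str, detailed: str) -> float:
    raw_words = set(raw.lower().split())
    return len(raw_words & set(detailed.lower().split())) / max(len(raw_words), 1)


pairs = [
    ("a cat", "a cat sitting on a windowsill, soft light, highly detailed, digital painting"),
    ("a cat", "a dog"),
]
examples = filter_pairs(pairs, toy_aesthetic, toy_consistency)
print(len(examples), "example(s) kept")
```

Passing the scorers in as callables keeps the sketch independent of any particular aesthetic or consistency model.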
Reinforcement Learning for SD
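RLHF (Reinforcement Learning from Human Feedback) is applied to further improve prompt generation. An aesthetic evaluator scores the generated images, and a language‑model‑based reward model learns to predict that aesthetic score directly from the prompt. PPO optimization then maximizes a weighted sum of the aesthetic score and a consistency score that measures how faithful the beautified prompt stays to the original, with weights a and b: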
reward = a * score_model(prompt) + b * consistency_model(raw_prompt, prompt)
This training yields a 1.1 billion‑parameter model whose performance rivals larger models like ChatGPT in prompt generation.
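The weighted reward above can be sketched in a few lines. This is only an illustration under stated assumptions: the article does not give the values of a and b or the form of either scoring model, so `combined_reward`, `toy_score_model`, and `toy_consistency_model` are hypothetical names, and the toy scorers (prompt length and word overlap) merely stand in for the learned models.

```python
from typing import Callable


def combined_reward(
    raw_prompt: str,
    prompt: str,
    score_model: Callable[[str], float],
    consistency_model: Callable[[str, str], float],
    a: float = 1.0,  # weight on the aesthetic score (value is an assumption)
    b: float = 1.0,  # weight on the consistency score (value is an assumption)
) -> float:
    """reward = a * score_model(prompt) + b * consistency_model(raw_prompt, prompt)"""
    return a * score_model(prompt) + b * consistency_model(raw_prompt, prompt)


# Toy stand-ins for demonstration only; the real scorers are learned models.
def toy_score_model(prompt: str) -> float:
    # Longer prompts score higher, capped at 1.0.
    return min(len(prompt.split()) / 20.0, 1.0)


def toy_consistency_model(raw_prompt: str, prompt: str) -> float:
    # Fraction of the raw prompt's words that survive in the new prompt.
    raw_words = set(raw_prompt.lower().split())
    if not raw_words:
        return 0.0
    return len(raw_words & set(prompt.lower().split())) / len(raw_words)


faithful = combined_reward(
    "1 girl", "1 girl in a garden, detailed, soft lighting",
    toy_score_model, toy_consistency_model, a=0.5, b=0.5,
)
drifted = combined_reward(
    "1 girl", "a spaceship over a city, detailed, soft lighting",
    toy_score_model, toy_consistency_model, a=0.5, b=0.5,
)
print(faithful > drifted)
```

The consistency term is what penalizes a prompt that looks elaborate but has drifted away from what the user asked for; in PPO training this scalar would serve as the reward for each generated prompt.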
Model Access
The model is available on ModelScope and Hugging Face. Users can call it via the following Python code:
from transformers import AutoTokenizer, AutoModelForCausalLM

# Load the tokenizer and model (this example requires a CUDA-capable GPU)
model_id = 'alibaba-pai/pai-bloom-1b1-text2prompt-sd'
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id).eval().cuda()

raw_prompt = '1 girl'
# The instruction template is one string with explicit newlines
input_text = f'Instruction: Give a simple description of the image to generate a drawing prompt.\nInput: {raw_prompt}\nOutput:'
input_ids = tokenizer.encode(input_text, return_tensors='pt').cuda()

# Sample five candidate prompts
outputs = model.generate(
    input_ids,
    max_length=384,
    do_sample=True,
    temperature=1.0,
    top_k=50,
    top_p=0.95,
    repetition_penalty=1.2,
    num_return_sequences=5)

# Drop the input tokens and decode only the generated continuation
prompts = tokenizer.batch_decode(outputs[:, input_ids.size(1):], skip_special_tokens=True)
prompts = [p.strip() for p in prompts]
print(prompts)
Future Outlook
The team plans to extend the Prompt Beautifier to support various SD model families, enriching Alibaba Cloud’s AIGC algorithm and product capabilities.
Alibaba Cloud Big Data AI Platform
The Alibaba Cloud Big Data AI Platform builds on Alibaba’s leading cloud infrastructure, big‑data and AI engineering capabilities, scenario algorithms, and extensive industry experience to offer enterprises and developers a one‑stop, cloud‑native big‑data and AI capability suite. It boosts AI development efficiency, enables large‑scale AI deployment across industries, and drives business value.
