How Alibaba’s PAI Prompt Beautifier Supercharges Stable Diffusion Image Generation

This article introduces Alibaba Cloud's PAI Prompt Beautifier, a model that automatically expands simple text prompts into detailed descriptions for Stable Diffusion. It covers the BLOOM-based architecture, annotation-free SFT training, RLHF optimization, usage code, and future development plans.

Alibaba Cloud Big Data AI Platform

Background

Stable Diffusion (SD) is a popular AI‑generated content (AIGC) model that creates diverse images from text prompts, but its output quality heavily depends on the quality of the prompt. To lower the barrier for users, Alibaba Cloud’s PAI team developed an automatic Prompt Beautifier that expands a simple prompt into a detailed, high‑quality prompt, enabling easier generation of aesthetically pleasing images.

One‑Click Prompt Generation Demo

The article shows side‑by‑side comparisons of original prompts and the beautified prompts generated by the model, illustrating the visual improvement on Stable Diffusion v1.5. Several example images are displayed to demonstrate the effect.

Technology Behind the Prompt Beautifier

The system is built on BLOOM, an open-source multilingual decoder-only language model from BigScience. The largest BLOOM model has 176 billion parameters; the PAI Prompt model fine-tunes the 1.1 billion-parameter version for fast inference and cost-effective deployment.
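To make the deployment argument concrete, here is a rough back-of-the-envelope estimate of the memory needed just to hold the weights in half precision (2 bytes per parameter). The 1.1 billion figure is taken from the released checkpoint name (bloom-1b1); the helper name `weight_memory_gb` is made up for this illustration, and the estimate ignores activations and any runtime overhead.

```python
def weight_memory_gb(num_params: float, bytes_per_param: int = 2) -> float:
    """Approximate memory for the model weights alone, in gigabytes (10^9 bytes)."""
    return num_params * bytes_per_param / 1e9


small = weight_memory_gb(1.1e9)   # fine-tuned 1.1B-parameter model
large = weight_memory_gb(176e9)   # largest BLOOM model

print(f"1.1B parameters in fp16: about {small:.1f} GB")
print(f"176B parameters in fp16: about {large:.1f} GB")
```

The smaller model fits comfortably on a single consumer GPU, while the largest BLOOM model needs a multi-GPU setup for the weights alone.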

No‑Annotation Supervised Fine‑Tuning (SFT)

Because high‑quality and low‑quality prompt pairs are hard to label, the team automatically constructs training data through three steps:

Summary Generation: High-quality prompts are collected as targets, and the matching low-quality prompts are synthesized by having a large model (e.g., ChatGPT) write concise summaries of them.

Prompt Expansion: The low‑quality prompts are fed to ChatGPT to produce richer, detailed prompts.

Image Captioning: High‑quality image‑text pairs are used to generate additional prompts via captioning.

After filtering for aesthetic quality and consistency, the curated data is used for SFT.
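The filtering step can be sketched roughly as follows. This is only an illustration under stated assumptions: the `PromptPair` structure, the score ranges, the thresholds, and the input/target record format are all hypothetical, since the article does not describe how the team scores or stores its data.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class PromptPair:
    raw: str            # simple (low-quality) prompt, used as the model input
    detailed: str       # detailed (high-quality) prompt, used as the training target
    aesthetic: float    # hypothetical aesthetic score in [0, 1]
    consistency: float  # hypothetical raw/detailed consistency score in [0, 1]


def filter_pairs(pairs: List[PromptPair],
                 min_aesthetic: float = 0.6,
                 min_consistency: float = 0.5) -> List[PromptPair]:
    """Keep only pairs that clear both (assumed) thresholds."""
    return [p for p in pairs
            if p.aesthetic >= min_aesthetic and p.consistency >= min_consistency]


def to_sft_examples(pairs: List[PromptPair]) -> List[Dict[str, str]]:
    """Turn the surviving pairs into input/target records for fine-tuning."""
    return [{"input": p.raw, "target": p.detailed} for p in pairs]


candidates = [
    PromptPair("a cat", "a fluffy cat on a windowsill, soft light, detailed fur", 0.8, 0.9),
    PromptPair("a cat", "a city skyline at night, neon lights", 0.9, 0.1),  # off-topic
    PromptPair("a dog", "dog", 0.2, 0.9),                                    # low aesthetic
]
sft_data = to_sft_examples(filter_pairs(candidates))
print(sft_data)
```

Only the first candidate survives here: the second drifts away from the original prompt, and the third adds no aesthetic detail.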

Reinforcement Learning for SD

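RLHF (Reinforcement Learning from Human Feedback) is then applied to further improve prompt generation. The reward has two parts. For the aesthetic part, an aesthetic evaluator scores images, and a language model learns to predict that aesthetic score directly from the prompt. For the consistency part, a second model scores how well the generated prompt matches the original one. PPO then optimizes a weighted combination of the two scores: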

reward = a * score_model(prompt) + b * consistency_model(raw_prompt, prompt)

This training yields a 1.1 billion‑parameter model whose performance rivals larger models like ChatGPT in prompt generation.
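The weighted reward used in the PPO stage can be written as a small function. The sketch below is not the team's implementation: the two scoring functions are toy stand-ins (a word-count heuristic and a word-overlap ratio) for the learned models, and the weights `a` and `b` are placeholders, since the article does not give their values.

```python
from typing import Callable


def combined_reward(raw_prompt: str,
                    prompt: str,
                    score_model: Callable[[str], float],
                    consistency_model: Callable[[str, str], float],
                    a: float = 1.0,
                    b: float = 1.0) -> float:
    """reward = a * score_model(prompt) + b * consistency_model(raw_prompt, prompt)"""
    return a * score_model(prompt) + b * consistency_model(raw_prompt, prompt)


def toy_score(prompt: str) -> float:
    """Toy stand-in for the aesthetic predictor: longer prompts score higher, capped at 1."""
    return min(len(prompt.split()) / 20, 1.0)


def toy_consistency(raw_prompt: str, prompt: str) -> float:
    """Toy stand-in for the consistency model: share of raw-prompt words kept in the prompt."""
    raw_words = set(raw_prompt.lower().split())
    if not raw_words:
        return 0.0
    return len(raw_words & set(prompt.lower().split())) / len(raw_words)


on_topic = combined_reward("a cat", "a cat sitting on a sofa",
                           toy_score, toy_consistency, a=0.5, b=0.5)
off_topic = combined_reward("a cat", "city skyline at night with neon",
                            toy_score, toy_consistency, a=0.5, b=0.5)
print(on_topic, off_topic)
```

The consistency term is what keeps the policy from drifting: a prompt that looks detailed but drops the user's subject earns a lower reward than one that stays on topic.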

Model Access

The model is available on ModelScope and Hugging Face. Users can call it via the following Python code:

from transformers import AutoTokenizer, AutoModelForCausalLM

# Load the tokenizer and model; .cuda() assumes a CUDA-capable GPU is available.
tokenizer = AutoTokenizer.from_pretrained('alibaba-pai/pai-bloom-1b1-text2prompt-sd')
model = AutoModelForCausalLM.from_pretrained('alibaba-pai/pai-bloom-1b1-text2prompt-sd').eval().cuda()

# Wrap the simple prompt in the instruction template.
raw_prompt = '1 girl'
text = f'Instruction: Give a simple description of the image to generate a drawing prompt.\nInput: {raw_prompt}\nOutput:'
input_ids = tokenizer.encode(text, return_tensors='pt').cuda()

# Sample five candidate prompts.
outputs = model.generate(
    input_ids,
    max_length=384,
    do_sample=True,
    temperature=1.0,
    top_k=50,
    top_p=0.95,
    repetition_penalty=1.2,
    num_return_sequences=5)

# Drop the echoed input tokens and decode only the newly generated text.
prompts = tokenizer.batch_decode(outputs[:, input_ids.size(1):], skip_special_tokens=True)
prompts = [p.strip() for p in prompts]
print(prompts)
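When calling the model for many prompts, it can help to factor the template and the post-processing into small helpers. The following is a minimal sketch; `build_input` and `clean_candidates` are hypothetical names introduced here, not part of the released model or of the transformers library. Because sampling with num_return_sequences=5 may return empty or repeated strings, the second helper drops those while preserving order.

```python
from typing import Iterable, List

INSTRUCTION = ("Instruction: Give a simple description of the image "
               "to generate a drawing prompt.")


def build_input(raw_prompt: str) -> str:
    """Wrap a simple prompt in the instruction template from the usage code."""
    return f"{INSTRUCTION}\nInput: {raw_prompt.strip()}\nOutput:"


def clean_candidates(candidates: Iterable[str]) -> List[str]:
    """Strip whitespace, then drop empty and duplicate candidates, keeping order."""
    seen = set()
    cleaned = []
    for candidate in candidates:
        candidate = candidate.strip()
        if candidate and candidate not in seen:
            seen.add(candidate)
            cleaned.append(candidate)
    return cleaned


print(build_input("1 girl"))
print(clean_candidates([" a girl, detailed ", "a girl, detailed", "", "portrait of a girl"]))
```

The string returned by `build_input` can be passed to `tokenizer.encode`, and the decoded outputs can be passed through `clean_candidates` before being sent to Stable Diffusion.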

Future Outlook

The team plans to extend the Prompt Beautifier to support various SD model families, enriching Alibaba Cloud’s AIGC algorithm and product capabilities.
