How BeautifulPrompt Automates Prompt Engineering for Text-to-Image Generation

BeautifulPrompt, presented at EMNLP 2023, is a deep generative model that automatically turns simple image descriptions into high-quality prompts. It improves text-to-image synthesis through a three-stage pipeline: supervised fine-tuning on collected prompt pairs, reward modeling, and reinforcement learning.

Alibaba Cloud Big Data AI Platform

Dec 8, 2023

Background

Text-to-image (T2I) synthesis is one of the most prominent AIGC technologies, and large models such as DALL-E 2, Imagen, and Stable Diffusion have dramatically improved image quality. These models, however, produce their best results only when given detailed textual prompts. Non-experts often struggle to write such prompts and end up iterating by trial and error, which wastes time and compute.

BeautifulPrompt Overview

BeautifulPrompt is a deep generative model developed by Alibaba Cloud AI Platform PAI and Prof. Zhu Jin-hui's team at South China University of Technology. It automatically generates high-quality prompts from simple image descriptions, enabling T2I models to produce more aesthetically pleasing images.

Data Collection

We constructed a dataset of (low-quality prompt, high-quality prompt) pairs using DiffusionDB as the raw source. The pairs were built in three ways: (i) captioning high-quality images with BLIP to obtain low-quality prompts, (ii) summarizing high-quality prompts with ChatGPT to obtain low-quality prompts, and (iii) starting from low-quality prompts and asking ChatGPT to generate improved, high-quality versions. The resulting pairs were filtered to remove inappropriate content and to retain only those whose images have high aesthetic scores.
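
The filtering step can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the PromptPair record, the word blocklist, and the 6.0 aesthetic threshold are all hypothetical choices, since the article does not state the actual filter or cutoff.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PromptPair:
    """One training pair (hypothetical record layout)."""
    low: str          # simple description, e.g. a BLIP caption or a summary
    high: str         # detailed, high-quality prompt
    aesthetic: float  # aesthetic score of the image tied to the pair


def filter_pairs(pairs, min_aesthetic=6.0, blocklist=frozenset({"nsfw"})):
    """Drop pairs with blocked words or a low aesthetic score.

    Both the threshold and the blocklist are assumptions for illustration.
    """
    kept = []
    for pair in pairs:
        words = set((pair.low + " " + pair.high).lower().split())
        if words & blocklist:
            continue  # inappropriate content
        if pair.aesthetic < min_aesthetic:
            continue  # image not aesthetically strong enough
        kept.append(pair)
    return kept


pairs = [
    PromptPair("a cat", "a fluffy cat, golden hour, highly detailed", 6.8),
    PromptPair("a dog", "a dog, blurry", 4.1),
    PromptPair("a person", "nsfw portrait", 7.2),
]
kept = filter_pairs(pairs)
print([p.low for p in kept])
```

In practice the content filter would more likely be a classifier than a word list; the word list only keeps the sketch self-contained.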

Training Procedure

Step 1. SFT

Given the prompt-pair dataset, we fine-tuned a decoder-only language model (BLOOM) to generate the high-quality prompt token by token, conditioned on an instruction and the low-quality prompt.
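
A minimal sketch of how one SFT example might be assembled for a decoder-only model is shown below. The instruction template and the whitespace tokenizer are hypothetical stand-ins, and masking the source tokens out of the loss is a common convention rather than something the article confirms. The value -100 is the default ignore index of PyTorch's cross-entropy loss.

```python
IGNORE_INDEX = -100  # label value skipped by PyTorch's cross-entropy loss by default


def toy_tokenize(text, vocab):
    """Whitespace tokenizer that grows a vocabulary on the fly (stand-in only)."""
    return [vocab.setdefault(token, len(vocab)) for token in text.split()]


def build_sft_example(instruction, low_prompt, high_prompt, vocab):
    """Concatenate source and target; supervise only the target tokens."""
    # Hypothetical template: the article does not give the real one.
    source = f"{instruction} Input: {low_prompt} Output:"
    source_ids = toy_tokenize(source, vocab)
    target_ids = toy_tokenize(high_prompt, vocab)
    input_ids = source_ids + target_ids
    # Source positions are masked so the loss covers the high-quality prompt only.
    labels = [IGNORE_INDEX] * len(source_ids) + target_ids
    return input_ids, labels


vocab = {}
input_ids, labels = build_sft_example(
    "Rewrite the prompt.", "a cat", "a fluffy cat, golden hour", vocab
)
print(len(input_ids), labels)
```

With a real model, the toy tokenizer would be replaced by the model's own tokenizer, and an end-of-sequence token would normally be appended to the target.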

Step 2. Reward Model (RM)

We trained a reward model on two automatic signals: PickScore (a preference model trained on text-to-image prompts and human preferences) and an aesthetic score. These scores serve as the ground-truth rewards, and training minimizes the mean-squared error between the model's scalar output and the ground-truth reward.
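
The regression objective can be illustrated with a toy example. In the sketch below a small linear head stands in for the language-model-based reward model, and the feature vectors and target rewards are made-up numbers; only the mean-squared-error loss comes from the text above.

```python
def predict(weights, bias, features):
    """Scalar reward prediction for each feature vector."""
    return [sum(w * x for w, x in zip(weights, row)) + bias for row in features]


def mse(predictions, targets):
    """Mean-squared error between predicted and ground-truth rewards."""
    return sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(targets)


def gradient_step(weights, bias, features, targets, lr=0.1):
    """One full-batch gradient-descent step on the MSE loss."""
    predictions = predict(weights, bias, features)
    n = len(targets)
    grad_w = [0.0] * len(weights)
    grad_b = 0.0
    for row, p, t in zip(features, predictions, targets):
        err = 2.0 * (p - t) / n  # derivative of the MSE w.r.t. the prediction
        grad_b += err
        for j, x in enumerate(row):
            grad_w[j] += err * x
    new_weights = [w - lr * g for w, g in zip(weights, grad_w)]
    return new_weights, bias - lr * grad_b


# Made-up prompt features and ground-truth rewards, for illustration only.
features = [[0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
targets = [0.2, 0.8, 0.9]

weights, bias = [0.0, 0.0], 0.0
loss_before = mse(predict(weights, bias, features), targets)
for _ in range(200):
    weights, bias = gradient_step(weights, bias, features, targets)
loss_after = mse(predict(weights, bias, features), targets)
print(loss_before, loss_after)
```

The loss falls as the head fits the targets. A real reward model would swap the linear head for a language model with a scalar output, trained with the same loss.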

Step 3. PPO (RL)

To further improve performance, we initialized the policy from the SFT model and fine-tuned it with Proximal Policy Optimization (PPO). PPO directly maximizes the expected combined reward, while an adaptive KL penalty keeps the policy close to the supervised model.
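
The reward shaping used in this kind of RL fine-tuning can be sketched as below. Several details are assumptions: the article does not say how the two scores are combined (a weighted sum with a hypothetical weight alpha is used here), and it does not name the adaptive scheme (the proportional controller here follows a form common in RLHF work, with clipping to ±0.2 and a gain of 0.1 chosen for illustration). The full PPO update with clipped policy ratio and value function is omitted.

```python
def combined_reward(pick_score, aesthetic_score, alpha=0.5):
    """Weighted sum of the two reward signals (alpha is a hypothetical weight)."""
    return alpha * pick_score + (1.0 - alpha) * aesthetic_score


def penalized_reward(reward, logp_policy, logp_sft, beta):
    """Subtract a KL penalty estimated from sequence log-probabilities.

    logp_policy - logp_sft is a single-sample estimate of the KL divergence
    between the current policy and the frozen supervised model.
    """
    return reward - beta * (logp_policy - logp_sft)


class AdaptiveKLController:
    """Proportional controller that nudges beta toward a target KL value."""

    def __init__(self, beta, target_kl, gain=0.1):
        self.beta = beta
        self.target_kl = target_kl
        self.gain = gain

    def update(self, observed_kl):
        # Relative error, clipped so one noisy batch cannot swing beta too far.
        error = max(-0.2, min(0.2, observed_kl / self.target_kl - 1.0))
        self.beta *= 1.0 + self.gain * error
        return self.beta


controller = AdaptiveKLController(beta=0.1, target_kl=6.0)
reward = combined_reward(pick_score=0.8, aesthetic_score=0.6)
shaped = penalized_reward(reward, logp_policy=-2.0, logp_sft=-2.5, beta=controller.beta)
beta_up = controller.update(observed_kl=12.0)   # KL above target: beta grows
beta_down = controller.update(observed_kl=3.0)  # KL below target: beta shrinks
print(shaped, beta_up, beta_down)
```

When the observed KL exceeds the target, beta grows and pulls the policy back toward the supervised model. When the KL falls below the target, beta shrinks and leaves more room to optimize the reward.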

Evaluation

Experiments on objective model‑scoring metrics and human subjective assessments demonstrate that BeautifulPrompt significantly improves prompt quality and the visual appeal of generated images. Ablation studies confirm the effectiveness of each module.

Open‑Source Release

The BeautifulPrompt code will be contributed to the EasyNLP framework. NLP researchers and practitioners are welcome to use and extend the method.

EasyNLP repository: https://github.com/alibaba/EasyNLP

