May 25, 2023 · Artificial Intelligence

QQGC: A Two-Stage Text-to-Image Model with Prior and Decoder Architectures for Efficient AI Painting

QQGC, Tencent’s two‑stage text‑to‑image model that separates CLIP‑based Prior mapping from a Stable Diffusion Decoder, leverages T5‑enhanced text embeddings and a suite of efficiency tricks—including FP16, flash attention, ZeRO and GPU‑RDMA—to train over‑2 B‑parameter models on 64 GPUs, achieving state‑of‑the‑art FID and CLIP scores while supporting image variation, semantic img2img, precise CLIP‑vector edits and unsafe‑content filtering, and now powers the company’s Magic Painting Room.

AI paintingCLIP embeddingTraining Acceleration

0 likes · 12 min read

QQGC: A Two-Stage Text-to-Image Model with Prior and Decoder Architectures for Efficient AI Painting