Kuaishou Open-Sources Kolors: A High-Performance Text-to-Image Model Rivaling Midjourney v6

Kuaishou has officially open-sourced Kolors, a state-of-the-art text-to-image diffusion model that leverages ChatGLM3 for advanced bilingual text understanding and employs a two-stage training strategy to achieve photographic image quality rivaling leading proprietary systems.

Kuaishou Tech

Jul 11, 2024

Kuaishou has open-sourced Kolors, a text-to-image diffusion model that the company reports as comparable in quality to Midjourney v6. The model accepts prompts in Chinese and English with a context length of up to 256 tokens, and it can render Chinese and English text inside generated images. The full weights and code are available to developers on GitHub and Hugging Face.

Technically, Kolors replaces the CLIP text encoders typical of earlier diffusion models with the ChatGLM3 large language model, which improves its understanding of complex prompts and its rendering of multi-object scenes. Training follows a two-stage progressive strategy: a concept-learning stage on billions of image-text pairs, then a quality fine-tuning stage on curated, high-aesthetic data. A new noise scheduling strategy further stabilizes high-resolution generation.
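The article does not spell out the noise schedule, so the following is only a generic sketch of why a schedule matters for high-resolution generation, not Kolors' actual method. It assumes a "scaled linear" beta schedule with the `beta_start` and `beta_end` values commonly used in Stable Diffusion configurations, and compares the terminal signal-to-noise ratio (SNR) for two step counts. The function names and the step counts (1000 and 1100) are illustrative assumptions. A lower terminal SNR means the final training timestep is closer to pure noise, which is the starting point at sampling time.

```python
import math


def scaled_linear_betas(num_steps, beta_start=0.00085, beta_end=0.012):
    """Return betas spaced linearly in sqrt-space (a 'scaled linear' schedule).

    The default beta_start / beta_end are assumed values, not taken from the article.
    """
    lo, hi = math.sqrt(beta_start), math.sqrt(beta_end)
    return [(lo + (hi - lo) * i / (num_steps - 1)) ** 2 for i in range(num_steps)]


def terminal_snr(betas):
    """SNR at the last timestep: alpha_bar_T / (1 - alpha_bar_T)."""
    # Accumulate in log-space to avoid underflow in the long product of (1 - beta).
    log_alpha_bar = sum(math.log1p(-b) for b in betas)
    alpha_bar = math.exp(log_alpha_bar)
    return alpha_bar / (1.0 - alpha_bar)


# Hypothetical comparison: the same beta range spread over more steps.
snr_1000 = terminal_snr(scaled_linear_betas(1000))
snr_1100 = terminal_snr(scaled_linear_betas(1100))

print(f"terminal SNR, 1000 steps: {snr_1000:.6f}")
print(f"terminal SNR, 1100 steps: {snr_1100:.6f}")
```

Under these assumptions, the longer schedule ends at a lower terminal SNR, so less of the original image signal survives at the final training step. This narrows the mismatch between training and sampling, since sampling starts from pure noise.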

In Kuaishou's evaluations, Kolors ranks second overall in subjective (human) scoring, trailing only DALL-E 3, while placing first on subjective image quality. Using the custom KolorsPrompts benchmark and the automated MPS metric, the model is reported to outperform both open-source and closed-source alternatives on overall satisfaction, image fidelity, and prompt alignment.

Kolors is already integrated into Kuaishou's products, where it powers applications such as AI avatars, IP customization via DreamBooth and LoRA, and virtual try-on. Kuaishou plans to release additional tools such as ControlNet, with the aim of enriching the open-source generative AI ecosystem and accelerating downstream work.
