
Kuaishou Open-Sources Kolors: A High-Performance Text-to-Image Model Rivaling Midjourney v6

Kuaishou has officially open-sourced Kolors, a state-of-the-art text-to-image diffusion model that leverages ChatGLM3 for advanced bilingual text understanding and employs a two-stage training strategy to achieve photographic image quality rivaling leading proprietary systems.

Kuaishou Tech

Kuaishou has officially open-sourced Kolors, a text-to-image diffusion model that the company says matches the quality of Midjourney v6. The model accepts Chinese and English prompts of up to 256 characters and can render Chinese and English text natively inside generated images. The full weights and code are available to developers on GitHub and Hugging Face.
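
For readers who want to try the released weights, the following is a minimal sketch using the Hugging Face `diffusers` library, which ships a `KolorsPipeline` class in recent releases. The repository id `Kwai-Kolors/Kolors-diffusers`, the sampler settings, and the helper names `build_call_args` and `generate_image` are assumptions for illustration, not details taken from the announcement. A CUDA GPU with `torch` and `diffusers` installed is also assumed.

```python
from typing import Any

# Assumed Hugging Face repo id for the diffusers-format weights.
MODEL_ID = "Kwai-Kolors/Kolors-diffusers"


def build_call_args(prompt: str, steps: int = 25, guidance: float = 5.0) -> dict[str, Any]:
    """Collect pipeline call arguments (hypothetical helper; values are illustrative)."""
    if not prompt.strip():
        raise ValueError("prompt must not be empty")
    return {
        "prompt": prompt,  # Chinese or English text
        "negative_prompt": "",
        "num_inference_steps": steps,
        "guidance_scale": guidance,
    }


def generate_image(prompt: str, out_path: str = "kolors_sample.png") -> str:
    """Load the pipeline, render one image, and save it to out_path."""
    # Heavy dependencies are imported lazily so the helper above works without them.
    import torch
    from diffusers import KolorsPipeline

    pipe = KolorsPipeline.from_pretrained(
        MODEL_ID, torch_dtype=torch.float16, variant="fp16"
    ).to("cuda")
    image = pipe(**build_call_args(prompt)).images[0]
    image.save(out_path)
    return out_path
```

Calling `generate_image("a photo of a ladybug on a leaf, macro, high detail")` would download the weights on first use and write a PNG to the given path. The step count and guidance scale are starting points to tune, not recommended settings.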

Technically, Kolors replaces the traditional CLIP text encoder with the ChatGLM3 large language model, which significantly improves complex semantic understanding and multi-object rendering. Training follows a two-stage progressive strategy: a concept-learning stage on billions of image-text pairs, followed by a quality fine-tuning stage on curated, high-aesthetic datasets. A novel noise scheduling strategy further stabilizes high-resolution generation.
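
The announcement does not spell out the noise schedule, so the following is only a sketch of the general idea behind such changes: lengthening a diffusion schedule pushes the terminal signal-to-noise ratio (SNR) lower, so the final training timestep sits closer to pure noise. The scaled-linear schedule, the `beta_start`/`beta_end` values (common Stable Diffusion defaults), and the 1000-to-1100 step extension are assumptions for illustration, not confirmed Kolors settings.

```python
import math


def scaled_linear_betas(beta_start, beta_end, base_steps, total_steps):
    """Betas whose square root grows linearly.

    The per-step slope is fixed by `base_steps`; running to a larger
    `total_steps` simply continues the same line past `beta_end`.
    """
    s0, s1 = math.sqrt(beta_start), math.sqrt(beta_end)
    slope = (s1 - s0) / (base_steps - 1)
    return [(s0 + slope * t) ** 2 for t in range(total_steps)]


def terminal_snr(betas):
    """SNR at the last timestep: alpha_bar / (1 - alpha_bar)."""
    # alpha_bar is the product of (1 - beta_t); sum logs for numerical stability.
    log_alpha_bar = sum(math.log1p(-b) for b in betas)
    alpha_bar = math.exp(log_alpha_bar)
    return alpha_bar / (1.0 - alpha_bar)


# Assumed values: typical Stable Diffusion defaults, not confirmed for Kolors.
base = scaled_linear_betas(0.00085, 0.012, base_steps=1000, total_steps=1000)
extended = scaled_linear_betas(0.00085, 0.012, base_steps=1000, total_steps=1100)

snr_base = terminal_snr(base)
snr_extended = terminal_snr(extended)
print(f"terminal SNR, 1000 steps: {snr_base:.6f}")
print(f"terminal SNR, 1100 steps: {snr_extended:.6f}")
```

Because every extra step multiplies `alpha_bar` by a factor below one, the extended schedule always ends at a lower SNR than the base schedule. That narrows the gap between the noisiest training input and the pure Gaussian noise used to start sampling.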

In comprehensive evaluations, Kolors ranks second globally in overall subjective scoring, trailing only DALL-E 3, while placing first in subjective image quality. On the custom KolorsPrompts benchmark, and as measured by the MPS automated metric, the model is reported to outperform both open and closed-source alternatives in overall satisfaction, image fidelity, and prompt alignment.

The model is already integrated into Kuaishou's ecosystem, powering applications such as AI avatars, IP customization via DreamBooth/LoRA, and virtual try-on. Kuaishou plans to release additional tools such as ControlNet to enrich the open-source generative AI ecosystem and accelerate downstream innovation.

Tags: computer vision, Large Language Models, model evaluation, open-source AI, diffusion models, Text-to-Image Generation, generative AI
Written by

Kuaishou Tech

Official Kuaishou tech account, providing real-time updates on the latest Kuaishou technology practices.
