Kuaishou Open-Sources Kolors: A High-Performance Text-to-Image Model Rivaling Midjourney v6
Kuaishou has officially open-sourced Kolors, a state-of-the-art text-to-image diffusion model that leverages ChatGLM3 for advanced bilingual text understanding and employs a two-stage training strategy to achieve photographic image quality rivaling leading proprietary systems.
According to Kuaishou, Kolors matches the image quality of Midjourney v6. The model accepts Chinese and English prompts of up to 256 tokens and natively supports Chinese and English text generation. Full weights and code are now available to developers on GitHub and Hugging Face.
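For developers who want to try the released weights, the following is a minimal sketch of loading the model through the Hugging Face diffusers library. It assumes a diffusers version that ships `KolorsPipeline` (0.30 or later), a CUDA GPU, and the repository id `Kwai-Kolors/Kolors-diffusers`; none of these details come from the article, and the step count and guidance scale are illustrative defaults rather than recommended settings. The helper names `build_kolors_call` and `generate` are hypothetical.

```python
def build_kolors_call(prompt, steps=50, guidance_scale=5.0):
    """Collect generation arguments; the defaults are illustrative assumptions."""
    if not prompt.strip():
        raise ValueError("prompt must not be empty")
    return {
        "prompt": prompt,              # Chinese or English text
        "negative_prompt": "",
        "num_inference_steps": steps,
        "guidance_scale": guidance_scale,
    }


def generate(prompt, out_path="kolors_sample.png", seed=0):
    """Download the weights (on first use), generate one image, and save it."""
    # Heavy imports stay inside the function so the module loads without a GPU stack.
    import torch
    from diffusers import KolorsPipeline  # assumes diffusers >= 0.30

    pipe = KolorsPipeline.from_pretrained(
        "Kwai-Kolors/Kolors-diffusers",  # assumed Hugging Face repository id
        torch_dtype=torch.float16,
        variant="fp16",
    ).to("cuda")

    # A seeded generator makes the sample reproducible.
    generator = torch.Generator(device="cuda").manual_seed(seed)
    image = pipe(**build_kolors_call(prompt), generator=generator).images[0]
    image.save(out_path)
    return out_path
```

Calling `generate("A photo of a ladybug, macro, high quality")` would write a single PNG to the working directory; the same call should accept a Chinese prompt, since the model is bilingual.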
Technically, Kolors replaces the traditional CLIP text encoders with the ChatGLM3 large language model, which significantly improves its understanding of complex semantics and its rendering of scenes with multiple objects. The training pipeline uses a two-stage progressive strategy: initial concept learning on billions of image-text pairs, followed by quality fine-tuning on curated high-aesthetic datasets. A novel noise scheduling strategy further stabilizes high-resolution generation.
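The article does not describe the noise schedule itself. As a generic illustration of why schedule choices matter, the sketch below computes the terminal signal-to-noise ratio (SNR) of a Stable Diffusion-style scaled-linear schedule. A lower terminal SNR means the final training step is closer to pure noise, which narrows the gap between training and sampling. The beta range and both step counts are assumptions chosen for illustration, not settings taken from the article.

```python
import math


def scaled_linear_betas(num_steps, beta_start=0.00085, beta_end=0.012):
    """Betas spaced linearly in sqrt-space (Stable Diffusion-style schedule)."""
    lo, hi = math.sqrt(beta_start), math.sqrt(beta_end)
    if num_steps == 1:
        return [beta_start]
    return [(lo + (hi - lo) * i / (num_steps - 1)) ** 2 for i in range(num_steps)]


def terminal_snr(betas):
    """SNR at the last step: alpha_bar_T / (1 - alpha_bar_T)."""
    # Sum logs instead of multiplying to avoid floating-point underflow.
    log_alpha_bar = sum(math.log1p(-b) for b in betas)
    alpha_bar = math.exp(log_alpha_bar)
    return alpha_bar / (1.0 - alpha_bar)


# Illustrative comparison: same beta range, different number of steps.
snr_1000 = terminal_snr(scaled_linear_betas(1000))
snr_1100 = terminal_snr(scaled_linear_betas(1100))
print(f"terminal SNR, 1000 steps: {snr_1000:.6f}")
print(f"terminal SNR, 1100 steps: {snr_1100:.6f}")
```

With the beta range held fixed, the longer schedule accumulates more noise and ends at a lower terminal SNR. Extending or reshaping the schedule in this way is one common lever for stabilizing high-resolution training.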
In comprehensive evaluations, Kolors ranks second globally in overall subjective scoring, trailing only DALL-E 3, while ranking first on image quality. On the custom KolorsPrompts benchmark and the MPS automated metric, the model outperforms both open-source and closed-source alternatives in overall satisfaction, image fidelity, and prompt alignment.
The model has already been integrated into Kuaishou's products, where it powers applications such as AI avatars, IP customization via DreamBooth/LoRA, and virtual try-on. Kuaishou plans to release additional tools such as ControlNet, aiming to enrich the open-source generative AI ecosystem and accelerate downstream innovation.
Kuaishou Tech
Official Kuaishou tech account, providing real-time updates on the latest Kuaishou technology practices.