DreamLite: A 0.39B Mobile Model Matching Z‑Image for Real‑Time Text‑to‑Image Generation and Editing

DreamLite is a compact 0.39B-parameter unified diffusion model, open-sourced by ByteDance, that runs on smartphones. It handles both text-to-image generation and text-guided editing, taking about three seconds per 1024×1024 image. Reported quality is comparable to Flux, Z-Image and LongCat-Image, and two variants let users trade fidelity against latency.

SuanNi

DreamLite is a newly open‑sourced compact unified diffusion model (0.39 B parameters) from ByteDance that can generate images from text and edit images via textual prompts directly on a mobile device.

The architecture builds on a pruned mobile U‑Net backbone and introduces context‑spatial connections in the latent space to unify generation and editing. Training follows a task‑progressive joint pre‑training strategy (T2I → editing → joint) and employs Qwen3‑VL as the text‑embedding model; step‑distillation reduces inference to four steps.
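To make the four-step claim concrete, here is a minimal sketch of what a four-step inference loop looks like at run time: the denoiser is called exactly four times. This is not DreamLite's actual sampler. The Euler update over a predicted velocity, the evenly spaced noise levels, and the names `timestep_schedule`, `sample` and `denoise` are all assumptions made for illustration.

```python
from typing import Callable, List

Latent = List[float]


def timestep_schedule(num_steps: int) -> List[float]:
    """Evenly spaced noise levels from 1.0 (pure noise) down to 0.0 (clean).

    Hypothetical schedule: the article does not describe the real one.
    """
    return [1.0 - i / num_steps for i in range(num_steps + 1)]


def sample(
    denoise: Callable[[Latent, float], Latent],
    x: Latent,
    num_steps: int = 4,
) -> Latent:
    """Run a few-step sampler: one denoiser call per step.

    Assumes the denoiser predicts a velocity and uses a plain Euler update,
    which is an illustrative choice, not the published method.
    """
    ts = timestep_schedule(num_steps)
    for t_cur, t_next in zip(ts[:-1], ts[1:]):
        v = denoise(x, t_cur)  # one network evaluation per step
        x = [xi + (t_next - t_cur) * vi for xi, vi in zip(x, v)]
    return x


if __name__ == "__main__":
    # Toy stand-in for the U-Net: constant unit velocity.
    toy_denoiser = lambda x, t: [1.0 for _ in x]
    print(sample(toy_denoiser, [1.0, 2.0], num_steps=4))
```

Because on-device cost scales with the number of denoiser calls, cutting the loop to four iterations is what makes the latency budget plausible on a phone.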

On an iPhone 17 Pro, the pipeline combines a 4-bit Qwen3-VL text encoder, an fp16 Tiny VAE, and the U-Net backbone, and generates or edits a 1024×1024 image in roughly three seconds. DreamLite is presented as the first unified on-device model, removing the need for separate generation and editing networks.
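A quick back-of-the-envelope calculation shows why a 0.39B-parameter model fits on a phone. The sketch below estimates weight storage at several bit widths. It assumes the 0.39B figure counts the diffusion backbone only, and it ignores activations, the text encoder and the VAE. The article does not state the precision used for the backbone, so the bit widths are illustrative, and `weight_footprint_mb` is a hypothetical helper.

```python
def weight_footprint_mb(num_params: float, bits_per_weight: int) -> float:
    """Approximate weight storage in megabytes (1 MB = 10**6 bytes)."""
    return num_params * bits_per_weight / 8 / 1e6


BACKBONE_PARAMS = 0.39e9  # parameter count quoted in the article

if __name__ == "__main__":
    # Assumed precisions for comparison; the backbone's real precision
    # is not given in the article.
    for bits in (32, 16, 8, 4):
        size = weight_footprint_mb(BACKBONE_PARAMS, bits)
        print(f"{bits:>2}-bit weights: {size:.0f} MB")
```

At 16 bits per weight this comes to about 780 MB, and at 4 bits about 195 MB, which shows why low-bit quantization matters most for the larger components of an on-device pipeline.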

Benchmark results reportedly put DreamLite on par with open-source models such as Flux, Z-Image and LongCat-Image. The release provides two model variants so users can trade visual fidelity against on-device inference latency.

Model weights are currently undergoing safety review. Developers interested in early access can request it via email ([email protected]) with name, affiliation, and intended use, and must agree to ethical guidelines that forbid any illegal, violent, or harmful content.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.

text-to-image, AI model, image editing, ByteDance, Qwen3-VL, DreamLite, mobile diffusion