
How DreamLite Enables Real-Time Text-to-Image Generation and Editing on Mobile Devices

DreamLite, a 0.39 B‑parameter diffusion model from ByteDance, unifies text‑to‑image generation and text‑guided editing in a single on‑device network, delivering 1024×1024 results in about three seconds on an iPhone 17 Pro while surpassing existing mobile and even many server‑side baselines.

Machine Heart

May 12, 2026


DreamLite Overview

DreamLite is a 0.39 B‑parameter unified diffusion model that supports both text‑to‑image (T2I) generation and text‑guided image editing within a single network, enabling fully on‑device operation.

On an iPhone 17 Pro the model generates or edits a 1024×1024 image in approximately three seconds without any network connection.

Benchmark Performance

GenEval: 0.72

DPG: 85.8

ImgEdit: 4.11

GEdit: 6.88

These results surpass existing mobile‑side models and are comparable to server‑side models that are 10–30× larger, such as FLUX and OmniGen2.

Challenges for On‑Device Diffusion

Separate generation and editing pipelines require two large models, exceeding mobile memory and storage limits.

Compressing large models typically degrades image quality or increases latency, breaking real‑time interaction.

Users need seamless switching between generation and editing without additional downloads or maintenance.

Core Design of DreamLite

In‑Context Spatial Concatenation

The pruned SDXL U‑Net backbone receives two latent tensors placed side by side. For generation, the right tensor is a black placeholder (no visual condition); for editing, it contains the image to be modified. An explicit task token ([Generate] or [Edit]) is prepended to the text prompt, so the same U‑Net handles either task without extra branches.
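The input format described above can be sketched in plain Python, with nested lists standing in for latent tensors. This is only an illustration under stated assumptions: the names `build_in_context_input`, `zeros_like`, and `TASK_TOKENS` are hypothetical, and modelling the black placeholder as an all-zero latent is a simplification (the encoded latent of a black image need not be exactly zero).

```python
from typing import List, Optional, Tuple

# A latent is modelled as [channels][height][width] nested lists (assumption:
# a real implementation would use framework tensors instead).
Latent = List[List[List[float]]]

# Task tokens as named in the article; how they are tokenized is not specified.
TASK_TOKENS = {"generate": "[Generate]", "edit": "[Edit]"}


def zeros_like(latent: Latent) -> Latent:
    """Stand-in for the 'black placeholder' used when there is no source image."""
    return [[[0.0 for _ in row] for row in channel] for channel in latent]


def build_in_context_input(
    target: Latent, prompt: str, source: Optional[Latent] = None
) -> Tuple[Latent, str]:
    """Return the side-by-side latent and the task-tagged prompt."""
    task = "edit" if source is not None else "generate"
    condition = source if source is not None else zeros_like(target)

    # Both halves need the same channel count and height to sit side by side.
    if len(condition) != len(target) or any(
        len(c) != len(t) for c, t in zip(condition, target)
    ):
        raise ValueError("target and condition must share channels and height")

    # Concatenate along the width axis: left half = target, right half = condition.
    combined = [
        [t_row + c_row for t_row, c_row in zip(t_ch, c_ch)]
        for t_ch, c_ch in zip(target, condition)
    ]
    return combined, f"{TASK_TOKENS[task]} {prompt}"


noisy = [[[0.5, -0.2], [0.1, 0.3]]]  # 1 channel, 2x2 toy latent
latent, text = build_in_context_input(noisy, "a red bicycle")
print(len(latent[0][0]), text)  # 4 [Generate] a red bicycle
```

Passing a `source` latent switches the same function to the editing layout, which mirrors the point of the design: one input format, one network, two tasks.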

Task‑Progressive Joint Pretraining

Training proceeds in three stages to stabilize the small model:

Stage 1 – T2I pretraining on large‑scale image‑text data.

Stage 2 – Editing pretraining using the in‑context condition to preserve original structure while following edit instructions.

Stage 3 – Unified joint pretraining that continues optimizing both tasks under the same in‑context format.

This phased approach enables the 0.39 B model to learn both capabilities reliably.
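One way to picture the three stages is as a per-stage task mix that decides which kind of batch is drawn next. The sketch below is hypothetical: `STAGE_TASK_MIX` and `sample_task` are illustrative names, treating stage 2 as editing-only is an assumption, and the 50/50 split in stage 3 is a placeholder because the article does not give the ratio.

```python
import random

# Task mix per stage. Stage 1 (T2I only) follows the article; stage 2 as
# editing-only and the 50/50 split in stage 3 are assumed values.
STAGE_TASK_MIX = {
    1: {"generate": 1.0},
    2: {"edit": 1.0},
    3: {"generate": 0.5, "edit": 0.5},
}


def sample_task(stage: int, rng: random.Random) -> str:
    """Pick the task for the next training batch in the given stage."""
    mix = STAGE_TASK_MIX[stage]
    tasks = list(mix)
    weights = [mix[task] for task in tasks]
    return rng.choices(tasks, weights=weights, k=1)[0]


rng = random.Random(0)
for stage in (1, 2, 3):
    batch_tasks = [sample_task(stage, rng) for _ in range(1000)]
    counts = {task: batch_tasks.count(task) for task in sorted(set(batch_tasks))}
    print(f"stage {stage}: {counts}")
```

Because both tasks share the same in-context input format, moving from one stage to the next only changes which data is sampled, not the network or its inputs.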

RLHF Alignment + DMD2 Step Distillation

After pretraining, the model undergoes two refinement steps:

High‑quality supervised fine‑tuning followed by reinforcement learning from human feedback (RLHF). Generation uses HPSv3 as the reward model; editing uses EditReward. ReFL (Reward Feedback Learning) is applied to optimize the diffusion process, markedly improving aesthetics and instruction following.

DMD2 (Distribution Matching Distillation 2) compresses the sampling process from dozens of steps to four, enabling real‑time inference.
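To make the four-step sampling concrete, here is a minimal sketch of a few-step sampler that alternates between predicting a clean sample and re-noising it to the next timestep, a pattern commonly used with step-distilled generators. Whether DreamLite uses exactly this loop or this timestep spacing is an assumption; `few_step_timesteps`, `few_step_sample`, and the toy generator are hypothetical, and a scalar stands in for the latent tensor.

```python
from typing import Callable, List


def few_step_timesteps(num_steps: int = 4, train_steps: int = 1000) -> List[int]:
    """Evenly spaced timesteps from most to least noisy (assumed spacing)."""
    stride = train_steps // num_steps
    return [train_steps - 1 - i * stride for i in range(num_steps)]


def few_step_sample(
    generator: Callable[[float, int], float],
    renoise: Callable[[float, int], float],
    noise: float,
    timesteps: List[int],
) -> float:
    """Alternate between predicting a clean sample and re-noising it."""
    x = noise
    for i, t in enumerate(timesteps):
        x0 = generator(x, t)  # one network evaluation per step
        is_last = i == len(timesteps) - 1
        x = x0 if is_last else renoise(x0, timesteps[i + 1])
    return x


calls = []


def toy_generator(x: float, t: int) -> float:
    """Toy stand-in for the distilled U-Net: records t and halves the input."""
    calls.append(t)
    return 0.5 * x


result = few_step_sample(toy_generator, lambda x0, t: x0, 1.0, few_step_timesteps())
print(calls, result)  # [999, 749, 499, 249] 0.0625
```

The loop makes the cost model visible: the number of U-Net evaluations equals the number of timesteps, so going from dozens of steps to four cuts network evaluations proportionally.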

Deployment

Quantization and deployment allow the full workflow to run locally on the device, preserving user privacy by eliminating any data transmission.
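A quick back-of-the-envelope calculation shows why quantization matters at this scale. The sketch below estimates weight storage for the stated 0.39 B parameters at several bit widths. The bit widths are candidates chosen for illustration, since the article does not say which precision DreamLite ships with, and the estimate ignores activations and any components not counted in the 0.39 B figure.

```python
def weight_footprint_gb(num_params: float, bits_per_weight: int) -> float:
    """Approximate weight storage in decimal gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_weight / 8 / 1e9


DREAMLITE_PARAMS = 0.39e9  # parameter count stated in the article

for bits in (32, 16, 8, 4):  # candidate precisions; the article names none
    size = weight_footprint_gb(DREAMLITE_PARAMS, bits)
    print(f"{bits:>2}-bit weights: {size:.3f} GB")
```

Under these assumptions the weights take about 0.78 GB at 16 bits and about 0.39 GB at 8 bits.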

Experimental Results

On GenEval, DreamLite scores 0.72; on DPG, it achieves 85.8.

For image editing it reaches 4.11 on ImgEdit and 6.88 on GEdit, outperforming lightweight single‑task baselines such as SnapGen and SANA while remaining competitive with much larger server models.

Mobile Demonstration Scenarios

Portrait generation followed by style transfer to an oil‑painting look.

Landscape generation with subsequent seasonal background replacement.

Product scene creation with flexible object addition, removal, or replacement.

All processing stays on the device, ensuring zero data transmission.

Significance

A single model replaces two, cutting memory, storage, and deployment costs on edge devices.

Four‑step DMD2 distillation brings latency down to the order of seconds (about three seconds per 1024×1024 image on an iPhone 17 Pro), which is fast enough for interactive in‑app use.

Fully on‑device operation eliminates cloud inference costs and privacy risks.

The modest 0.39 B size opens the possibility of diffusion‑based creation tools on mid‑range and low‑end smartphones.

Resources

Paper: https://arxiv.org/abs/2603.28713

Project page: https://carlofkl.github.io/dreamlite/

GitHub repository: https://github.com/ByteVisionLab/DreamLite

Online demo: https://huggingface.co/spaces/carlofkl/DreamLite

