Target-Driven Distillation (TDD): A Multi‑Goal Distillation Method for Accelerating Diffusion Models
Target‑Driven Distillation (TDD) is a multi‑goal distillation method that flexibly selects short‑range target steps and decouples guidance during training. It enables 4‑to‑8‑step diffusion generation that preserves high‑resolution detail, is compatible with LoRA, ControlNet, and InstantID, and outperforms existing consistency distillation techniques in both speed and quality.
TDD (Target‑Driven Distillation) is presented as an innovative acceleration technique for high‑resolution, challenging image generation. It demonstrates strong performance, flexibility, and compatibility, seamlessly adapting to various base models, integrating with multiple LoRA techniques, and supporting advanced control strategies such as ControlNet and InstantID.
Background: Diffusion models are popular for image generation but require many iterative steps, leading to long generation times. Existing consistency distillation methods accelerate generation but often sacrifice image detail.
The authors propose a novel multi‑goal distillation approach—Target‑Driven Distillation (TDD)—which selects generation steps flexibly and decouples guidance signals during training, greatly improving both speed and quality. Experimental results show TDD’s superiority across multiple tasks.
Consistency distillation methods are divided into single‑goal and multi‑goal approaches. Single‑goal methods map each time step to a fixed target step, which can accumulate errors over long prediction distances. Multi‑goal methods allow a one‑to‑many mapping, selecting different target steps for the same source step, offering better performance but often requiring higher training time budgets.
Key components of TDD:
(1) Fine‑grained target‑step selection: for any source step, TDD targets the next step of a predefined, equally spaced short denoising schedule (e.g., 4‑to‑8 inference steps; Kmin=4, Kmax=8). This avoids long‑distance predictions and trains only on steps the model is likely to visit during inference.
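The selection rule can be sketched as follows. This is a minimal illustration, not the authors' exact implementation: the function names, the 1000‑step training discretization, and the sampling of k from [Kmin, Kmax] are assumptions consistent with the description above.

```python
import random

def select_target_step(source_step, num_train_steps=1000, k_min=4, k_max=8):
    """Pick a short-range target step for distillation (illustrative sketch).

    A k-step, equally spaced inference schedule is assumed over
    [0, num_train_steps); k is sampled from [k_min, k_max] so the student
    is trained for every step count it may be run with at inference time.
    """
    k = random.randint(k_min, k_max)                    # e.g. a 4- to 8-step schedule
    stride = num_train_steps // k                       # spacing between schedule steps
    schedule = list(range(0, num_train_steps, stride))  # equally spaced inference steps
    # Target = nearest schedule step strictly below the source step, i.e. the
    # next step actually visited during k-step inference; fall back to 0.
    candidates = [s for s in schedule if s < source_step]
    return max(candidates) if candidates else 0
```

With a fixed k=4 over 1000 training steps, the schedule is {0, 250, 500, 750}, so a source step of 999 maps to target 750 — a short hop rather than a long‑distance prediction to step 0.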
(2) Decoupled guidance during training: When distilling from a classifier‑free guidance (CFG) model to the distilled model, TDD replaces part of the text condition with an unconditional (blank) prompt. This enables a proposed Guidance Scale Tuning inference technique, allowing users to balance accuracy and richness of text‑conditioned image generation.
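The training‑time decoupling can be sketched as below. This is a hedged sketch, not the paper's implementation: the function name, the `drop_prob` parameter, and the scalar stand‑ins for model predictions are all assumptions; the CFG combination itself is the standard classifier‑free guidance formula.

```python
import random

def decoupled_guidance_target(teacher_cond, teacher_uncond, cfg_scale,
                              drop_prob=0.1):
    """Sketch of decoupled CFG distillation (assumed names and defaults).

    The teacher's guided prediction is the usual CFG combination. With
    probability `drop_prob` the text condition is replaced by a blank
    (unconditional) prompt, so the student also learns an unconditional
    branch -- which is what allows the guidance scale to be tuned again
    at inference time instead of being baked in.
    """
    # Standard classifier-free guidance combination of the teacher outputs.
    guided = teacher_uncond + cfg_scale * (teacher_cond - teacher_uncond)
    use_uncond = random.random() < drop_prob  # occasionally train on blank prompt
    return (teacher_uncond if use_uncond else guided), use_uncond
```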
(3) Optional non‑uniform sampling: short‑distance predictions are used in early steps and long‑distance predictions in later steps, improving overall image quality. TDD also clips the predicted x₀ to keep it within the valid data range, which prevents out‑of‑boundary predictions and mitigates over‑exposure.
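The x₀ clipping step is straightforward. A minimal sketch, assuming images are normalized to [-1, 1] (the usual convention for latent/pixel diffusion training); the function name and list‑based representation are illustrative only:

```python
def clip_x0(x0_pred, lo=-1.0, hi=1.0):
    """Clamp the predicted clean sample to the valid data range.

    Keeping the x0 prediction in-bounds prevents out-of-range denoising
    targets from propagating through later steps, which is the mechanism
    behind the over-exposure mitigation described above.
    """
    return [min(max(v, lo), hi) for v in x0_pred]
```

In a real pipeline this would be a single `clamp` on a tensor rather than a list comprehension; the per‑element form is used here only to keep the sketch dependency‑free.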
Extensive visual comparisons show that TDD (4‑to‑8 steps) outperforms existing open‑source solutions in both image quality and speed, across different LoRA and ControlNet configurations. The method also extends to video model distillation, where TDD surpasses AnimateLCM on SVD‑xt 1.1 at 4‑8 steps.
Authors: Wang Cunzhen (Xiaohongshu AIGC intern, Zhejiang University), Guo Ziyuan (Xiaohongshu AIGC R&D engineer), Duan Yuxuan (Xiaohongshu AIGC intern, Shanghai Jiao Tong University). The team focuses on large‑scale computer vision and multimodal models, with numerous publications in top conferences such as CVPR, TPAMI, IJCV, ECCV, NeurIPS, and ICCV.
Xiaohongshu Tech REDtech
Official account of the Xiaohongshu tech team, sharing tech innovations and problem insights, advancing together.