How Single Trajectory Distillation Boosts Diffusion Model Speed and Style Quality
The paper introduces Single Trajectory Distillation (STD), a training framework that aligns the student with the full PF‑ODE trajectory starting from a fixed partially noised state, uses a Trajectory Bank to cut training cost, and adds an Asymmetric Adversarial Loss. Together these markedly improve style consistency and aesthetic quality while accelerating image and video style‑transfer diffusion models.
Introduction
Consistency‑based acceleration of diffusion models degrades style similarity and aesthetic quality in image‑to‑image and video‑to‑video style‑transfer tasks because existing methods align only the initial steps of the student's PF‑ODE trajectory with an imperfect teacher.
Proposed Method: Single Trajectory Distillation (STD)
STD trains the student model on a single, complete trajectory starting from a fixed partially‑noised state, ensuring full‑trajectory consistency. To offset the extra training time, a Trajectory Bank stores intermediate teacher PF‑ODE states, allowing direct sampling of trajectory points during student training.
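The fixed partially noised starting state is the standard forward-diffusion encoding of the input to an intermediate timestep, as used in img2img-style editing. A minimal NumPy sketch of that encoding follows; the noise schedule, latent shape, and timestep here are toy assumptions for illustration, not the paper's settings:

```python
import numpy as np

def partial_noise(x0, t, alphas_cumprod, rng):
    """Diffuse a clean latent x0 to the intermediate state x_t that serves
    as the trajectory's fixed starting point.
    DDPM forward process: x_t = sqrt(abar_t)*x0 + sqrt(1-abar_t)*eps."""
    abar = alphas_cumprod[t]
    eps = rng.standard_normal(x0.shape)
    return np.sqrt(abar) * x0 + np.sqrt(1.0 - abar) * eps

rng = np.random.default_rng(0)
alphas_cumprod = np.linspace(0.9999, 0.01, 1000)  # toy schedule (assumption)
x0 = rng.standard_normal((4, 8, 8))               # toy latent (assumption)
xt = partial_noise(x0, t=600, alphas_cumprod=alphas_cumprod, rng=rng)
```

Distillation then covers the whole denoising path from this `xt` down to the clean output, rather than only its first few steps.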
An Asymmetric Adversarial Loss based on DINO‑v2 features is introduced to enhance style fidelity and perceptual quality.
Theoretical Foundations
The paper defines diffusion trajectories, derives error bounds showing that closer time steps yield smaller errors, and formulates the STD loss, including a term that forces student time steps to stay close to teacher steps.
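Schematically, a single‑trajectory consistency loss of this kind can be written as follows. This is our hedged paraphrase, not the paper's exact notation: $f_\theta$ is the student, $f_{\theta^-}$ an EMA target, $x_{t_n}$ are points on the teacher's PF‑ODE trajectory from the fixed noisy state, and $d(\cdot,\cdot)$ a distance:

```latex
\mathcal{L}_{\mathrm{STD}}
  = \mathbb{E}_{n}\!\left[
      d\big( f_\theta(x_{t_n}, t_n),\;
             f_{\theta^-}(\hat{x}_{t_m}, t_m) \big)
    \right],
  \qquad t_m < t_n,\quad |t_n - t_m| \le \delta,
```

where the constraint $|t_n - t_m| \le \delta$ encodes the derived error bound: pairing closer time steps keeps the per‑step distillation error small.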
Trajectory Bank
The bank caches teacher trajectory states; during training, random samples are drawn, processed by the teacher, and stored back, eliminating repeated ODE solves.
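The caching loop described above can be sketched in a few lines of Python. This is an illustration of the idea, not the paper's implementation; `teacher_step` is a hypothetical placeholder for one teacher ODE-solver step:

```python
import random
from collections import deque

class TrajectoryBank:
    """Cache of intermediate teacher PF-ODE states; each entry is (state, t)."""
    def __init__(self, capacity=1024):
        self.buf = deque(maxlen=capacity)

    def push(self, state, t):
        self.buf.append((state, t))

    def sample(self):
        return random.choice(self.buf)

def teacher_step(state, t):
    # placeholder for a single teacher ODE-solver step (assumption)
    return state * 0.99, t - 1

random.seed(0)
bank = TrajectoryBank()
bank.push(1.0, 1000)           # seed with a freshly noised starting state
for _ in range(5):
    x, t = bank.sample()       # draw a cached trajectory point
    x, t = teacher_step(x, t)  # advance it one teacher step
    if t > 0:
        bank.push(x, t)        # store back for reuse in later iterations
```

Because each sampled point is advanced only one step and returned, the full trajectory never has to be re-solved from scratch for every student update.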
Asymmetric Adversarial Loss
Unlike traditional pixel‑level adversarial losses, this loss matches DINO‑v2 feature distributions between generated and real images, providing semantic constraints and better video‑distillation efficiency.
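A feature-space adversarial loss of this kind can be sketched as below. Everything here is a simplified stand-in: `feature_extractor` represents a frozen backbone (in the paper, DINO‑v2), the linear `discriminator` head is a toy substitute for a trained critic, and the hinge formulation is one common GAN objective, not necessarily the paper's exact choice:

```python
import numpy as np

def feature_extractor(images):
    # stand-in for frozen DINO-v2 features; a real implementation would
    # run a pretrained ViT and use its patch/CLS embeddings (assumption)
    return images.reshape(images.shape[0], -1)

def discriminator(feats, w):
    # tiny linear head over features; only this head would be trained
    return feats @ w

def d_hinge_loss(real_logits, fake_logits):
    # hinge GAN loss computed on feature-space logits, not pixels
    return (np.maximum(0.0, 1.0 - real_logits).mean()
            + np.maximum(0.0, 1.0 + fake_logits).mean())

def g_loss(fake_logits):
    # generator side: push fake feature logits toward "real"
    return -fake_logits.mean()

rng = np.random.default_rng(0)
real = rng.standard_normal((8, 3, 4, 4))   # toy "real" images
fake = rng.standard_normal((8, 3, 4, 4))   # toy generated images
w = rng.standard_normal(48) * 0.01
ld = d_hinge_loss(discriminator(feature_extractor(real), w),
                  discriminator(feature_extractor(fake), w))
```

Operating on semantic features rather than pixels is what gives the loss its "asymmetric" constraint and keeps the critic cheap enough for video distillation.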
Experiments
STD is evaluated on image and video style‑transfer benchmarks (the Open‑Sora‑Plan‑v1.0.0 training set; a WikiArt + COCO test set). Metrics include style similarity (CSD), the LAION aesthetic score, and warping error. STD outperforms LCM, TCD, PCM, TDD, Hyper‑SD, SDXL‑Lightning, and MCM at 8‑step, 6‑step, and 4‑step settings, achieving state‑of‑the‑art style consistency and aesthetics.
Ablation studies confirm the effectiveness of the Trajectory Bank (a 3.8× reduction in training overhead) and the Asymmetric Adversarial Loss (higher style‑similarity and aesthetic scores).
Scalability and Extensions
STD’s framework generalizes to other partial‑noise editing tasks such as inpainting, demonstrating broader applicability.
Conclusion
By aligning full trajectories, leveraging a trajectory cache, and employing a semantic adversarial loss, STD significantly improves both speed and visual quality of diffusion‑based style transfer, offering a versatile tool for future image/video editing research.
Xiaohongshu Tech REDtech
Official account of the Xiaohongshu tech team, sharing tech innovations and problem insights, advancing together.
