How Single Trajectory Distillation Boosts Diffusion Model Speed and Style Quality

The paper introduces Single Trajectory Distillation (STD), a training framework that accelerates image and video style‑transfer diffusion models. STD aligns the student with the teacher's full PF‑ODE trajectory starting from a fixed partially‑noised state, uses a Trajectory Bank to cut the extra training cost, and adds an Asymmetric Adversarial Loss that markedly improves style consistency and aesthetic quality.

Xiaohongshu Tech REDtech

Introduction

Consistency‑based diffusion model acceleration suffers from degraded style similarity and aesthetic quality in image‑to‑image or video‑to‑video style‑transfer tasks because existing methods only align the initial steps of the student’s PF‑ODE trajectory with an imperfect teacher.

Proposed Method: Single Trajectory Distillation (STD)

STD trains the student model on a single, complete trajectory starting from a fixed partially‑noised state, ensuring full‑trajectory consistency. To offset the extra training time, a Trajectory Bank stores intermediate teacher PF‑ODE states, allowing direct sampling of trajectory points during student training.
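To make the "single trajectory from a fixed noisy state" idea concrete, here is a toy sketch of integrating a teacher PF‑ODE from one partially‑noised state down to t = 0 while recording every intermediate state. The drift function and step scheme are stand‑ins, not the paper's model:

```python
import numpy as np

def teacher_velocity(x, t):
    # Toy stand-in for the teacher's PF-ODE drift; a real teacher would
    # predict this with a trained denoiser. (Illustrative only.)
    return -x / max(t, 1e-3)

def single_trajectory(x_noised, t_start, n_steps):
    """Integrate the teacher PF-ODE from one fixed partially-noised
    state down to t = 0, recording every intermediate state."""
    ts = np.linspace(t_start, 0.0, n_steps + 1)
    states = [x_noised]
    x = x_noised
    for t_cur, t_next in zip(ts[:-1], ts[1:]):
        x = x + (t_next - t_cur) * teacher_velocity(x, t_cur)  # Euler step
        states.append(x)
    return ts, states

# The student is then trained for consistency along this one complete
# trajectory, rather than only near its start.
ts, states = single_trajectory(np.ones(4), t_start=0.6, n_steps=8)
```

The recorded `states` list is exactly the kind of per‑trajectory intermediate data that the Trajectory Bank described below would cache.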

An Asymmetric Adversarial Loss based on DINO‑v2 features is introduced to enhance style fidelity and perceptual quality.

Theoretical Foundations

The paper defines diffusion trajectories, derives error bounds showing that closer time steps yield smaller errors, and formulates the STD loss, including a term that forces student time steps to stay close to teacher steps.
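The paper's exact formulation is not reproduced here; schematically, a consistency‑distillation objective of this family compares the student's prediction at adjacent trajectory points, with an added penalty keeping student time steps near the teacher's (all symbols below are illustrative, not the paper's notation):

```latex
\mathcal{L}_{\mathrm{STD}}(\theta)
  \;\approx\; \mathbb{E}_{n}\Big[
      d\big( f_\theta(x_{t_{n+1}}, t_{n+1}),\;
             f_{\theta^-}(\hat{x}_{t_n}, t_n) \big)
      \;+\; \lambda \,\lvert t^{\mathrm{stu}}_n - t^{\mathrm{tea}}_n \rvert
  \Big],
\qquad
\hat{x}_{t_n} = \Phi_{\mathrm{teacher}}\!\big(x_{t_{n+1}},\, t_{n+1} \!\to\! t_n\big)
```

Here $f_\theta$ is the student, $f_{\theta^-}$ an EMA/target copy, $\Phi_{\mathrm{teacher}}$ one teacher ODE step, and the error bounds motivate the distance term $d$ shrinking as adjacent time steps get closer.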

Trajectory Bank

The bank caches teacher trajectory states; during training, random samples are drawn, processed by the teacher, and stored back, eliminating repeated ODE solves.
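A minimal sketch of such a cache, assuming a simple ring‑buffer design (the interface and names are hypothetical, not the paper's implementation):

```python
import random
from collections import deque

class TrajectoryBank:
    """Minimal cache of teacher PF-ODE trajectory states (illustrative
    interface; the paper's actual data structure may differ)."""
    def __init__(self, capacity):
        self.buffer = deque(maxlen=capacity)

    def push(self, t, state):
        # Store an intermediate trajectory point with its time step.
        self.buffer.append((t, state))

    def sample(self):
        # Draw a random cached point instead of re-running the ODE solver.
        return random.choice(self.buffer)

bank = TrajectoryBank(capacity=1024)
for t, x in [(0.6, 1.0), (0.5, 0.8), (0.4, 0.5)]:
    bank.push(t, x)
t_s, x_s = bank.sample()
```

During training, a sampled point is advanced a few steps by the teacher and pushed back, so the cost of solving full trajectories is amortized across iterations.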

Asymmetric Adversarial Loss

Unlike traditional pixel‑level adversarial losses, this loss matches DINO‑v2 feature distributions between generated and real images, providing semantic constraints and better video‑distillation efficiency.
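The key shift is that the discriminator operates on semantic feature vectors rather than pixels. A toy hinge‑loss version on stand‑in feature vectors (a linear head `w` replaces a real discriminator, and the DINO‑v2 encoder is assumed to have already produced the features; all names are illustrative):

```python
import numpy as np

def hinge_d_loss(real_feats, fake_feats, w):
    """Hinge discriminator loss on (stand-in) DINO-v2 feature vectors
    rather than raw pixels. `w` is a toy linear discriminator head."""
    real_logits = real_feats @ w
    fake_logits = fake_feats @ w
    return (np.mean(np.maximum(0.0, 1.0 - real_logits))
            + np.mean(np.maximum(0.0, 1.0 + fake_logits)))

def g_loss(fake_feats, w):
    # Generator (student) side: push generated features toward
    # the discriminator's "real" region.
    return -np.mean(fake_feats @ w)
```

Matching distributions in a pretrained feature space gives the student semantic rather than pixel‑level constraints, which is also what makes the loss cheap enough to apply during video distillation.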

Experiments

STD is evaluated on image and video style‑transfer benchmarks (Open‑Sora‑Plan‑v1.0.0 training set, wikiArt+COCO test set). Metrics include style similarity (CSD), LAION aesthetic score, and warping error. STD outperforms LCM, TCD, PCM, TDD, Hyper‑SD, SDXL‑Lightning, and MCM across 8‑step, 6‑step, and 4‑step settings, achieving state‑of‑the‑art style consistency and aesthetics.

Ablation studies confirm the effectiveness of the Trajectory Bank (a 3.8× reduction in training overhead) and the Asymmetric Adversarial Loss (higher style-similarity and aesthetic scores).

Scalability and Extensions

STD’s framework generalizes to other partial‑noise editing tasks such as inpainting, demonstrating broader applicability.

Conclusion

By aligning full trajectories, leveraging a trajectory cache, and employing a semantic adversarial loss, STD significantly improves both speed and visual quality of diffusion‑based style transfer, offering a versatile tool for future image/video editing research.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.

Tags: Diffusion Models · Consistency Models · AI Acceleration · Style Transfer · Trajectory Distillation
Written by

Xiaohongshu Tech REDtech

Official account of the Xiaohongshu tech team, sharing tech innovations and problem insights, advancing together.
