WaDi: One‑Step Image Generation with LoRA Meets RoPE

This work analyzes weight‑direction changes in diffusion‑model distillation, proposes a low‑rank rotation adapter (LoRaD) to model those changes, and integrates it into Variational Score Distillation as WaDi, achieving state‑of‑the‑art FID on COCO with only ~10% trainable parameters while generalizing to multiple downstream tasks.

Machine Heart

Diffusion models such as Stable Diffusion produce high‑quality images but require many sampling steps, leading to slow inference. Recent distillation methods compress the sampling process to a few or a single step, yet the underlying mechanisms remain unclear.

Motivation and Weight Analysis

The authors compared the weights of multi‑step teacher models (e.g., SD 1.5, PixArt‑α) with those of their one‑step students (e.g., DMD2, PixArt‑α DMD). In U‑Net architectures, weight norms barely change across layers (mean change ≈ 0.1 %, standard deviation ≈ 0.2 %). Weight directions change far more (mean ≈ 2.2 %, standard deviation ≈ 2.1 %), a mean change about 22× larger than that of the norms. Similar patterns appear in DiT architectures. These observations suggest that weight direction carries richer, more sensitive information for distillation.
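The two quantities compared above can be sketched in a few lines. This is a minimal illustration, not the authors' measurement code: the exact definitions are assumptions (relative change in Euclidean norm, and one minus cosine similarity for direction), and plain Python lists stand in for weight tensors.

```python
import math

def norm(v):
    """Euclidean norm of a flattened weight vector."""
    return math.sqrt(sum(x * x for x in v))

def relative_norm_change(teacher, student):
    """Assumed definition: |‖student‖ - ‖teacher‖| / ‖teacher‖."""
    return abs(norm(student) - norm(teacher)) / norm(teacher)

def direction_change(teacher, student):
    """Assumed definition: 1 - cosine similarity of the two vectors."""
    dot = sum(t * s for t, s in zip(teacher, student))
    return 1.0 - dot / (norm(teacher) * norm(student))

# Toy example: the "student" is the "teacher" rotated by 0.2 rad,
# so its length is unchanged while its direction moves.
teacher = [3.0, 4.0]
c, s = math.cos(0.2), math.sin(0.2)
student = [c * 3.0 - s * 4.0, s * 3.0 + c * 4.0]

print("norm change:     ", relative_norm_change(teacher, student))
print("direction change:", direction_change(teacher, student))
```

In this toy case the norm change is zero up to rounding while the direction change is clearly nonzero, which is the qualitative pattern the article reports for distilled models.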

To isolate the effect of each component, the authors ran controlled ablations that replaced either the norm or the direction of the student’s weights with the teacher’s. Replacing the norm caused negligible performance loss (e.g., DMD2: FID +0.7, CLIP unchanged), whereas replacing the direction caused severe degradation (e.g., DMD2: FID +241.3, CLIP −0.18). The change in weight direction, not in norm, therefore accounts for most of the student’s gain.
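The swap itself amounts to recombining the norm of one weight with the direction of the other. The sketch below assumes the swap is done on whole flattened vectors; the granularity the authors used (per layer, per row, or per column) is not stated in this summary, and `recombine` is a hypothetical helper name.

```python
import math

def norm(v):
    """Euclidean norm of a flattened weight vector."""
    return math.sqrt(sum(x * x for x in v))

def recombine(norm_source, direction_source):
    """Return a vector with the norm of one input and the direction of the other."""
    scale = norm(norm_source) / norm(direction_source)
    return [scale * x for x in direction_source]

teacher = [3.0, 4.0]   # norm 5
student = [0.0, 6.0]   # norm 6, different direction

# Norm ablation: keep the student's direction, take the teacher's norm.
norm_swapped = recombine(norm_source=teacher, direction_source=student)

# Direction ablation: keep the student's norm, take the teacher's direction.
direction_swapped = recombine(norm_source=student, direction_source=teacher)

print(norm_swapped, direction_swapped)
```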

Low‑Rank Rotation of Weight Direction (LoRaD)

Motivated by the low‑rank structure of the direction differences (an SVD of the residual matrix shows that the top 30 % of the rank preserves 93 % of the information), the authors designed LoRaD, a parameter‑efficient adapter that applies a learnable low‑rank rotation to the pretrained weight direction. The rotation is parameterized by two low‑rank matrices, which cuts the number of trainable parameters to roughly 10 % of the full model. The implementation uses a block‑diagonal structure with sparse rotations on even/odd index pairs, so the rotation reduces to element‑wise multiplications.
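One way to picture a rotation on even/odd index pairs is the RoPE‑style sketch below. It is an illustration under stated assumptions, not the paper's implementation: it assumes the per‑pair rotation angles come from the product of two small matrices `A` and `B` (hypothetical names), and it uses plain Python lists in place of tensors.

```python
import math

def matmul(a, b):
    """Plain-Python product of an (n x r) and an (r x m) matrix."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def rotate_pairs(weight, angles):
    """Rotate each (even, odd) column pair of every row by its own angle.

    Equivalent to multiplying each row by a block-diagonal matrix of 2x2
    rotations, but computed with element-wise products only.
    """
    rotated = []
    for w_row, theta_row in zip(weight, angles):
        new_row = []
        for j, theta in enumerate(theta_row):
            x, y = w_row[2 * j], w_row[2 * j + 1]
            c, s = math.cos(theta), math.sin(theta)
            new_row += [x * c - y * s, x * s + y * c]
        rotated.append(new_row)
    return rotated

def row_norms(matrix):
    return [math.sqrt(sum(x * x for x in row)) for row in matrix]

# Frozen pretrained weight (2 x 4) and rank-1 angle factors (assumed layout).
weight = [[1.0, 2.0, 3.0, 4.0],
          [0.5, -1.0, 2.0, 0.0]]
A = [[0.1], [-0.3]]      # (out x r), trainable
B = [[1.0, 2.0]]         # (r x in/2), trainable
angles = matmul(A, B)    # one angle per row and column pair

rotated = rotate_pairs(weight, angles)
print(row_norms(weight))
print(row_norms(rotated))
```

Because every 2×2 block is a pure rotation, the row norms are unchanged and only the direction moves. With all angles at zero the adapter is the identity, which would be a natural starting point for training.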

Weight‑Direction‑Aware Distillation (WaDi)

LoRaD is integrated into Variational Score Distillation (VSD) to form WaDi. The teacher is a frozen pretrained diffusion model. A trainable “fake” model, initialized from the teacher, tracks the distribution of the student’s generated images. Both the student (the one‑step generator) and the fake model receive LoRaD adapters, which let them adjust weight directions without altering norms. Training alternates between updating the student and updating the fake model.
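The alternating schedule can be illustrated with a deliberately tiny 1‑D toy. Everything here is an assumption made for illustration: the "teacher" is a unit‑variance Gaussian with a known score, the "student" is a single learnable mean, the "fake model" is a single mean fitted to the student's samples, and there are no noise levels or LoRaD adapters. It shows only the structure of the two updates, not the paper's training code.

```python
import random

random.seed(0)

def gaussian_score(x, mean):
    """Score (gradient of log density) of a unit-variance Gaussian."""
    return -(x - mean)

TEACHER_MEAN = 2.0          # frozen teacher distribution N(2, 1)
student_mean = -1.0         # one-step generator: x = student_mean + z
fake_mean = TEACHER_MEAN    # fake model starts as a copy of the teacher

for step in range(300):
    samples = [student_mean + random.gauss(0.0, 1.0) for _ in range(64)]

    # (1) Student step: descend along (fake score - teacher score).
    #     Here dx/d(student_mean) = 1, so the per-sample gradient is the
    #     score difference itself.
    grad = sum(
        gaussian_score(x, fake_mean) - gaussian_score(x, TEACHER_MEAN)
        for x in samples
    ) / len(samples)
    student_mean -= 0.1 * grad

    # (2) Fake-model step: refit the fake model to the student's samples.
    batch_mean = sum(samples) / len(samples)
    fake_mean += 0.5 * (batch_mean - fake_mean)

print(round(student_mean, 2), round(fake_mean, 2))
```

The student never sees teacher samples. It moves only because the fake model's score and the teacher's score disagree, and it stops moving once the fake model, which follows the student, agrees with the teacher.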

Training uses 1.4 M text prompts from JourneyDB, the AdamW optimizer, a learning rate of 1e‑4 for the student and 1e‑2 for the fake model, a total batch size of 128 (16 per GPU), a classifier‑free guidance (CFG) scale of 1.5, and two epochs. The LoRaD rank is 256 for SD 1.5/2.1 and 128 for PixArt‑α; the fake model uses rank 32.
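For reference, the reported hyperparameters can be collected in one place. The class and field names below are hypothetical, the values are the ones quoted above, and the two derived quantities are back‑of‑the‑envelope figures that assume no gradient accumulation and treat "1.4 M" as exactly 1,400,000 prompts.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class WaDiTrainConfig:
    """Hypothetical container; field names are illustrative only."""
    num_prompts: int = 1_400_000     # JourneyDB text prompts (approximate)
    optimizer: str = "AdamW"
    student_lr: float = 1e-4
    fake_lr: float = 1e-2
    global_batch_size: int = 128
    per_gpu_batch_size: int = 16
    cfg_scale: float = 1.5
    epochs: int = 2
    student_rank: int = 256          # 256 for SD 1.5/2.1, 128 for PixArt-α
    fake_rank: int = 32

    @property
    def num_gpus(self) -> int:
        # Implied by the two batch sizes, assuming no gradient accumulation.
        return self.global_batch_size // self.per_gpu_batch_size

    @property
    def steps_per_epoch(self) -> int:
        # Rough figure: prompts per epoch divided by the global batch size.
        return self.num_prompts // self.global_batch_size

sd15 = WaDiTrainConfig()
pixart = WaDiTrainConfig(student_rank=128)
print(sd15.num_gpus, sd15.steps_per_epoch, pixart.student_rank)
```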

Experiments

Evaluation. On COCO 2014 and COCO 2017, WaDi is evaluated with 30 k and 5 k generated images respectively. Metrics include FID (using Inception‑V3), CLIP score (ViT‑G/14), precision, recall, and Human Preference Score v2 (HPSv2).

Results. WaDi achieves the best FID and recall across all backbones, with CLIP and precision ranking first or second. Trainable parameters occupy only 9.74 %–13.30 % of the full U‑Net/DiT parameters, demonstrating high parameter efficiency. Qualitative comparisons show WaDi consistently preserves structure, style, and semantic alignment, outperforming baselines such as DMD2, SiD‑LSG, and SwiftBrush.

Downstream Tasks. WaDi is applied to ControlNet, ReVersion (relation inversion), and DreamBooth. In ControlNet, inference time drops by 86.26 % with comparable image quality. In ReVersion, inference time drops by 88.89 % while fidelity to relational prompts stays high. In DreamBooth, LoRaD improves subject fidelity and prompt adherence over the full fine‑tuning (FT) and LoRA baselines.

User Study. Fifty‑seven participants evaluated zero‑shot generation and downstream tasks. Results show WaDi outperforms existing methods in both image quality and text‑image alignment.

Ablation Studies. Five adapter types were compared on COCO 2017 under the VSD loss. LoRaD, with only 83.8 M trainable parameters (≈ 31 % fewer than LoRA/DoRA and ≈ 90 % fewer than full fine‑tuning), achieves the lowest FID (20.86) and a competitive CLIP score (0.31). LoRaD also yields the largest mean change in weight direction (2.89 %). Rank‑configuration ablations on COCO 2014 show that increasing the student rank improves performance up to a point (FID 13.64 → 10.79), after which gains diminish or reverse.

Conclusion

The paper introduces WaDi, a weight‑direction‑aware distillation framework that leverages LoRaD to efficiently model directional changes during diffusion‑model distillation. Extensive experiments demonstrate that WaDi substantially improves image quality and inference speed over existing single‑step methods while requiring only a fraction of trainable parameters, and it generalizes well to various downstream generation tasks.
