Accelerating Stable Diffusion Models: Evaluation of FlashAttention2, OneFlow, DeepCache, Stable-Fast, and LCM-LoRA
Our benchmark of FlashAttention2, OneFlow, DeepCache, Stable‑Fast, and LCM‑LoRA on Stable Diffusion models shows that DeepCache combined with PyTorch 2.2 consistently cuts inference time by 40‑50% with minimal code changes, while OneFlow offers 20‑40% speedups when compatible, making DeepCache the recommended default acceleration.
We investigated several popular acceleration techniques for Stable Diffusion (SD) models on our MVAP platform, focusing on inference speed for AI clothing try-on scenarios.
Methods evaluated: FlashAttention2 (operator optimization), OneFlow and stable‑fast (model compilation), DeepCache (model caching), and LCM‑LoRA (model distillation).
Test environment: A10 GPU, CUDA 11.8, Python 3.10, PyTorch 2.0.1, Diffusers 0.26.3, prompts for text‑to‑image generation (512×512, 50 steps).
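As a reference point, the baseline configuration above can be sketched as a plain diffusers text-to-image call timed end to end. This is an illustrative sketch, not the exact benchmark harness; the model ID (`runwayml/stable-diffusion-v1-5`) and the use of `time.perf_counter` are assumptions.

```python
def generate_baseline(prompt: str):
    """Run one SD-1.5 text-to-image pass (512x512, 50 steps) and
    report wall-clock time. Requires a CUDA GPU and the diffusers
    and torch packages; imports are deferred so the sketch can be
    inspected without them installed."""
    import time
    import torch
    from diffusers import StableDiffusionPipeline

    pipe = StableDiffusionPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
    ).to("cuda")

    start = time.perf_counter()
    image = pipe(prompt, height=512, width=512, num_inference_steps=50).images[0]
    elapsed = time.perf_counter() - start
    return image, elapsed
```

All later measurements in this post are relative to a baseline of this shape.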
Results for SD‑1.5:
OneFlow compilation reduced runtime by ~41% with negligible quality loss, but requires a warm‑up compilation step.
DeepCache added another 15‑25% speedup; larger cache intervals increase speed but degrade image quality.
Stable‑fast gave modest acceleration and suffered from dependency issues.
Detailed benchmark tables show average generation times and visual examples for each method.
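The cache-interval trade-off above follows directly from how DeepCache works: a full UNet pass runs only once per interval, and intermediate steps reuse cached high-level features. A minimal sketch, using the helper API from the DeepCache project (the `full_unet_calls` helper is our own illustrative arithmetic, not part of the library):

```python
import math


def full_unet_calls(num_steps: int, cache_interval: int) -> int:
    """Approximate count of full UNet evaluations under DeepCache:
    one full pass every `cache_interval` steps; the remaining steps
    reuse cached deep features and run only a shallow branch."""
    return math.ceil(num_steps / cache_interval)


def enable_deepcache(pipe, cache_interval: int = 3, cache_branch_id: int = 0):
    """Attach the DeepCache helper to a diffusers pipeline.

    Larger `cache_interval` values trade image quality for speed,
    matching the benchmark observations above."""
    from DeepCache import DeepCacheSDHelper  # pip install DeepCache

    helper = DeepCacheSDHelper(pipe=pipe)
    helper.set_params(cache_interval=cache_interval, cache_branch_id=cache_branch_id)
    helper.enable()
    return helper


# e.g. 50 steps with interval 3 -> 17 full UNet passes instead of 50
```

With a 50-step schedule and `cache_interval=3`, only 17 of the 50 steps run the full UNet, which is where the headline speedup comes from.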
Results for SD‑XL (with LoRA adapters):
OneFlow lowered runtime by ~24% while preserving quality.
DeepCache achieved up to 69% speedup at high cache intervals, with noticeable quality trade‑offs.
LCM‑LoRA dramatically reduced required steps but was unstable with pretrained weights.
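For context on the LCM-LoRA result, adapting an SDXL pipeline typically means swapping in the LCM scheduler and loading the adapter weights, after which 4–8 steps with low guidance suffice. A hedged sketch using the public diffusers API (the adapter ID `latent-consistency/lcm-lora-sdxl` is the published community checkpoint, not one of our weights):

```python
def enable_lcm_lora(pipe, adapter_id: str = "latent-consistency/lcm-lora-sdxl"):
    """Convert a diffusers SDXL pipeline to few-step LCM-LoRA inference.

    The scheduler swap is required: LCM sampling does not work with the
    default DDIM/Euler schedulers. Instability we observed came from
    combining this adapter with certain pretrained LoRA weights."""
    from diffusers import LCMScheduler

    pipe.scheduler = LCMScheduler.from_config(pipe.scheduler.config)
    pipe.load_lora_weights(adapter_id)
    pipe.fuse_lora()
    return pipe


# usage sketch:
#   pipe = enable_lcm_lora(pipe)
#   image = pipe(prompt, num_inference_steps=4, guidance_scale=1.0).images[0]
```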
Overall, DeepCache combined with the latest PyTorch (2.2) provides a reliable 40‑50% runtime reduction without extensive code changes. OneFlow is promising when the pipeline does not heavily modify UNet sub‑modules, but its current support for SD models is limited.
We recommend using DeepCache as the default acceleration, and OneFlow where compatible, while keeping PyTorch up‑to‑date.
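Where OneFlow is compatible, the integration surface is small: the OneDiff project exposes a compiler wrapper for the UNet. A minimal sketch under the assumption that the pipeline's UNet is unmodified; the first generation call after wrapping triggers the warm-up compilation mentioned above.

```python
def compile_unet_with_oneflow(pipe):
    """Replace the pipeline's UNet with a OneFlow-compiled version.

    Caveats observed in our tests: pipelines that replace UNet
    sub-modules (e.g. custom attention processors) may fail to
    compile, and the first inference call is slow due to warm-up."""
    from onediff.infer_compiler import oneflow_compile  # pip install onediff

    pipe.unet = oneflow_compile(pipe.unet)
    return pipe
```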
DaTaobao Tech