Accelerating Stable Diffusion Models: Evaluation of FlashAttention2, OneFlow, DeepCache, Stable-Fast, and LCM-LoRA
Our benchmark of FlashAttention2, OneFlow, DeepCache, Stable‑Fast, and LCM‑LoRA on Stable Diffusion models shows that DeepCache combined with PyTorch 2.2 consistently cuts inference time by 40‑50% with minimal code changes, while OneFlow offers 20‑40% speedups when compatible, making DeepCache the recommended default acceleration.
We investigated several popular acceleration techniques for Stable Diffusion (SD) models on our MVAP platform, focusing on inference speed for AI clothing try-on scenarios.
Methods evaluated: FlashAttention2 (operator optimization), OneFlow and stable‑fast (model compilation), DeepCache (model caching), and LCM‑LoRA (model distillation).
Test environment: A10 GPU, CUDA 11.8, Python 3.10, PyTorch 2.0.1, Diffusers 0.26.3, prompts for text‑to‑image generation (512×512, 50 steps).
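As a reference point, the baseline configuration above can be sketched as a plain diffusers text-to-image call timed end to end. This is an illustrative sketch, not the exact benchmark harness; the model ID (`runwayml/stable-diffusion-v1-5`) and the use of `time.perf_counter` are assumptions.

```python
def generate_baseline(prompt: str):
    """Run one SD-1.5 text-to-image pass (512x512, 50 steps) and
    report wall-clock time. Requires a CUDA GPU and the diffusers
    and torch packages; imports are deferred so the sketch can be
    inspected without them installed."""
    import time
    import torch
    from diffusers import StableDiffusionPipeline

    pipe = StableDiffusionPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
    ).to("cuda")

    start = time.perf_counter()
    image = pipe(prompt, height=512, width=512, num_inference_steps=50).images[0]
    elapsed = time.perf_counter() - start
    return image, elapsed
```

All later measurements in this post are relative to a baseline of this shape.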
Results for SD‑1.5:
OneFlow compilation reduced runtime by ~41% with negligible quality loss, but requires a warm‑up compilation step.
DeepCache added another 15‑25% speedup; larger cache intervals increase speed but degrade image quality.
Stable‑fast gave modest acceleration and suffered from dependency issues.
Detailed benchmark tables show average generation times and visual examples for each method.
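The cache-interval trade-off above follows directly from how DeepCache works: a full UNet pass runs only once per interval, and intermediate steps reuse cached high-level features. A minimal sketch, using the helper API from the DeepCache project (the `full_unet_calls` helper is our own illustrative arithmetic, not part of the library):

```python
import math


def full_unet_calls(num_steps: int, cache_interval: int) -> int:
    """Approximate count of full UNet evaluations under DeepCache:
    one full pass every `cache_interval` steps; the remaining steps
    reuse cached deep features and run only a shallow branch."""
    return math.ceil(num_steps / cache_interval)


def enable_deepcache(pipe, cache_interval: int = 3, cache_branch_id: int = 0):
    """Attach the DeepCache helper to a diffusers pipeline.

    Larger `cache_interval` values trade image quality for speed,
    matching the benchmark observations above."""
    from DeepCache import DeepCacheSDHelper  # pip install DeepCache

    helper = DeepCacheSDHelper(pipe=pipe)
    helper.set_params(cache_interval=cache_interval, cache_branch_id=cache_branch_id)
    helper.enable()
    return helper


# e.g. 50 steps with interval 3 -> 17 full UNet passes instead of 50
```

With a 50-step schedule and `cache_interval=3`, only 17 of the 50 steps run the full UNet, which is where the headline speedup comes from.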
Results for SD‑XL (with LoRA adapters):
OneFlow lowered runtime by ~24% while preserving quality.
DeepCache achieved up to 69% speedup at high cache intervals, with noticeable quality trade‑offs.
LCM‑LoRA dramatically reduced required steps but was unstable with pretrained weights.
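For context on the LCM-LoRA result, adapting an SDXL pipeline typically means swapping in the LCM scheduler and loading the adapter weights, after which 4–8 steps with low guidance suffice. A hedged sketch using the public diffusers API (the adapter ID `latent-consistency/lcm-lora-sdxl` is the published community checkpoint, not one of our weights):

```python
def enable_lcm_lora(pipe, adapter_id: str = "latent-consistency/lcm-lora-sdxl"):
    """Convert a diffusers SDXL pipeline to few-step LCM-LoRA inference.

    The scheduler swap is required: LCM sampling does not work with the
    default DDIM/Euler schedulers. Instability we observed came from
    combining this adapter with certain pretrained LoRA weights."""
    from diffusers import LCMScheduler

    pipe.scheduler = LCMScheduler.from_config(pipe.scheduler.config)
    pipe.load_lora_weights(adapter_id)
    pipe.fuse_lora()
    return pipe


# usage sketch:
#   pipe = enable_lcm_lora(pipe)
#   image = pipe(prompt, num_inference_steps=4, guidance_scale=1.0).images[0]
```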
Overall, DeepCache combined with the latest PyTorch (2.2) provides a reliable 40‑50% runtime reduction without extensive code changes. OneFlow is promising when the pipeline does not heavily modify UNet sub‑modules, but its current support for SD models is limited.
We recommend using DeepCache as the default acceleration, and OneFlow where compatible, while keeping PyTorch up‑to‑date.
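Where OneFlow is compatible, the integration surface is small: the OneDiff project exposes a compiler wrapper for the UNet. A minimal sketch under the assumption that the pipeline's UNet is unmodified; the first generation call after wrapping triggers the warm-up compilation mentioned above.

```python
def compile_unet_with_oneflow(pipe):
    """Replace the pipeline's UNet with a OneFlow-compiled version.

    Caveats observed in our tests: pipelines that replace UNet
    sub-modules (e.g. custom attention processors) may fail to
    compile, and the first inference call is slow due to warm-up."""
    from onediff.infer_compiler import oneflow_compile  # pip install onediff

    pipe.unet = oneflow_compile(pipe.unet)
    return pipe
```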
DaTaobao Tech