Qwen3.5-27B-DFlash Delivers Up to 5× Faster Inference Without Quality Loss
DFlash replaces the autoregressive drafter used in speculative decoding with a block diffusion model and injects hidden features from the target model into every KV‑cache layer. For Qwen3.5‑27B, this yields up to a 5× speed‑up on a single GPU and 1.5–1.9× on high‑concurrency workloads while preserving output quality.
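
To make the "faster without quality loss" claim concrete, here is a minimal sketch of the generic draft-and-verify loop that speculative decoding relies on, assuming greedy decoding. It is not the DFlash implementation: the block diffusion drafter is stood in for by a toy function that proposes a whole block in one call, KV-cache feature injection is not modeled, and every name (`speculative_generate`, `toy_target`, `toy_drafter`, `block_size`) is hypothetical. A real system would also score the whole block in one batched target forward pass rather than position by position.

```python
from typing import Callable, List, Tuple

Token = int
NextToken = Callable[[List[Token]], Token]                 # target: greedy next token
BlockDrafter = Callable[[List[Token], int], List[Token]]   # drafter: proposes k tokens at once


def greedy_generate(target_next: NextToken, prompt: List[Token], n: int) -> List[Token]:
    """Baseline: one target call per generated token."""
    seq = list(prompt)
    for _ in range(n):
        seq.append(target_next(seq))
    return seq


def speculative_generate(
    target_next: NextToken,
    draft_block: BlockDrafter,
    prompt: List[Token],
    max_new_tokens: int,
    block_size: int = 4,
) -> Tuple[List[Token], int]:
    """Draft a block, keep the prefix the target agrees with, repeat."""
    seq = list(prompt)
    verify_rounds = 0  # each round stands for one batched target pass
    while len(seq) - len(prompt) < max_new_tokens:
        proposal = draft_block(seq, block_size)
        verify_rounds += 1
        accepted: List[Token] = []
        for tok in proposal:
            expected = target_next(seq + accepted)
            if tok != expected:
                accepted.append(expected)  # target's correction replaces the bad draft
                break
            accepted.append(tok)
        else:
            # Whole block accepted: the target pass also yields one extra token.
            accepted.append(target_next(seq + accepted))
        seq.extend(accepted)
    return seq[: len(prompt) + max_new_tokens], verify_rounds


# Toy stand-ins (assumptions for illustration only).
def toy_target(prefix: List[Token]) -> Token:
    return (prefix[-1] + 1) % 10  # counts 0..9 cyclically


def toy_drafter(prefix: List[Token], k: int) -> List[Token]:
    out, last = [], prefix[-1]
    for _ in range(k):
        last = 0 if last == 6 else (last + 1) % 10  # deliberately wrong after 6
        out.append(last)
    return out


if __name__ == "__main__":
    out, rounds = speculative_generate(toy_target, toy_drafter, [0], 12)
    print(out == greedy_generate(toy_target, [0], 12), rounds)
```

Because every emitted token is either confirmed or supplied by the target, the output matches plain greedy decoding with the target alone; the speed-up comes from needing fewer target passes than generated tokens whenever the drafter's proposals are accepted.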
