Old Zhang's AI Learning
May 10, 2026 · Artificial Intelligence
DFlash Boosts Large Model Inference Up to 6× – Now Supporting DeepSeek-V4
DFlash replaces the draft model in speculative decoding with a block‑diffusion drafter that proposes 16 tokens per forward pass. It achieves up to 6× speedup over the baseline (2.5× over EAGLE‑3) without quality loss, and supports a wide range of open‑source LLMs and multiple inference back‑ends.
Block Diffusion · DFlash · LLM inference
