Tagged articles

Block Diffusion

May 10, 2026 · Artificial Intelligence

DFlash Boosts Large Model Inference Up to 6× – Now Supporting DeepSeek-V4

DFlash replaces the draft model used in speculative decoding with a block-diffusion drafter that generates 16 tokens per forward pass. It achieves up to 6× speedup over the baseline (2.5× over EAGLE-3) without quality loss, and supports a wide range of open-source LLMs and multiple back-ends.

Block Diffusion · DFlash · LLM Inference
