Why WeDLM Outpaces AR Models: Diffusion Decoding Meets KV Cache for 10× Faster Inference
Tencent WeChat AI introduces WeDLM, a diffusion language model compatible with standard causal attention and KV caching. It achieves up to 10× speedups over autoregressive models while matching or improving generation quality on math reasoning and open-ended tasks.
