HyperAI Super Neural
Feb 10, 2026 · Artificial Intelligence

WeDLM Diffusion Language Model Tutorial: 3× Faster Inference Than vLLM AR Models

The Tencent WeChat AI team introduces WeDLM, a diffusion language model that uses topological reordering to surpass autoregressive models running on the industrial‑grade vLLM engine, with more than a threefold speedup on math reasoning and up to tenfold in low‑entropy scenarios. The post also provides a step‑by‑step online tutorial with GPU compute credits.

Diffusion Language Model · GPU Compute · Inference Acceleration
5 min read
AntTech
Oct 13, 2025 · Artificial Intelligence

How dInfer Accelerates Diffusion LLM Inference Over 10× Faster Than Fast‑dLLM

Ant Group's open‑source dInfer framework dramatically accelerates diffusion language model inference, achieving a more than tenfold speedup over Fast‑dLLM, surpassing autoregressive baselines, and reaching 1,011 tokens per second on HumanEval. It addresses computational cost, KV‑cache invalidation, and parallel‑decoding challenges through modular, system‑level innovations.

AI Performance · Diffusion Language Model · LLM
11 min read
AntTech
Sep 13, 2025 · Artificial Intelligence

LLaDA‑MoE: The First Native MoE Diffusion Language Model Shattering Autoregressive Limits

Ant Group and Renmin University unveil LLaDA‑MoE, the industry's first native MoE‑based diffusion language model. Trained on 20 TB of data, it achieves performance comparable to Qwen2.5 while delivering several‑fold faster inference, and it will be fully open‑sourced to accelerate global AI research.

AI Research · Diffusion Language Model · LLaDA-MoE
6 min read