AI Frontier Lectures
Apr 24, 2025 · Artificial Intelligence

How d1 Boosts Reasoning in Diffusion LLMs with Reinforcement Learning

Researchers from UCLA and Meta AI introduce d1, a two‑stage post‑training framework for masked diffusion large language models. It combines supervised fine‑tuning with diffu‑GRPO, a novel reinforcement‑learning algorithm, to enable efficient reasoning, and it achieves state‑of‑the‑art performance on multiple math and logic benchmarks.

AI · d1 · diffu-GRPO