AI Frontier Lectures
Apr 24, 2025 · Artificial Intelligence
How d1 Boosts Reasoning in Diffusion LLMs with Reinforcement Learning
Researchers from UCLA and Meta AI introduce d1, a two-stage post-training framework for masked diffusion large language models. It combines supervised fine-tuning with diffu-GRPO, a novel reinforcement-learning algorithm, to enable efficient reasoning, and reports state-of-the-art performance on multiple math and logic benchmarks.
AI · d1 · diffu-GRPO
