Machine Learning Algorithms & Natural Language Processing
Jun 18, 2026 · Artificial Intelligence
From Imitation to Optimization: Recent Advances in On-Policy Distillation
This article surveys recent research on On-Policy Distillation (OPD) for large language models. It covers methods that improve training stability, self-distillation frameworks, and detailed analyses of when and why OPD succeeds or fails, along with concrete experimental results and practical insights.
Entropy-Aware · Large Language Models · On-Policy Distillation
