From Imitation to Optimization: Recent Advances in On-Policy Distillation

This article surveys recent research on On-Policy Distillation (OPD) for large language models. It covers methods that improve training stability, self-distillation frameworks, and analyses of when and why OPD succeeds or fails, along with concrete experimental results and practical insights.

Entropy-Aware · Large Language Models · On-Policy Distillation

