Machine Learning Algorithms & Natural Language Processing
Feb 22, 2026 · Artificial Intelligence
What Is On-Policy Distillation? A Deep Dive into On-Policy and Self-Distillation
The article explains on-policy distillation and derives the gradients of its forward and reverse KL objectives. It then introduces self-distillation, where the policy serves as its own teacher, and discusses practical implementation tricks such as extra-knowledge injection and stabilizing the teacher with an EMA or a trust region. Finally, it highlights benefits such as reduced catastrophic forgetting, fewer "Aha" moments, and a narrower train-test gap, especially for larger models (a minimal sketch of the reverse-KL training step appears below).
Catastrophic Forgetting · EMA · KL Divergence

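On the two objectives named in the summary: the forward KL takes the expectation under the teacher's distribution (mean-covering), while the reverse KL takes it under the student's own distribution (mode-seeking), which is what makes it the natural choice when the loss is evaluated on the student's own samples. As a rough illustration of that on-policy setting, here is a minimal PyTorch sketch of one training step: the student generates its own rollouts, and the loss is the per-token reverse KL against a frozen teacher on those same tokens. The `student.generate` / `.logits` interface is assumed to be Hugging Face-style, and prompt masking and next-token shifting are omitted; this is an illustrative sketch under those assumptions, not the article's actual code.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def sample_rollouts(student, prompt_ids, max_new_tokens=64):
    # On-policy: the student samples from its *own* current distribution.
    # Assumes a Hugging Face-style generate() interface.
    return student.generate(prompt_ids, max_new_tokens=max_new_tokens,
                            do_sample=True)

def on_policy_distill_loss(student, teacher, rollout_ids):
    # Score the same student-generated tokens under both models.
    student_logits = student(rollout_ids).logits        # (batch, seq, vocab)
    with torch.no_grad():
        teacher_logits = teacher(rollout_ids).logits    # teacher stays frozen

    s_logp = F.log_softmax(student_logits, dim=-1)
    t_logp = F.log_softmax(teacher_logits, dim=-1)

    # Per-token reverse KL, KL(student || teacher): the expectation is taken
    # under the student's distribution, so the objective is mode-seeking and
    # penalizes tokens the teacher considers unlikely.
    rkl = (s_logp.exp() * (s_logp - t_logp)).sum(dim=-1)  # (batch, seq)

    # Prompt masking and next-token shifting are omitted for brevity.
    return rkl.mean()
```

Computing the full-vocabulary KL at each position, rather than only the log-probability of the sampled token, is one common variant of on-policy distillation; swapping the roles of `s_logp` and `t_logp` inside the sum yields the forward KL instead.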