Machine Heart
May 13, 2026 · Artificial Intelligence
Why Bigger Teachers Don’t Teach Better: Tsinghua’s On‑Policy Distillation Study
A new study from Tsinghua and collaborators dissects on-policy distillation for large language models, showing that higher-scoring teachers often fail to improve students unless their thinking patterns align. The paper details token-level overlap dynamics, characteristic failure cases, and two practical remedies for rescuing ineffective distillation.
Model Scaling · On-Policy Distillation · RL Post-Training
9 min read
