Machine Learning Algorithms & Natural Language Processing
Jun 16, 2026 · Artificial Intelligence
SFT, DAgger, Offline RL, and OPD: Four Methods Mapped onto a Single 2×2 Grid
The paper shows that SFT, DAgger, offline RL, and OPD are the four combinations of two orthogonal choices: prefix source (teacher vs. student) and KL direction (forward vs. reverse). This framing exposes three hidden trade-offs (KL direction, prefix source, and training length). The paper then proposes KL-mixing and entropy-gated length curricula that boost Avg@k by 3.6 points, raise Pass@k by up to 5.8 points, and shorten responses by a factor of three.
DAgger · KL divergence · LLM distillation
