Tagged articles

training trade-offs

1 article
Machine Learning Algorithms & Natural Language Processing
Jun 16, 2026 · Artificial Intelligence

SFT, DAgger, Offline RL, and OPD: Four Methods Mapped onto a Single 2×2 Grid

The paper argues that SFT, DAgger, offline RL, and OPD are the four orthogonal combinations of two design choices: the prefix source (teacher vs. student) and the KL direction (forward vs. reverse). This framing exposes three hidden trade-offs: KL direction, prefix source, and training length. To address them, the paper proposes KL mixing and entropy-gated length curricula, reporting a 3.6-point gain in Avg@k, gains of up to 5.8 points in Pass@k, and a threefold reduction in response length.
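The KL-direction axis of the grid can be illustrated on a single next-token distribution. The sketch below is a minimal example and is not the paper's method. It assumes that "forward KL" means KL(teacher || student), that "reverse KL" means KL(student || teacher), and that "KL mixing" is a convex combination of the two controlled by a weight `alpha`. The names `forward_kl`, `reverse_kl`, `mixed_kl`, and `alpha` are hypothetical; the summary does not state the paper's exact formulation.

```python
import math

def forward_kl(p, q):
    """KL(p || q) with p = teacher, q = student (mass-covering).

    Assumes q[i] > 0 wherever p[i] > 0; terms with p[i] == 0 contribute 0.
    """
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def reverse_kl(p, q):
    """KL(q || p): student measured against teacher (mode-seeking)."""
    return forward_kl(q, p)

def mixed_kl(p, q, alpha):
    """Hypothetical KL mix: alpha * forward + (1 - alpha) * reverse."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return alpha * forward_kl(p, q) + (1 - alpha) * reverse_kl(p, q)

# Toy next-token distributions over a 3-token vocabulary (illustrative only).
teacher = [0.5, 0.4, 0.1]
student = [0.8, 0.1, 0.1]

fwd = forward_kl(teacher, student)
rev = reverse_kl(teacher, student)
mix = mixed_kl(teacher, student, alpha=0.5)
print(f"forward={fwd:.4f} reverse={rev:.4f} mixed(0.5)={mix:.4f}")
```

The two directions disagree even on this toy pair. Forward KL heavily penalizes the student for under-weighting the teacher's second token, while reverse KL mostly penalizes the student for over-weighting its own favorite token. A mixing weight is one plausible way to interpolate between these two behaviors.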

DAgger · KL divergence · LLM distillation
17 min read