Jul 5, 2026 · Artificial Intelligence

Uncovering the Privilege Illusion in OPD Distillation and How DOPD Solves It

The article identifies a hidden “privilege illusion” that degrades on-policy distillation (OPD) when privileged information is injected into the teacher, and introduces Dual On-policy Distillation (DOPD), a dynamic two-stream approach that separates true ability gaps from information gaps. The article reports improved performance and stability across LLM and VLM benchmarks.

DOPD · Large Language Models · OPD

13 min read
