Xiaohongshu Tech REDtech
Nov 4, 2025 · Artificial Intelligence

Unveiling the Law of Capacity Gap: Boosting Language Model Distillation Efficiency

A collaborative paper presented at ACL 2025 introduced the Law of Capacity Gap: in language model distillation, the optimal teacher size scales linearly with the student size, at roughly 2.5× the student's parameter count. Choosing the teacher by this rule cuts the compute cost of distillation and yields Pareto-optimal efficiency, as demonstrated by the MiniMA model.
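To make the 2.5× relationship concrete, here is a minimal Python sketch of the arithmetic. It assumes "size" means parameter count and that the ratio applies as a simple multiplier; the function names and the example sizes are hypothetical and do not come from the paper.

```python
# Assumption: the optimal teacher has about 2.5x the student's parameter count.
OPTIMAL_RATIO = 2.5


def optimal_teacher_params(student_params: float, ratio: float = OPTIMAL_RATIO) -> float:
    """Return the teacher size suggested by the linear rule for a given student size."""
    if student_params <= 0:
        raise ValueError("student_params must be positive")
    return ratio * student_params


def optimal_student_params(teacher_params: float, ratio: float = OPTIMAL_RATIO) -> float:
    """Return the student size suggested by the linear rule for a given teacher size."""
    if teacher_params <= 0:
        raise ValueError("teacher_params must be positive")
    return teacher_params / ratio


if __name__ == "__main__":
    # Hypothetical sizes, chosen only to illustrate the arithmetic.
    print(f"3B student -> {optimal_teacher_params(3e9) / 1e9:.1f}B teacher")
    print(f"7B teacher -> {optimal_student_params(7e9) / 1e9:.1f}B student")
```

Under these assumptions, a 3B-parameter student pairs with a teacher of about 7.5B parameters, and a 7B-parameter teacher pairs with a student of about 2.8B parameters.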

Distillation · MiniMA · artificial-intelligence