Baobao Algorithm Notes
Oct 28, 2025 · Artificial Intelligence
Why Entropy Collapse Limits LLM Reinforcement Learning and How to Fix It
The article explains how information entropy, cross‑entropy, and KL‑divergence shape reinforcement learning for large language models, describes the phenomenon of entropy collapse, compares token‑level and policy‑level entropy, and reviews recent methods like Clip‑Cov and KL‑Cov that mitigate this issue.
cross-entropy · entropy · policy entropy
11 min read
