Baobao Algorithm Notes
Oct 28, 2025 · Artificial Intelligence

Why Entropy Collapse Limits LLM Reinforcement Learning and How to Fix It

The article explains how information entropy, cross‑entropy, and KL‑divergence shape reinforcement learning for large language models, describes the phenomenon of entropy collapse, compares token‑level and policy‑level entropy, and reviews recent methods like Clip‑Cov and KL‑Cov that mitigate this issue.

cross-entropy · entropy · policy entropy
11 min read