Baobao Algorithm Notes
Jul 17, 2025 · Artificial Intelligence
How QK-Clip Tames MaxLogit Explosions in Trillion‑Parameter LLMs
This article introduces QK-Clip, a lightweight per‑head weight‑clipping technique that uses the MaxLogit signal to prevent uncontrolled growth of attention logits in massive LLMs. It explains the design, compares it with prior methods, and shows that it stabilizes training without harming model performance.
Attention stability · LLM training · MaxLogit
15 min read
