ShiZhen AI
Mar 17, 2026 · Artificial Intelligence
Kimi’s Attention Residuals Swap a Decade-Old Residual Trick for 1.25× Faster 48B MoE
The Kimi team introduces Attention Residuals, a softmax‑based replacement for the uniform residual connections Transformers have used for a decade. By selectively aggregating each layer's history, the method curbs hidden‑state growth and delivers a 1.25× compute‑efficiency gain on a 48‑billion‑parameter MoE model, with less than a 2% increase in inference latency.
Attention Residuals · Compute Efficiency · MoE
10 min read
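
In broad strokes, the summary's description reads as replacing the fixed `h + f(h)` skip connection with a learned softmax mixture over all earlier hidden states. The PyTorch sketch below only illustrates that reading; the class name, the per‑layer score parameterisation, and the stand‑in feed‑forward sublayer are assumptions for illustration, not the Kimi team's actual implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class AttentionResidual(nn.Module):
    """Hypothetical sketch: swap the uniform residual h + f(h) for a
    softmax-weighted ("attention") mixture over the layer history."""

    def __init__(self, d_model: int, layer_idx: int):
        super().__init__()
        # One learnable score per history entry (embedding output plus layers
        # 0..layer_idx-1); the real parameterisation is an assumption here.
        self.history_scores = nn.Parameter(torch.zeros(layer_idx + 1))
        # Stand-in for the layer's own sublayer (attention / MoE FFN in practice).
        self.sublayer = nn.Sequential(
            nn.Linear(d_model, 4 * d_model), nn.GELU(), nn.Linear(4 * d_model, d_model)
        )

    def forward(self, history: list[torch.Tensor]) -> torch.Tensor:
        # history: hidden states from all earlier layers, each of shape [B, T, D].
        weights = F.softmax(self.history_scores, dim=0)        # selective, not uniform
        mixed = sum(w * h for w, h in zip(weights, history))   # aggregated residual stream
        return mixed + self.sublayer(history[-1])              # new hidden state


# Toy usage: layer index 1 sees two earlier hidden states.
block = AttentionResidual(d_model=64, layer_idx=1)
h0, h1 = torch.randn(2, 8, 64), torch.randn(2, 8, 64)
out = block([h0, h1])   # shape [2, 8, 64]
```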
