Baobao Algorithm Notes
Nov 3, 2025 · Artificial Intelligence
Inside Kimi Linear: How Aggressive MoE Sparsity and Hybrid Linear Attention Boost a 3B‑Scale LLM
The author details Kimi Linear's architecture, training challenges, aggressive MoE sparsity, hybrid linear attention design, benchmark gains, and post‑training insights, offering a transparent technical review of this MoE LLM with 48B total parameters (about 3B activated) trained on 5.7T tokens.
Hybrid Model · Kimi Linear · LLM
9 min read
