BestHub
Baobao Algorithm Notes
Nov 3, 2025 · Artificial Intelligence

Inside Kimi Linear: How Aggressive MoE Sparsity and Hybrid Linear Attention Boost a 3B‑Scale LLM

The author details Kimi Linear's architecture, training challenges, aggressive MoE sparsity, hybrid linear-attention design, benchmark gains, and post-training insights, offering a transparent technical review of this 48B-total-parameter MoE LLM (about 3B parameters activated per token) trained on 5.7T tokens.

Hybrid Model · Kimi Linear · LLM
9 min read
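
The summary above mentions a hybrid linear-attention design. Purely as an illustration of what such a hybrid can look like, the PyTorch sketch below interleaves placeholder linear-attention blocks with occasional full-attention blocks at an assumed 3:1 ratio; the block classes, the ratio, and the dimensions are illustrative assumptions, not Kimi Linear's actual implementation.

```python
import torch
import torch.nn as nn

class LinearAttentionBlock(nn.Module):
    """Placeholder for a linear-attention layer (O(n) in sequence length)."""
    def __init__(self, d_model: int):
        super().__init__()
        self.mix = nn.Linear(d_model, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Stand-in token mixing; a real linear-attention kernel goes here.
        return x + self.mix(x)

class FullAttentionBlock(nn.Module):
    """Placeholder for a standard softmax-attention layer (O(n^2))."""
    def __init__(self, d_model: int, n_heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out, _ = self.attn(x, x, x, need_weights=False)
        return x + out

def build_hybrid_stack(n_layers: int, d_model: int,
                       full_every: int = 4) -> nn.ModuleList:
    """Every `full_every`-th layer is full attention; the rest are linear,
    giving a 3:1 linear-to-full ratio at the default setting."""
    return nn.ModuleList([
        FullAttentionBlock(d_model) if (i + 1) % full_every == 0
        else LinearAttentionBlock(d_model)
        for i in range(n_layers)
    ])

# Usage: 16 layers at the assumed 3:1 linear-to-full ratio.
stack = build_hybrid_stack(n_layers=16, d_model=512)
x = torch.randn(2, 128, 512)  # (batch, seq_len, d_model)
for layer in stack:
    x = layer(x)
```

The appeal of such a layout is that most layers scale linearly with context length while the sparse full-attention layers retain global token-to-token interaction; the article reviews how Kimi Linear instantiates this trade-off.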