Xiaomi’s New MiMo‑V2‑Flash LLM Rivals DeepSeek‑V3.2 and Approaches GPT‑5 High
Xiaomi’s MiMo‑V2‑Flash is a 309B‑parameter MoE LLM with only 15B active parameters. It uses Hybrid SWA, Multi‑Token Prediction and Multi‑Teacher On‑Policy Distillation to cut KV‑cache memory roughly sixfold and boost inference speed by 2.6×. Reported performance is comparable to DeepSeek‑V3.2 and Kimi‑K2 and close to GPT‑5 High, including a 73.4% code‑agent score on SWE‑Bench.
Xiaomi’s LLM‑Core team, led by Luo Fuli, announced the release of MiMo‑V2‑Flash, positioning it as the second step of their AGI roadmap and claiming performance on par with leading open‑source models DeepSeek‑V3.2 and Kimi‑K2.
Parameter size, “brain capacity”
MiMo‑V2‑Flash adopts a Mixture‑of‑Experts (MoE) architecture with a total of 309 billion parameters, of which only 15 billion are activated for any given token. The full parameter count provides the capacity of a large model, while the per‑token compute stays closer to that of a much smaller one.
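As a rough illustration of why only a fraction of the weights run per token, the sketch below shows generic top-k expert routing alongside the 15B/309B active share. The eight router scores and k=2 are made-up values, not MiMo‑V2‑Flash's actual configuration, and `top_k_route` and `active_fraction` are hypothetical helper names.

```python
import math


def top_k_route(router_logits, k):
    """Return (expert_index, weight) pairs for the k highest-scoring experts."""
    ranked = sorted(range(len(router_logits)),
                    key=lambda i: router_logits[i], reverse=True)
    chosen = ranked[:k]
    # Softmax over the selected logits only, so the k weights sum to 1.
    peak = max(router_logits[i] for i in chosen)
    exps = [math.exp(router_logits[i] - peak) for i in chosen]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(chosen, exps)]


def active_fraction(active_params, total_params):
    """Share of parameters that participate in one forward pass."""
    return active_params / total_params


# Hypothetical router scores for 8 experts; only 2 of them are evaluated.
routes = top_k_route([0.1, 2.0, -1.0, 1.5, 0.3, -0.2, 0.0, 0.9], k=2)
print(routes)

# Figures from the article: 15B active out of 309B total.
print(f"{active_fraction(15e9, 309e9):.1%} of parameters active per token")
```

With the article's figures, the active share works out to about 4.9% of the total parameters.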
Three engineering “magic” tricks
1. Hybrid Sliding‑Window Attention (Hybrid SWA): The model combines sliding‑window attention with a 128‑token window for local detail and global attention for overall context, mixing them in a 5:1 ratio (sliding‑window to global). This cuts KV‑cache memory usage by nearly six times, so the same GPU can handle longer documents, and do so faster.
2. Multi‑Token Prediction (MTP): Instead of generating a single token per step, MTP predicts several tokens at once, increasing inference throughput by 2.6×. The faster generation also eases the wait for sampled data that typically slows reinforcement‑learning training.
3. Multi‑Teacher On‑Policy Distillation (MOPD): The model learns from multiple expert teachers across domains (math, code, logic) while generating its own training data on‑policy. It reportedly reaches teacher‑level performance with only 1/50 of the compute required by traditional distillation.
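The sixfold KV‑cache figure for Hybrid SWA can be sanity-checked with simple counting. The sketch below assumes that the 5:1 ratio means five sliding-window layers for each global-attention layer, that a sliding-window layer caches at most 128 tokens, and that the stack has a hypothetical 48 layers; none of these layer details are stated in the article, and `kv_cache_tokens` is an illustrative helper name.

```python
def kv_cache_tokens(seq_len, n_layers=48, window=128, swa_per_global=5):
    """Count cached token positions summed over all layers.

    Returns (hybrid, full): hybrid mixes sliding-window and global layers,
    full treats every layer as global attention.
    """
    # Assumption: one global layer per group of (swa_per_global + 1) layers.
    n_global = n_layers // (swa_per_global + 1)
    n_swa = n_layers - n_global
    # A global layer keeps every token; a sliding-window layer keeps at most `window`.
    hybrid = n_global * seq_len + n_swa * min(seq_len, window)
    full = n_layers * seq_len
    return hybrid, full


for seq_len in (1_024, 32_768, 262_144):
    hybrid, full = kv_cache_tokens(seq_len)
    print(f"{seq_len:>7} tokens: cache is {full / hybrid:.2f}x smaller")
```

Under these assumptions the saving grows with context length and approaches, without quite reaching, 6×, because each sliding-window layer still holds a fixed 128-token cache.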
Benchmark results vs. DeepSeek‑V3.2 and Kimi‑K2
The radar chart shows MiMo‑V2‑Flash performing strongly on Math, Code, and Long‑Context tasks, trading wins with DeepSeek‑V3.2 and Kimi‑K2 across metrics. Notably, as a code agent it scores 73.4% on the SWE‑Bench benchmark, which is reported to outperform all open‑source models and approach the level of GPT‑5 High.
MiMo‑V2‑Flash remains open‑source, with resources available at:
Model: http://hf.co/XiaomiMiMo/MiMo-V2-Flash
Blog post: http://mimo.xiaomi.com/blog/mimo-v2-flash
Technical report: http://github.com/XiaomiMiMo/MiMo-V2-Flash/blob/main/paper.pdf
AI Studio demo: http://aistudio.xiaomimimo.com
