Xiaomi’s New MiMo‑V2‑Flash LLM Rivals DeepSeek‑V3.2 and Approaches GPT‑5 High

Xiaomi’s MiMo‑V2‑Flash is a 309B‑parameter MoE LLM with only 15B active parameters. It uses Hybrid SWA, Multi‑Token Prediction and Multi‑Teacher On‑Policy Distillation to shrink the KV cache roughly sixfold and raise inference throughput by 2.6×. Xiaomi reports performance comparable to DeepSeek‑V3.2 and Kimi‑K2 and close to GPT‑5 High, including a 73.4% score on the SWE‑Bench code‑agent benchmark.

AI Insight Log

Xiaomi’s LLM‑Core team, led by Luo Fuli, announced the release of MiMo‑V2‑Flash, positioning it as the second step of their AGI roadmap and claiming performance on par with leading open‑source models DeepSeek‑V3.2 and Kimi‑K2.

Parameter size, “brain capacity”

MiMo‑V2‑Flash adopts a Mixture‑of‑Experts (MoE) architecture with 309 billion total parameters, of which only 15 billion are activated for each token. The full parameter set supplies the capacity of a large model, while per‑token compute stays close to that of a much smaller dense model.
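
To make the "total versus active parameters" idea concrete, here is a minimal sketch of top‑k expert routing in plain Python. It is not Xiaomi's implementation: the article gives neither the number of experts nor how many are selected per token, so the 8 experts, the top‑2 choice, and the names `top_k_route` and `moe_forward` are all hypothetical. Only the 309B and 15B figures come from the article.

```python
import math

def top_k_route(gate_logits, k):
    """Return (expert_index, weight) pairs for the k highest-scoring experts."""
    ranked = sorted(range(len(gate_logits)),
                    key=lambda i: gate_logits[i], reverse=True)
    top = ranked[:k]
    # Softmax over the selected experts only, so their weights sum to 1.
    peak = max(gate_logits[i] for i in top)
    exps = [math.exp(gate_logits[i] - peak) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]

def moe_forward(x, experts, gate_logits, k):
    """Run only the routed experts and mix their outputs by gate weight."""
    routes = top_k_route(gate_logits, k)
    output = sum(weight * experts[i](x) for i, weight in routes)
    return output, [i for i, _ in routes]

# Hypothetical toy setup: 8 "experts", each just scales its input.
experts = [lambda x, scale=scale: scale * x for scale in range(1, 9)]
gate_logits = [0.1, 2.0, -1.0, 0.5, 2.0, 0.0, -0.3, 1.2]

y, used = moe_forward(3.0, experts, gate_logits, k=2)
print("experts used:", used, "output:", y)

# Figures from the article: 15B of 309B parameters are active per token.
active_fraction = 15 / 309
print(f"active share of parameters: {active_fraction:.1%}")
```

The six experts that are not selected never run for this input, which is why per‑token cost tracks the active parameter count rather than the total.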

Three engineering “magic” tricks

1. Hybrid Sliding‑Window Attention (Hybrid SWA): The model combines sliding‑window attention, which looks only at the most recent 128 tokens for local detail, with global attention for overall context, mixing the two in a 5:1 ratio. Because the sliding‑window part caches only its window rather than the full sequence, KV‑cache memory drops by roughly a factor of six, so the same GPU can process longer documents faster.

2. Multi‑Token Prediction (MTP): Instead of generating a single token per step, MTP predicts several tokens at once. According to Xiaomi, this raises inference throughput by 2.6× and removes the data‑waiting bottleneck that typically slows reinforcement‑learning training.

3. Multi‑Teacher On‑Policy Distillation (MOPD): The model learns from multiple expert teachers, each covering a domain such as math, code or logic, while training on data it generates itself (on‑policy). Xiaomi reports that this reaches teacher‑level performance with only 1/50 of the compute required by traditional distillation.
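
The sixfold KV‑cache figure in the first item can be checked with back‑of‑the‑envelope arithmetic. The sketch below assumes that the 5:1 ratio means five sliding‑window layers for every global‑attention layer, and that a sliding‑window layer caches at most 128 tokens while a global layer caches the whole context. The article does not spell out the layer layout, so treat this reading and the function names as assumptions.

```python
def hybrid_cached_tokens(context_len, window=128, swa_layers=5, global_layers=1):
    """Tokens held in the KV cache across one group of hybrid layers."""
    swa = swa_layers * min(context_len, window)   # SWA keeps only its window
    full = global_layers * context_len            # global keeps everything
    return swa + full

def kv_reduction(context_len, window=128, swa_layers=5, global_layers=1):
    """How many times smaller the cache is than with all-global attention."""
    all_global = (swa_layers + global_layers) * context_len
    hybrid = hybrid_cached_tokens(context_len, window, swa_layers, global_layers)
    return all_global / hybrid

for n in (128, 4096, 32768, 262144):
    print(f"context {n:>7}: cache is {kv_reduction(n):.2f}x smaller")
```

Under these assumptions the saving is negligible for contexts no longer than the window and approaches, but never quite reaches, 6× as the context grows, which is consistent with the article's "six times" being a long‑context figure.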

Benchmark results vs. DeepSeek‑V3.2 and Kimi‑K2

The radar chart shows MiMo‑V2‑Flash performing strongly on Math, Code and Long‑Context tasks, trading wins with DeepSeek‑V3.2 and Kimi‑K2 and surpassing both on some metrics. Notably, its code agent scores 73.4% on the SWE‑Bench benchmark, outperforming all open‑source models and approaching the level of GPT‑5 High.

MiMo‑V2‑Flash is open‑source, with resources available at:

Model: http://hf.co/XiaomiMiMo/MiMo-V2-Flash

Blog post: http://mimo.xiaomi.com/blog/mimo-v2-flash

Technical report: http://github.com/XiaomiMiMo/MiMo-V2-Flash/blob/main/paper.pdf

AI Studio demo: http://aistudio.xiaomimimo.com

Figure: Xiaomi LLM lead Luo Fuli announces MiMo-V2-Flash
Figure: MiMo-V2-Flash model architecture diagram
Figure: MOPD effect comparison chart
Figure: MiMo-V2-Flash performance radar chart