Xiaomi’s New MiMo‑V2‑Flash LLM Rivals DeepSeek‑V3.2 and Approaches GPT‑5 High

Xiaomi’s MiMo‑V2‑Flash is a 309B‑parameter MoE LLM with only 15B active parameters. It uses Hybrid SWA, Multi‑Token Prediction, and Multi‑Teacher On‑Policy Distillation to cut KV‑cache memory roughly sixfold and boost inference speed 2.6×, and it achieves performance comparable to DeepSeek‑V3.2 and Kimi‑K2 and close to GPT‑5 High, including a 73.4% SWE‑Bench code‑agent score.

AI Insight Log

Dec 18, 2025

Xiaomi’s LLM‑Core team, led by Luo Fuli, announced the release of MiMo‑V2‑Flash, positioning it as the second step of their AGI roadmap and claiming performance on par with leading open‑source models DeepSeek‑V3.2 and Kimi‑K2.

Parameter count: the model’s “brain capacity”

MiMo‑V2‑Flash adopts a Mixture‑of‑Experts (MoE) architecture with 309 billion total parameters, of which only 15 billion (about 5%) are activated during inference. This gives it the capacity of a large model while retaining the speed of a much smaller one.
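To make the sparse-activation idea concrete, here is a minimal sketch of top-k expert routing together with the active-parameter arithmetic. Only the 309B and 15B figures come from the article. The expert count (8), the number of experts chosen per token (2), and the helper names `active_fraction` and `route_top_k` are hypothetical choices for illustration, not MiMo‑V2‑Flash's actual configuration or code.

```python
import math
import random

TOTAL_PARAMS_B = 309   # total parameters in billions (from the article)
ACTIVE_PARAMS_B = 15   # parameters activated during inference, in billions (from the article)


def active_fraction(total_b, active_b):
    """Share of the model's parameters used for a single token."""
    return active_b / total_b


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


def route_top_k(router_scores, k):
    """Pick the k highest-scoring experts and renormalize their gate weights."""
    probs = softmax(router_scores)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in top)
    return {i: probs[i] / norm for i in top}


# Hypothetical toy setup: 8 experts, 2 active per token (not MiMo's real config).
random.seed(0)
scores = [random.gauss(0.0, 1.0) for _ in range(8)]
gates = route_top_k(scores, k=2)

print(f"active fraction: {active_fraction(TOTAL_PARAMS_B, ACTIVE_PARAMS_B):.1%}")
print(f"experts chosen for this token: {sorted(gates)}")
```

Only the selected experts run for a given token, which is why compute per token tracks the active parameter count rather than the total.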

Three engineering “magic” tricks

1. Hybrid Sliding‑Window Attention (Hybrid SWA): The model combines sliding‑window attention over a 128‑token window, which captures local detail, with global attention, which captures overall context, mixing the two in a 5:1 ratio. Because the sliding‑window part only needs to cache the most recent 128 tokens, KV‑cache memory drops by roughly a factor of six on long inputs, allowing the same GPU to handle longer documents faster.

2. Multi‑Token Prediction (MTP): Instead of generating a single token per step, MTP predicts several tokens at once, increasing inference throughput by 2.6×. Faster generation also relieves the wait for model‑generated samples that typically slows reinforcement‑learning training.

3. Multi‑Teacher On‑Policy Distillation (MOPD): The model learns from multiple expert teachers, each covering a domain such as math, code, or logic, while training on data it generates itself (on‑policy). It reportedly reaches teacher‑level performance with only 1/50 of the compute required by traditional distillation.
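The KV‑cache figure quoted for Hybrid SWA can be checked with back‑of‑the‑envelope arithmetic. The sketch below assumes the 5:1 ratio means five sliding‑window layers for every global‑attention layer, and that every layer stores the same amount of data per cached token. The layer count of 48 and the function names are hypothetical, chosen only so the numbers divide evenly.

```python
def kv_cache_tokens(context_len, n_layers, window=128, swa_per_global=5):
    """Cached token positions summed over layers, for full vs. hybrid attention.

    Assumes layers repeat in groups of `swa_per_global` sliding-window layers
    plus one global layer, and that n_layers is a multiple of the group size.
    """
    group = swa_per_global + 1
    if n_layers % group != 0:
        raise ValueError("n_layers must be a multiple of swa_per_global + 1")
    n_global = n_layers // group
    n_swa = n_layers - n_global
    # Full attention: every layer caches the entire context.
    full = n_layers * context_len
    # Hybrid: global layers cache everything, SWA layers cache at most `window` tokens.
    hybrid = n_global * context_len + n_swa * min(window, context_len)
    return full, hybrid


def reduction(context_len, n_layers=48):
    """Ratio of full-attention cache size to hybrid cache size (48 layers is hypothetical)."""
    full, hybrid = kv_cache_tokens(context_len, n_layers)
    return full / hybrid


for n in (4_096, 32_768, 262_144):
    print(f"context {n:>7}: KV-cache reduction ~{reduction(n):.2f}x")
```

Under these assumptions the saving is 6N / (N + 5 × 128) for a context of N tokens, so it approaches but never quite reaches 6× as the context grows, and it disappears entirely for inputs shorter than the window.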

Benchmark results vs. DeepSeek‑V3.2 and Kimi‑K2

The radar chart shows MiMo‑V2‑Flash performing strongly on Math, Code, and Long‑Context tasks, trading wins with DeepSeek‑V3.2 and Kimi‑K2 and surpassing both on some metrics. Notably, as a code agent it scores 73.4% on the SWE‑Bench benchmark, outperforming all other open‑source models and approaching the level of GPT‑5 High.

MiMo‑V2‑Flash remains open‑source, with resources available at:

Model: http://hf.co/XiaomiMiMo/MiMo-V2-Flash

Blog post: http://mimo.xiaomi.com/blog/mimo-v2-flash

Technical report: http://github.com/XiaomiMiMo/MiMo-V2-Flash/blob/main/paper.pdf

AI Studio demo: http://aistudio.xiaomimimo.com
