Inside Xiaomi’s MiMo‑V2‑Flash: How a Hybrid SWA Design Powers Fast, Efficient AI Reasoning
Xiaomi’s newly open‑sourced MiMo‑V2‑Flash model combines a hybrid sliding‑window attention (SWA) architecture with a 309B‑parameter MoE design. It delivers top‑tier reasoning, coding and agent performance, and introduces MOPD, a post‑training paradigm that sharply reduces RL compute costs.
Xiaomi recently released and open‑sourced MiMo‑V2‑Flash, the first model presented by Luo Fuli after taking charge of Xiaomi’s large‑model efforts.
The model adopts a hybrid SWA (sliding‑window attention) architecture, which interleaves sliding‑window layers with full‑attention layers and which the team describes as simple and elegant.
Reasoning, Programming and Agents
MiMo‑V2‑Flash is a mixture‑of‑experts (MoE) model with 309 billion total parameters, of which only 15 billion are active per token. It interleaves sliding‑window and full‑attention layers at a 5:1 ratio (five sliding‑window layers for every full‑attention layer), with a 128‑token window.
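The interleaving described above can be sketched in a few lines of Python. This is a minimal illustration, not the model's actual code: the article gives only the 128‑token window and the 5:1 ratio, so the position of the full‑attention layer within each group of six, the 12‑layer example, and the convention that the window includes the query token are all assumptions, and `layer_schedule` and `can_attend` are hypothetical helper names.

```python
from typing import List

WINDOW = 128        # sliding-window size stated in the article
SWA_PER_FULL = 5    # 5:1 ratio of sliding-window to full-attention layers


def layer_schedule(num_layers: int) -> List[str]:
    """Assumed layout: every sixth layer is full attention, the rest are SWA."""
    group = SWA_PER_FULL + 1
    return ["full" if (i + 1) % group == 0 else "swa" for i in range(num_layers)]


def can_attend(query_pos: int, key_pos: int, kind: str) -> bool:
    """Causal mask; SWA layers also limit lookback to the last WINDOW tokens.

    Assumption: the window counts the query token itself.
    """
    if key_pos > query_pos:
        return False            # never attend to future tokens
    if kind == "full":
        return True             # full attention sees the whole prefix
    return query_pos - key_pos < WINDOW


# Hypothetical 12-layer stack, chosen only to show two full groups of six.
schedule = layer_schedule(12)
print(schedule)
```

In this sketch a sliding‑window layer at position 200 can see positions 73 through 200, while a full‑attention layer at the same position can see everything from position 0 onward.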
Key capabilities highlighted:
Ranks among the top two open‑source models on the AIME 2025 math competition and the GPQA‑Diamond scientific benchmark, demonstrating strong reasoning ability.
Achieves first place among open‑source models on the SWE‑bench Verified software‑engineering benchmark and multilingual leaderboards, comparable to leading closed‑source models.
Designed specifically for reasoning, coding and agent scenarios, offering a “think” mode and an “instant answer” mode that users can switch between with a single click.
Can generate ready‑to‑use HTML pages and works seamlessly with coding scaffolds such as Claude Code, Cursor and Cline; provides a 256K‑token context window that supports hundreds of interaction rounds and tool calls.
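A rough piece of arithmetic shows why the hybrid design matters at a 256K context: sliding‑window layers only need to keep the most recent 128 positions in the KV cache, while full‑attention layers keep the whole prefix. The sketch below counts cached positions per sequence under stated assumptions. The 48‑layer depth is hypothetical (the article does not give a layer count), 256K is taken as 262,144 tokens, and per‑position cache size is assumed equal across layer types.

```python
def cached_positions(context_len: int, num_layers: int,
                     window: int = 128, swa_per_full: int = 5):
    """Return (hybrid, all_full) counts of cached token positions summed over layers."""
    full_layers = num_layers // (swa_per_full + 1)   # one full layer per group of six
    swa_layers = num_layers - full_layers
    hybrid = full_layers * context_len + swa_layers * min(window, context_len)
    all_full = num_layers * context_len              # baseline: every layer is full attention
    return hybrid, all_full


CONTEXT = 256 * 1024     # assumption: "256K" read as 262,144 tokens
NUM_LAYERS = 48          # hypothetical depth, for illustration only

hybrid, all_full = cached_positions(CONTEXT, NUM_LAYERS)
ratio = hybrid / all_full
print(f"hybrid={hybrid} all_full={all_full} ratio={ratio:.3f}")
```

Under these assumptions the hybrid stack caches roughly one sixth as many positions as an all‑full‑attention stack of the same depth, because the full‑attention layers dominate the total once the context is far longer than the window.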
MOPD: A New Post‑Training Paradigm
To efficiently scale reinforcement‑learning (RL) computation and boost reasoning and agent abilities, the authors propose Multi‑Teacher On‑Policy Distillation (MOPD). The method first obtains domain‑specific expert teachers via SFT/RL, then has the student model sample rollouts from its own policy and optimizes the student using dense token‑level rewards from multiple teachers.
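The idea of dense token‑level rewards can be illustrated with a toy sketch. The article does not give the reward formula, so the version below assumes a common on‑policy distillation choice: the per‑token reward is the teacher's log‑probability of the sampled token minus the student's. The teacher interface, the toy teachers and the helper names (`make_toy_teacher`, `token_level_rewards`) are all hypothetical.

```python
import math
from typing import Callable, Dict, List

# Hypothetical teacher interface: maps sampled tokens to per-token log-probabilities.
Teacher = Callable[[List[str]], List[float]]


def make_toy_teacher(preferred: str) -> Teacher:
    """Toy stand-in for a domain expert: gives high probability to one token."""
    def score(tokens: List[str]) -> List[float]:
        return [math.log(0.9) if tok == preferred else math.log(0.1) for tok in tokens]
    return score


def token_level_rewards(tokens: List[str], student_logprobs: List[float],
                        teacher: Teacher) -> List[float]:
    """Assumed reward: log p_teacher(token) - log p_student(token), one value per token."""
    teacher_logprobs = teacher(tokens)
    return [t - s for t, s in zip(teacher_logprobs, student_logprobs)]


# One teacher per domain; the rollout's domain selects which teacher scores it.
teachers: Dict[str, Teacher] = {
    "math": make_toy_teacher("42"),
    "code": make_toy_teacher("return"),
}

# A rollout sampled from the student's own policy (on-policy), with its log-probs.
rollout_tokens = ["the", "answer", "42"]
student_logprobs = [math.log(0.5)] * 3

rewards = token_level_rewards(rollout_tokens, student_logprobs, teachers["math"])
print(rewards)
```

Every token in the rollout receives its own signal, positive where the teacher is more confident than the student and negative where it is less, in contrast to a single sparse reward at the end of the sequence.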
MOPD is highly stable and efficient, requiring less than 1/50 of the compute of a traditional SFT + RL pipeline to reach the peak performance of the teacher models.
The framework’s decoupled design makes it easy to integrate new teachers and outcome reward models (ORMs), forming a self‑improving loop in which distilled students can become stronger teachers.
https://huggingface.co/XiaomiMiMo/MiMo-V2-Flash
https://mimo.xiaomi.com/zh/blog/mimo-v2-flash
