Tagged articles
2 articles
PaperAgent
Dec 19, 2025 · Artificial Intelligence

Inside Xiaomi’s MiMo‑V2‑Flash: How a Hybrid SWA Design Powers Fast, Efficient AI Reasoning

Xiaomi’s newly open‑sourced MiMo‑V2‑Flash model pairs a hybrid attention architecture, which interleaves sliding‑window attention (SWA) layers with global attention layers, with a 309B‑parameter Mixture‑of‑Experts (MoE) design. It delivers top‑tier reasoning, coding and agent performance, and it introduces MOPD (Multi‑Teacher On‑Policy Distillation), a post‑training paradigm that significantly reduces RL compute costs.
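A hybrid SWA design rests on two kinds of attention layer: sliding‑window layers, where each token sees only a fixed number of recent positions, and global layers, where each token sees its whole causal prefix. The Python sketch below builds both masks and one possible way to interleave the layers. The function names, the window size and the `swa_per_global` ratio are illustrative assumptions, not MiMo‑V2‑Flash’s actual configuration or code.

```python
from typing import List, Optional


def attention_mask(seq_len: int, window: Optional[int]) -> List[List[bool]]:
    """Return mask[q][k] == True when query position q may attend to key position k.

    window=None -> global causal attention (every earlier position is visible).
    window=w    -> causal sliding-window attention (only the last w positions,
                   including q itself, are visible).
    """
    mask = []
    for q in range(seq_len):
        row = []
        for k in range(seq_len):
            causal = k <= q
            in_window = window is None or q - k < window
            row.append(causal and in_window)
        mask.append(row)
    return mask


def hybrid_layer_pattern(num_layers: int, swa_per_global: int) -> List[str]:
    """Hypothetical interleaving: `swa_per_global` SWA layers followed by one global layer."""
    period = swa_per_global + 1
    return ["global" if (i + 1) % period == 0 else "swa" for i in range(num_layers)]


if __name__ == "__main__":
    # Assumed toy settings: 6 tokens, a window of 3, five SWA layers per global layer.
    print(attention_mask(6, window=3)[5])     # last token sees only positions 3..5
    print(attention_mask(6, window=None)[5])  # last token sees positions 0..5
    print(hybrid_layer_pattern(12, swa_per_global=5))
```

Because a sliding‑window layer never looks further back than `window` positions, it only needs to keep that many keys and values in its cache, which is where the memory savings of a hybrid stack come from.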

Hybrid SWA · MOPD · MiMo-V2-Flash
5 min read
AI Insight Log
Dec 18, 2025 · Artificial Intelligence

Xiaomi’s New MiMo‑V2‑Flash LLM Rivals DeepSeek‑V3.2 and Approaches GPT‑5 High

Xiaomi’s MiMo‑V2‑Flash is a 309B‑parameter MoE LLM with only 15B parameters active per token. It combines Hybrid SWA, Multi‑Token Prediction (MTP) and Multi‑Teacher On‑Policy Distillation (MOPD) to shrink the KV cache roughly sixfold and speed up inference by 2.6×, reaching performance comparable to DeepSeek‑V3.2 and Kimi‑K2 and close to GPT‑5 High, including a 73.4% score on SWE‑Bench as a code agent.
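The roughly sixfold KV‑cache reduction can be sanity‑checked with simple arithmetic. The sketch below counts cached key/value positions per layer, assuming every layer stores the same number of bytes per position (so bytes cancel out of the ratio), a hypothetical 48‑layer stack, five SWA layers per global layer and a 128‑token window. These numbers are assumptions chosen for illustration, not figures confirmed by either article.

```python
def kv_cache_positions(num_layers: int, swa_per_global: int, window: int, context_len: int) -> int:
    """Total cached positions across layers for a hybrid SWA/global stack (hypothetical layout)."""
    period = swa_per_global + 1
    total = 0
    for i in range(num_layers):
        is_global = (i + 1) % period == 0
        # Global layers cache the whole context; SWA layers cache at most `window` positions.
        total += context_len if is_global else min(window, context_len)
    return total


def kv_reduction(num_layers: int, swa_per_global: int, window: int, context_len: int) -> float:
    """Ratio of an all-global KV cache to the hybrid KV cache at a given context length."""
    full = num_layers * context_len
    return full / kv_cache_positions(num_layers, swa_per_global, window, context_len)


if __name__ == "__main__":
    # Assumed configuration: 48 layers, 5 SWA layers per global layer, 128-token window.
    for ctx in (128, 4096, 32768, 262144):
        print(ctx, round(kv_reduction(48, 5, 128, ctx), 2))
```

Under these assumptions the ratio is exactly 1 when the context is no longer than the window, and it approaches 6 (total layers divided by global layers) as the context grows, because the global layers come to dominate the cache.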

Hybrid SWA · MOPD · MTP
7 min read