Xiaomi’s MiMo‑V2.5: Halving Cost, Doubling Efficiency with a New Multimodal LLM
Xiaomi unveiled the MiMo‑V2.5 and MiMo‑V2.5‑Pro large language models, highlighting up to 50% lower API cost, native multimodal perception, token‑efficiency gains, benchmark results on par with Claude Opus 4.6 and GPT‑5.4, and real‑world demos in which the model built a full compiler in 4.3 hours and a video‑editing web app in 11.5 hours.
Xiaomi announced the MiMo‑V2.5 series of large language models, comprising the standard MiMo‑V2.5 and the higher‑capacity MiMo‑V2.5‑Pro, and confirmed that both models will be released as open‑source globally.
Long‑Task Capability
MiMo‑V2.5‑Pro is described as Xiaomi’s most powerful model to date, matching top‑tier models such as Claude Opus 4.6 and GPT‑5.4 in general agent ability, complex software engineering, and long‑running tasks.
In a benchmark from Peking University’s "Compiler Principles" course, students normally need several weeks to implement a full SysY compiler in Rust. MiMo‑V2.5‑Pro completed the same project in 4.3 hours, invoking tools 672 times and achieving a perfect 233‑point score on the hidden test set.
The model first constructed a complete pipeline skeleton, then filled in the core modules layer by layer. It earned full marks on Koopa IR code generation (110 points), RISC‑V backend generation (103 points), and performance optimisation (20 points). The initial compilation success rate reached 59%, and when a regression occurred at iteration 512, the model automatically diagnosed the fault and repaired the code.
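The skeleton‑first approach described above — wire every stage of the compiler end to end before implementing any stage fully — can be sketched as follows. All type and function names here are illustrative assumptions, not MiMo‑V2.5‑Pro’s actual output; only the stage names (Koopa IR, RISC‑V backend) come from the course project.

```rust
// Hypothetical sketch of a staged compiler pipeline skeleton.
// Each stage transforms one representation into the next; the stubs
// can then be replaced by full implementations module by module.
struct Token(String);
struct Ast(Vec<Token>);
struct KoopaIr(String);  // mid-level IR (Koopa IR in the course project)
struct RiscvAsm(String); // target assembly (RISC-V backend)

fn lex(source: &str) -> Vec<Token> {
    // Stub lexer: whitespace split stands in for real tokenisation.
    source.split_whitespace().map(|w| Token(w.to_string())).collect()
}

fn parse(tokens: Vec<Token>) -> Ast {
    Ast(tokens) // stub: a real parser would build a syntax tree here
}

fn lower_to_ir(ast: &Ast) -> KoopaIr {
    KoopaIr(format!("; {} nodes lowered", ast.0.len()))
}

fn codegen(ir: &KoopaIr) -> RiscvAsm {
    RiscvAsm(format!("# riscv from: {}", ir.0))
}

fn compile(source: &str) -> RiscvAsm {
    // The skeleton already compiles and runs end to end, so each
    // later refinement can be checked against the hidden test set.
    codegen(&lower_to_ir(&parse(lex(source))))
}

fn main() {
    let asm = compile("int main ( ) { return 0 ; }");
    println!("{}", asm.0);
}
```

Building the end‑to‑end skeleton first is also what makes the reported regression recovery plausible: with every stage runnable from the start, a failing iteration can be diagnosed by re‑running the whole pipeline rather than a single module in isolation.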
Given the simple prompt “build a video‑editing web application”, MiMo‑V2.5‑Pro worked autonomously for 11.5 hours, performed 1,868 tool calls, and produced a fully functional web app of 8,192 lines of code, including multi‑track timelines, clip trimming, cross‑fades, audio mixing, and export functionality.
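The feature list above implies a data model of tracks holding time‑positioned, trimmable clips. A minimal sketch of such a model, shown in Rust for consistency with the compiler demo — every name and field here is a hypothetical illustration, since the generated app’s code is not published:

```rust
// Hypothetical data model for a multi-track editing timeline.
struct Clip {
    start: f64,     // position on the timeline, in seconds
    in_point: f64,  // trim-in point inside the source media
    out_point: f64, // trim-out point inside the source media
}

impl Clip {
    fn duration(&self) -> f64 {
        self.out_point - self.in_point
    }
    // Clip trimming: move the in-point without moving the clip.
    fn trim_in(&mut self, secs: f64) {
        self.in_point = (self.in_point + secs).min(self.out_point);
    }
}

struct Track {
    clips: Vec<Clip>,
}

struct Timeline {
    tracks: Vec<Track>, // multi-track: video, overlays, audio, ...
}

impl Timeline {
    // Total length is the latest clip end time across all tracks.
    fn duration(&self) -> f64 {
        self.tracks
            .iter()
            .flat_map(|t| t.clips.iter())
            .map(|c| c.start + c.duration())
            .fold(0.0, f64::max)
    }
}

fn main() {
    let mut clip = Clip { start: 2.0, in_point: 0.0, out_point: 5.0 };
    clip.trim_in(1.0); // trim one second off the head of the clip
    let timeline = Timeline {
        tracks: vec![Track { clips: vec![clip] }],
    };
    println!("timeline duration: {}s", timeline.duration());
}
```

Separating a clip’s timeline position (`start`) from its source trim points (`in_point`/`out_point`) is the standard non‑destructive editing design: trimming and cross‑fading never touch the source media.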
Native Multimodal Ability and Efficiency Leap
The base MiMo‑V2.5 model is a native multimodal LLM that can see, hear, and read, converting perception directly into actions. In the ClawEval benchmark, its agent capability surpasses that of the previous MiMo‑V2‑Pro while reducing API‑call cost by roughly 50%.
Across multimodal benchmarks such as VideoMME, CharXiv, and MMMU‑Pro, the series demonstrates performance that approaches or exceeds leading closed‑source models.
Both versions feature deep token‑efficiency optimisations. At equal ClawEval scores, MiMo‑V2.5‑Pro uses 42% fewer tokens than Kimi K2.6, and the base MiMo‑V2.5 saves about 50% compared with Muse Spark. Both models support a context length of roughly one million tokens; the Pro version targets long‑cycle, high‑difficulty agent tasks, while the base version covers the majority of general scenarios.
All MiMo‑V2.5 models will be open‑sourced worldwide.