MiniMax M2.5: 230B‑Parameter Model Activates 10B, Near Claude Opus for One‑Tenth the Cost
MiniMax’s new open‑source M2.5 model is built on a 230 billion‑parameter mixture‑of‑experts architecture that activates only 10 billion parameters per inference. It delivers performance comparable to Claude Opus 4.6 across benchmarks at roughly one‑tenth the cost, and already handles a large share of the company’s internal tasks.
MiniMax has released the next‑generation open‑source model M2.5, which the company describes as an "open‑frontier model designed for real‑world productivity".
Performance Data: Approaching Claude Opus 4.6
SWE‑Bench Verified: 80.2%, on par with Claude Opus 4.6
BrowseComp: 76.3%, industry‑leading search and tool‑use capability
Multi‑SWE‑Bench: 51.3%, the highest score in multilingual programming
BFCL Tool Calling: 76.8%, enabling high‑precision agent workflows
End‑to‑end speed: 37% faster, dramatically reducing task completion time
Sparse Computing Breakthrough
The efficiency of M2.5 comes from its mixture‑of‑experts (MoE) architecture. Although the model contains 230 billion parameters, only 10 billion are activated for each inference, preserving the depth of a large model while gaining the agility of a smaller one.
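The sparse activation described above can be sketched with a toy top‑k router: every token sees the full set of experts, but only a few run. The expert count, k, and dimensions below are illustrative assumptions, not MiniMax’s published configuration.

```python
import numpy as np

rng = np.random.default_rng(0)

n_experts = 64   # total experts per layer (assumed, for illustration)
top_k = 4        # experts actually activated per token (assumed)
d_model = 8      # toy hidden size

def moe_forward(x, router_w, expert_ws):
    """Route a token to its top-k experts; only those experts compute."""
    logits = x @ router_w                 # router score per expert
    top = np.argsort(logits)[-top_k:]     # indices of the k selected experts
    gates = np.exp(logits[top])
    gates /= gates.sum()                  # normalized gate weights
    # Weighted sum over the k active experts only -- the other
    # n_experts - k expert matrices are never touched.
    return sum(g * (x @ expert_ws[i]) for g, i in zip(gates, top))

x = rng.standard_normal(d_model)
router_w = rng.standard_normal((d_model, n_experts))
expert_ws = rng.standard_normal((n_experts, d_model, d_model))
y = moe_forward(x, router_w, expert_ws)
print(y.shape)  # full-width output, but only 4 of 64 experts ran
```

The same ratio is what makes M2.5 cheap to serve: parameter count sets memory and capacity, while the activated subset sets per‑token compute.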
During training, MiniMax built a proprietary reinforcement‑learning framework called Forge. Engineer Olive Song disclosed on the ThursdAI podcast that Forge lets the AI practice programming and tool usage in thousands of simulated work environments, with a training cycle of two months.
To maintain training stability, the team applied a mathematical method called CISPO (Clipped Importance Sampling Policy Optimization). This prevents over‑correction during reinforcement learning and yields what the authors call "architect mindset" – proactive planning of project structure, functions, and interfaces before writing code.
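A minimal sketch of the clipping idea, assuming the standard importance‑sampling setup in policy‑gradient RL: large probability ratios between the new and old policy are capped rather than allowed to dominate the update. The function name, cap value, and probabilities here are illustrative, not MiniMax’s actual hyperparameters.

```python
import numpy as np

def cispo_weights(logp_new, logp_old, eps_high=2.0):
    """Importance ratios r = pi_new / pi_old, clipped from above.

    Capping r limits how hard any single token can push the update,
    which is the over-correction the text refers to. eps_high is an
    assumed value for illustration.
    """
    r = np.exp(logp_new - logp_old)
    return np.minimum(r, eps_high)

# Three tokens whose probability rose under the new policy:
logp_old = np.log(np.array([0.50, 0.10, 0.02]))
logp_new = np.log(np.array([0.55, 0.30, 0.20]))
w = cispo_weights(logp_new, logp_old)
print(np.round(w, 2))  # raw ratios 1.1, 3.0, 10.0 -> capped at 2.0
```

Without the cap, the third token’s 10x ratio would dwarf the other two in the gradient; with it, every token still contributes, just with bounded weight.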
Pricing Revolution: From Luxury to Commodity
MiniMax offers two versions:
M2.5‑Lightning: 100 tokens/second; $0.30 per million input tokens, $2.40 per million output tokens
Standard M2.5: 50 tokens/second; $0.15 per million input tokens, $1.20 per million output tokens
According to the company’s calculations, a typical task costs about $0.15 on M2.5, versus roughly $3.00 on Claude Opus 4.6. At that rate, an enterprise could run four AI "employees" continuously for a year for roughly $10,000.
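The arithmetic can be checked with a back‑of‑envelope script. The per‑task token counts and daily task volume below are assumptions chosen to reproduce the article’s figures, not numbers published by MiniMax.

```python
# Standard M2.5 pricing from the article, $ per million tokens:
m25_in, m25_out = 0.15, 1.20

task_in_tokens = 200_000    # assumed input tokens per agent task
task_out_tokens = 100_000   # assumed output tokens per agent task

cost_per_task = (task_in_tokens * m25_in + task_out_tokens * m25_out) / 1e6
print(f"${cost_per_task:.2f} per task")  # matches the article's ~$0.15

# Four always-on "AI employees" for a year at an assumed workload:
tasks_per_day = 45
annual = 4 * tasks_per_day * 365 * cost_per_task
print(f"${annual:,.0f} per year")        # lands near the quoted ~$10,000
```

The point of the exercise is the sensitivity: at a 20x per‑task price gap, the same workload on Claude Opus 4.6 would run to roughly $200,000 a year.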
Real‑World Applications: From Chatbots to AI Employees
M2.5 is already deployed at scale inside MiniMax. About 30% of the company’s tasks are completed by M2.5, and 80% of newly submitted code is generated by the model. It is optimized for enterprise office scenarios, capable of creating Word, Excel, and PowerPoint files, and scores 74.4% on financial‑modeling benchmarks.
In live tests, M2.5 successfully performed complex operations such as reviewing pull requests via the GitHub API, assigning code‑review tasks based on git blame, and fixing front‑end display issues. Minor issues were observed, including occasional pushes to the wrong branch and occasional omission of solution tags under specific commands.
Conclusion
Following Zhipu’s release of GLM‑5, MiniMax’s M2.5 shows that, even with limited GPU resources, Chinese companies are narrowing the gap with top U.S. labs while holding a substantial cost advantage. Recent agent projects and programming tools increasingly ship with Chinese models as a built‑in default.
Another trend is the shift of question‑answering applications toward result‑oriented autonomous long‑task agents, driven by improved model performance and reduced cost. As these two metrics continue to improve, more applications will move in this direction.
Digital employees and unmanned companies are on the horizon…
AI Engineering
Focused on cutting‑edge product and technology information and practical experience sharing in the AI field (large models, MLOps/LLMOps, AI application development, AI infrastructure).
