MiniMax M2.5: 230B‑Parameter Model Activates 10B, Near Claude Sonnet for One‑Tenth the Cost

MiniMax’s new open‑source M2.5 model is built on a 230‑billion‑parameter mixture‑of‑experts architecture that activates only 10 billion parameters per inference. The company says it delivers performance comparable to Claude Opus 4.6 across benchmarks at roughly one‑tenth the cost, and it already handles a large share of MiniMax’s internal tasks.


MiniMax has released the next‑generation open‑source model M2.5, which the company describes as an "open‑frontier model designed for real‑world productivity".

Performance Data: Approaching Claude Opus 4.6

SWE‑Bench Verified: 80.2%, on par with Claude Opus 4.6

BrowseComp: 76.3%, industry‑leading search and tool‑use capability

Multi‑SWE‑Bench: 51.3%, the highest score on multilingual programming

BFCL tool calling: 76.8%, high‑precision agent workflows

Speed: 37% faster, dramatically reducing end‑to‑end task completion time

Sparse Computing Breakthrough

The efficiency of M2.5 comes from its mixture‑of‑experts (MoE) architecture. Although the model contains 230 billion parameters, only 10 billion are activated for each inference, preserving the depth of a large model while gaining the agility of a smaller one.
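The routing idea behind this sparsity can be sketched in a few lines. The toy layer below is not MiniMax's implementation; the dimensions, gating scheme, and top‑k value are illustrative assumptions, chosen only to show why compute scales with the number of *activated* experts rather than the total parameter count.

```python
import numpy as np

def moe_forward(x, gate_w, experts, k=2):
    """Toy mixture-of-experts layer: route a token to its top-k experts.

    Only the selected experts run, so compute scales with k rather than
    with the total expert count -- the same principle that lets a 230B
    model activate only ~10B parameters per token.
    """
    logits = x @ gate_w                       # gating scores, one per expert
    top = np.argsort(logits)[-k:]             # indices of the k best experts
    weights = np.exp(logits[top])
    weights /= weights.sum()                  # softmax over the chosen k only
    # Weighted sum of the k active experts' outputs; the rest stay idle.
    return sum(w * experts[i](x) for w, i in zip(weights, top))

rng = np.random.default_rng(0)
d, n_experts = 8, 16
x = rng.normal(size=d)
gate_w = rng.normal(size=(d, n_experts))
# Each "expert" is just a fixed linear map here.
experts = [lambda v, W=rng.normal(size=(d, d)): W @ v for _ in range(n_experts)]
y = moe_forward(x, gate_w, experts, k=2)      # only 2 of 16 experts computed
```

With k=2 of 16 experts active, this layer does one‑eighth of the dense compute while still drawing on all sixteen experts' capacity across different tokens.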

During training, MiniMax built a proprietary reinforcement‑learning framework called Forge. Engineer Olive Song disclosed on the ThursdAI podcast that Forge lets the AI practice programming and tool usage in thousands of simulated work environments, with a training cycle of two months.

To maintain training stability, the team applied a method called CISPO (Clipped Importance Sampling Policy Optimization). This prevents over‑correction during reinforcement learning and yields what the authors call an "architect mindset" – proactive planning of project structure, functions, and interfaces before writing code.
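A minimal sketch of the clipping idea, as I understand it from MiniMax's published description: rather than clipping the policy update itself (as PPO does), the importance‑sampling ratio is clipped and used as a fixed coefficient on a REINFORCE‑style gradient. The epsilon values below are illustrative assumptions, not MiniMax's actual hyperparameters.

```python
import numpy as np

def cispo_weights(logp_new, logp_old, eps_low=0.2, eps_high=0.2):
    """Clipped importance-sampling weights in the spirit of CISPO.

    The ratio pi_new/pi_old is clipped into [1-eps_low, 1+eps_high], so no
    token's gradient is zeroed out entirely -- large off-policy updates are
    merely damped rather than dropped.
    """
    ratio = np.exp(logp_new - logp_old)       # importance-sampling ratio
    return np.clip(ratio, 1 - eps_low, 1 + eps_high)

def cispo_loss(logp_new, logp_old, advantages):
    # In a real autograd framework the clipped weight would be wrapped in
    # detach()/stop_gradient so only logp_new receives gradient; numpy has
    # no autograd, so that is implicit here.
    w = cispo_weights(logp_new, logp_old)
    return -(w * advantages * logp_new).mean()

# Ratios of 3.0, 0.2, and 1.0 get clipped to 1.2, 0.8, and 1.0.
demo_ratio = cispo_weights(np.log(np.array([3.0, 0.2, 1.0])), np.zeros(3))
```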

Pricing Revolution: From Luxury to Commodity

MiniMax offers two versions:

M2.5‑Lightning: 100 tokens/second, $0.30 per million input tokens, $2.40 per million output tokens

Standard M2.5: 50 tokens/second, $0.15 per million input tokens, $1.20 per million output tokens

According to the company’s calculations, a single task costs about $0.15, whereas Claude Opus 4.6 costs $3.00. An enterprise could run four AI "employees" continuously for a year for roughly $10,000.
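The per‑task figure is easy to sanity‑check against the published rates. The token counts below are hypothetical round numbers (the article does not state them); only the prices come from the Standard M2.5 tier above.

```python
# Back-of-envelope check of the per-task cost claim. Prices are the
# published Standard M2.5 rates; token counts per task are assumed.
input_price, output_price = 0.15, 1.20       # USD per million tokens
tokens_in, tokens_out = 200_000, 100_000     # hypothetical tokens per agent task
cost_per_task = (tokens_in / 1e6) * input_price + (tokens_out / 1e6) * output_price
print(f"${cost_per_task:.2f} per task")      # -> $0.15, matching the article
```

At that rate, a task on Claude Opus 4.6 priced at $3.00 is twenty times more expensive, which is how continuous agent workloads become affordable at the scale the article describes.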

Real‑World Applications: From Chatbots to AI Employees

M2.5 is already deployed at scale inside MiniMax. About 30% of the company’s tasks are completed by M2.5, and 80% of newly submitted code is generated by the model. It is optimized for enterprise office scenarios, capable of creating Word, Excel, and PowerPoint files, and scores 74.4% on financial‑modeling benchmarks.

In live tests, M2.5 successfully performed complex operations such as reviewing pull requests via the GitHub API, assigning code‑review tasks based on git blame, and fixing front‑end display issues. Minor issues were observed, including occasional pushes to the wrong branch and occasional omission of solution tags under specific commands.

Conclusion

Coming shortly after Zhipu’s release of GLM‑5, MiniMax’s M2.5 demonstrates strong capability. Even with limited GPU resources, Chinese companies are narrowing the gap with top U.S. labs while holding a substantial cost advantage. Recent agent products and built‑in programming tools increasingly adopt Chinese models as a default choice.

Another trend is the shift from question‑answering applications toward result‑oriented, long‑running autonomous agents, driven by improving model performance and falling cost. As those two metrics continue to improve, more applications will move in this direction.

Digital employees and unmanned companies are on the horizon…

Tags: AI agents, Mixture of Experts, SWE‑Bench, Claude Opus, cost‑efficient LLM, MiniMax M2.5
Written by AI Engineering

Focused on cutting‑edge product and technology information and practical experience sharing in the AI field (large models, MLOps/LLMOps, AI application development, AI infrastructure).