Open-Source MiniMax M2.5 Arrives Ahead of Chinese New Year: Top Coding Scores and Ultra‑Low Cost

Released open‑source on February 13, the MiniMax M2.5 model scores 80.2% on SWE‑Bench Verified, surpassing GPT‑5.2, Claude Opus 4.6, and Google Gemini 3 Pro. It runs 37% faster than its predecessor, costs only $1 per hour, and demonstrates state‑of‑the‑art agent abilities in browsing and tool use, marking a major leap for Chinese large‑language models.


On February 13, two days before the Chinese New Year, the Chinese AI startup MiniMax released M2.5 as an open‑source large‑language model, following the earlier open‑source release of GLM‑5 and the launch of Doubao Seed 2.0 Pro.

1. Coding Ability Tops the Leaderboard

MiniMax M2.5 scores 80.2% on the industry‑standard SWE‑Bench Verified benchmark, outperforming GPT‑5.2, Claude Opus 4.6, and Google Gemini 3 Pro and placing it in the top tier worldwide for coding tasks. The model also performs strongly on Multi‑SWE‑Bench and VIBE‑Pro (UI/UX interaction) tests.

The developers explain that M2.5 adopts an “architect mindset”: before writing code it first decomposes requirements, creates a technical specification, and then implements the solution. This spec‑writing capability markedly raises success rates on complex projects.
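The "architect mindset" can be pictured as a three‑stage pipeline: decompose the requirement, write a spec, then implement against it. The sketch below is purely illustrative — the function names and the naive semicolon split are hypothetical stand‑ins, not MiniMax's actual implementation:

```python
# Hypothetical sketch of a spec-first coding pipeline:
# decompose the request, write a spec, then implement each part.

def decompose(requirement: str) -> list[str]:
    # In a real agent an LLM call would split the requirement;
    # here we split on semicolons purely for illustration.
    return [part.strip() for part in requirement.split(";") if part.strip()]

def write_spec(subtasks: list[str]) -> dict[str, str]:
    # Turn each subtask into a one-line technical specification.
    return {task: f"SPEC: how to implement '{task}'" for task in subtasks}

def implement(spec: dict[str, str]) -> list[str]:
    # Emit a code stub per spec entry (a stand-in for generated code).
    return [f"# code for: {task}" for task in spec]

requirement = "parse config; validate input; write report"
spec = write_spec(decompose(requirement))
print(implement(spec))
```

The point of the structure is that implementation only starts once every subtask has a written spec, which is the behavior the developers credit for higher success rates on complex projects.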

2. Speed and Cost: Powerful Yet Cheap

Compared with the previous M2.1 generation, M2.5’s inference speed improves by 37%, reducing the average SWE‑Bench task time from 31.3 minutes to 22.8 minutes, roughly matching Claude Opus 4.6.
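The reported figures are self‑consistent: dropping the average task time from 31.3 to 22.8 minutes is exactly a ~37% speed increase. A quick sanity check on the article's own numbers:

```python
# Sanity-check the reported speed-up from the average task times.
old_minutes = 31.3  # average SWE-Bench task time on M2.1
new_minutes = 22.8  # average SWE-Bench task time on M2.5

speedup = old_minutes / new_minutes - 1  # fractional speed increase
print(f"{speedup:.0%}")  # prints "37%"
```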

The official slogan is "Intelligence too cheap to meter." The model sustains a throughput of 100 tokens/s at a cost of only $1 per hour of continuous operation, making round‑the‑clock AI engineering nearly free for developers and enterprises.

100 tokens/s ultra‑high throughput.

$1 / hour runtime cost.
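Taking the quoted throughput and hourly price at face value, the per‑token cost works out to under $3 per million tokens:

```python
# Convert the quoted throughput and hourly price into a per-token cost.
tokens_per_second = 100
price_per_hour = 1.00  # USD

tokens_per_hour = tokens_per_second * 3600            # 360,000 tokens/hour
cost_per_million = price_per_hour / tokens_per_hour * 1_000_000
print(f"${cost_per_million:.2f} per million tokens")  # $2.78 per million tokens
```

This assumes sustained full utilization; real‑world cost per token rises if the model sits idle during the billed hour.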

3. Agent Capabilities: More Like a Working Employee

In the BrowseComp (web browsing & search) and BFCL (tool calling) benchmarks, M2.5 reaches state‑of‑the‑art performance. It is no longer just a chatbot; it can act as an employee that performs concrete work.

Search & Research: conducts deep web searches and cross‑validation like a human expert.

Office Automation: handles Word, PowerPoint, Excel, and even complex financial modeling.

Internal evaluations show that on real‑world Office Work tasks, M2.5’s efficiency and accuracy far exceed competing models.
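Benchmarks like BFCL test whether a model can emit a structured tool call that the host application then executes. The dispatch loop below is a minimal, hypothetical illustration of that pattern — the tool names and call schema are made up for the example and are not MiniMax's API:

```python
# Hypothetical tool-calling dispatch: the model emits a structured
# call (name + arguments); the host looks it up and runs the tool.

def web_search(query: str) -> str:
    # Stand-in for a real search backend.
    return f"results for '{query}'"

def spreadsheet_sum(values: list[float]) -> float:
    return sum(values)

TOOLS = {"web_search": web_search, "spreadsheet_sum": spreadsheet_sum}

def dispatch(call: dict) -> object:
    # `call` mimics the JSON object a model would produce for a tool call.
    fn = TOOLS[call["name"]]
    return fn(**call["arguments"])

print(dispatch({"name": "spreadsheet_sum",
                "arguments": {"values": [1.0, 2.5]}}))  # 3.5
```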

4. Slope and Future Outlook

Andrej Karpathy, former OpenAI member and Tesla AI director, has famously contrasted people who judge AI by its current point with people who judge it by its current slope. MiniMax M2.5's rapid improvement exemplifies the "slope" argument: within a few months it has turned from a follower into a leader.

With GLM‑5, Doubao Seed 2.0 Pro, and now MiniMax M2.5 delivering top‑tier coding performance at near‑zero operating cost, this is an especially favorable era for developers and users. Attention now shifts to DeepSeek, whose upcoming release is eagerly anticipated.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.

Tags: AI coding, Agent, low cost, MiniMax, SWE-Bench, M2.5
Written by

AI Insight Log

Focused on sharing: AI programming | Agents | Tools