Why I Dropped Opus 4.6 for MiniMax M2.5: Real‑World Cost and Performance Test
The author, a heavy user of AI agents for daily code refactoring, compares the expensive Opus 4.6 with the budget‑friendly MiniMax M2.5, showing how a mixed‑model strategy cuts costs dramatically while maintaining speed and quality across two full‑stack development case studies.
During the post‑New Year wave of large‑model releases, including MiniMax M2.5, GPT‑5.3‑Codex, GLM‑5, Opus 4.6 and Qwen 3.5‑Plus, the author, a heavy user of AI agents for daily code refactoring, found Opus 4.6’s capability strong but its price “ridiculously high”.
To cut costs, the author switched 80‑90% of tasks to MiniMax M2.5, reserving Opus 4.6 for a few extremely difficult cases. MiniMax’s official benchmark reports 80.2% on SWE‑Bench Verified and a roughly 37% reduction in task time, cutting a 30‑minute refactor to about 19 minutes.
Initial skepticism about domestic models was tested with two representative projects.
Case 1 – Adding a “mistake‑review” module to an AI interview platform
The requirement involved database schema changes, RESTful API design, front‑end state management and routing. MiniMax first entered analysis mode and produced a detailed plan covering requirement breakdown, data‑model extension, API design, file mapping and verification steps.
During execution the model chose “Option 2: auto‑accept edits”, later suggesting “Option 3: manually approve edits” for complex changes. It correctly generated backend code using Spring Boot + JPA, applied @Transactional, followed anemic domain‑model principles, and produced front‑end code for React + TypeScript that reused existing components. When a MapStruct type‑mismatch error occurred, the model diagnosed and fixed it automatically.
After completing all tasks the model listed the modifications, confirmed that the backend and frontend started successfully, and provided instructions for manual verification.
Case 2 – Building a web‑based task board
The author asked for a full‑stack task board with draggable columns. MiniMax asked for the preferred stack; the author chose Vue 3 + Vite for the front‑end and Spring Boot + SQLite for the back‑end. The model produced a comprehensive implementation plan, then again used “Option 2: auto‑accept edits”. The entire front‑end and back‑end were generated in about 15 minutes, and the resulting SQLite database contained the expected task data.
Key advantages of MiniMax M2.5
Two pricing tiers: a “fast” version (100 TPS) at $0.3 per million input tokens and $2.4 per million output tokens, and an “economy” version (50 TPS) at half the output price.
At 100 TPS the model costs roughly $1 per hour; at 50 TPS only about $0.3 per hour, i.e. 1/10 to 1/20 the price of Opus, Gemini 3 Pro or GPT‑5.
Running four agents continuously for a year would cost about $10,000.
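A quick back‑of‑envelope check of these figures (a sketch using the article’s prices; it counts output tokens only, so input‑token cost accounts for the small remainder of the hourly rate):

```python
# Back-of-envelope check of the article's pricing claims.
# Prices are the article's figures; output tokens only.
def hourly_output_cost(tps: float, usd_per_m_output: float) -> float:
    tokens_per_hour = tps * 3600
    return tokens_per_hour / 1_000_000 * usd_per_m_output

fast = hourly_output_cost(100, 2.4)    # "fast" tier
economy = hourly_output_cost(50, 1.2)  # "economy" tier: half the output price

print(f"fast tier:    ~${fast:.2f}/hour")     # ~$0.86, i.e. roughly $1/hour
print(f"economy tier: ~${economy:.2f}/hour")  # ~$0.22; input tokens bring it to ~$0.3/hour

# Four agents running around the clock for a year at ~$0.3/hour:
yearly = 4 * 24 * 365 * 0.3
print(f"four agents, one year: ~${yearly:,.0f}")  # ~$10,512, matching the ~$10k claim
```

The numbers line up: 100 tokens/s is 0.36 M output tokens per hour, which at $2.4 per million is about $0.86, consistent with the “roughly $1 per hour” claim.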
Native Spec behavior
Before writing code the model acts like an architect, dissecting requirements, designing UI and data structures, and outputting a full technical plan (Spec). This “Spec coding” stage is contrasted with “Vibe coding”, which is suited for quick prototyping.
The Spec workflow consists of four pipeline steps: Specify (product definition), Plan (technical design), Tasks (task breakdown with acceptance criteria) and Implement (AI executes based on the generated documents).
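As an illustration, the four pipeline steps above can be modeled as a simple data structure (the stage names come from the article; the class and field names are my own, not any MiniMax API):

```python
from dataclasses import dataclass

# Hypothetical sketch of the article's four-stage Spec pipeline.
# Stage names are from the article; the structure is illustrative.
@dataclass
class Stage:
    name: str
    artifact: str  # what each stage hands to the next

SPEC_PIPELINE = [
    Stage("Specify", "product definition"),
    Stage("Plan", "technical design"),
    Stage("Tasks", "task breakdown with acceptance criteria"),
    Stage("Implement", "code generated from the documents above"),
]

for i, stage in enumerate(SPEC_PIPELINE, 1):
    print(f"{i}. {stage.name}: {stage.artifact}")
```

Each stage's artifact becomes the input to the next, which is what distinguishes Spec coding from the single-shot prompting of Vibe coding.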
Native Agent architecture
An Agent RL framework decouples the inference server from the training engine, ensuring token consistency.
Uses CISPO algorithm and process‑reward design for stable MoE training.
Engineering optimizations such as Windowed FIFO scheduling and tree‑merged training samples achieve ~40× training acceleration.
Private deployment
With only 10 B active parameters, M2.5 is the smallest flagship model in its tier, offering low GPU memory requirements and high inference efficiency, and the vendor has confirmed plans to open‑source the model.
Installation and configuration
Install Claude Code (requires Node.js 18+):

npm install -g @anthropic-ai/claude-code

Then configure the API key (replace MINIMAX_API_KEY with your actual key) and model in ~/.claude/settings.json (or the Windows equivalent); note that env values should be strings:

{
  "env": {
    "ANTHROPIC_BASE_URL": "https://api.minimaxi.com/anthropic",
    "ANTHROPIC_AUTH_TOKEN": "MINIMAX_API_KEY",
    "API_TIMEOUT_MS": "3000000",
    "CLAUDE_CODE_DISABLE_NONESSENTIAL_TRAFFIC": "1",
    "ANTHROPIC_MODEL": "MiniMax-M2.5",
    "ANTHROPIC_SMALL_FAST_MODEL": "MiniMax-M2.5",
    "ANTHROPIC_DEFAULT_SONNET_MODEL": "MiniMax-M2.5",
    "ANTHROPIC_DEFAULT_OPUS_MODEL": "MiniMax-M2.5",
    "ANTHROPIC_DEFAULT_HAIKU_MODEL": "MiniMax-M2.5"
  }
}

Finally, add { "hasCompletedOnboarding": true } to ~/.claude.json (or the Windows equivalent) to complete onboarding.
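The two configuration edits above can be scripted. The sketch below (my own helper, not part of Claude Code; paths follow the article and differ on Windows) merges the env block into settings.json and marks onboarding complete:

```python
import json
from pathlib import Path

# The env block from the article; replace MINIMAX_API_KEY with your real key.
MINIMAX_ENV = {
    "ANTHROPIC_BASE_URL": "https://api.minimaxi.com/anthropic",
    "ANTHROPIC_AUTH_TOKEN": "MINIMAX_API_KEY",
    "API_TIMEOUT_MS": "3000000",
    "CLAUDE_CODE_DISABLE_NONESSENTIAL_TRAFFIC": "1",
    "ANTHROPIC_MODEL": "MiniMax-M2.5",
    "ANTHROPIC_SMALL_FAST_MODEL": "MiniMax-M2.5",
    "ANTHROPIC_DEFAULT_SONNET_MODEL": "MiniMax-M2.5",
    "ANTHROPIC_DEFAULT_OPUS_MODEL": "MiniMax-M2.5",
    "ANTHROPIC_DEFAULT_HAIKU_MODEL": "MiniMax-M2.5",
}

def configure(settings_path: Path, state_path: Path) -> None:
    """Merge the env block into settings.json and mark onboarding done."""
    settings_path.parent.mkdir(parents=True, exist_ok=True)
    settings = json.loads(settings_path.read_text()) if settings_path.exists() else {}
    settings.setdefault("env", {}).update(MINIMAX_ENV)
    settings_path.write_text(json.dumps(settings, indent=2))

    # ~/.claude.json is a separate state file from ~/.claude/settings.json.
    state = json.loads(state_path.read_text()) if state_path.exists() else {}
    state["hasCompletedOnboarding"] = True
    state_path.write_text(json.dumps(state, indent=2))
```

On Linux or macOS you would call it as configure(Path.home() / ".claude" / "settings.json", Path.home() / ".claude.json"); existing keys in either file are preserved because the script merges rather than overwrites.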
Run claude in any directory, trust the folder, and the model will use MiniMax‑M2.5 as the underlying LLM.
Conclusion
The author finds MiniMax M2.5 behaves like an experienced architect, providing “plan‑first” capabilities that make running complex agents economically feasible. Five practical recommendations are offered: avoid over‑reliance on flagship models, let MiniMax handle most tasks, use its Spec planning for complex work, consider private deployment for 10 B models, and try the Coding Plan program (link omitted).
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.
JavaGuide
Backend tech guide and AI engineering practice covering fundamentals, databases, distributed systems, high concurrency, system design, plus AI agents and large-model engineering.