MiniMax M3 Sets New Benchmarks: 1M Context, 59% SWE‑Bench, 9‑15× Faster Multimodal Model
MiniMax unveiled its open‑source M3 model, delivering 1 million‑token context, 59 % SWE‑Bench Pro accuracy that outperforms GPT‑5.5 and Gemini 3.1 Pro, native multimodal desktop interaction, and a 9‑15× speed boost via MiniMax Sparse Attention, with pricing as low as $20 per month.
On June 1, 2026, MiniMax released M3, the first open‑source model from China that simultaneously offers 1 M token context, top‑tier coding ability (59 % SWE‑Bench Pro), and native multimodal support for images, video, and desktop interaction.
Benchmark results show M3 surpasses GPT‑5.5 and Gemini 3.1 Pro on several tasks: SWE‑Bench Pro 59.0 % (close to Opus 4.7), Terminal‑Bench 2.1 at 66.0 %, SWE‑efficiency 34.8 %, KernelBench Hard 28.8 % on NVIDIA Blackwell GPUs, MCP Atlas 74.2 %, Claw‑Eval first place on 161 tasks, and SVG‑Bench beating Opus 4.7. In multimodal evaluation, OmniDocBench exceeds Gemini 3.1 Pro and OSWorld‑Verified reaches 70.06 % success on 361 samples, demonstrating the ability to operate desktop workflows.
VentureBeat noted that M3’s performance matches GPT‑5.5 and Gemini 3.1 Pro while costing only 5‑10 % of their price.
MSA: Accelerating 1 M‑token Context
Traditional full attention scales as O(n²) in sequence length. MiniMax introduces MiniMax Sparse Attention (MSA), which partitions the KV cache into blocks and keeps KV blocks in the outer loop and queries in the inner loop, so each KV block is read once with contiguous memory access. This operator‑level optimization makes M3’s attention operator over 4× faster than Flash‑Sparse‑Attention and flash‑moba.
1 M token prefill speed: >9× acceleration
Decode speed: >15× acceleration
Multi‑task ablation: comparable to full attention
Per‑token compute at 1 M tokens: 1/20 of M2’s
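The loop order described above can be illustrated with a minimal pure‑Python sketch. This is not MiniMax's kernel: the function name `sparse_attention_kv_outer`, the `selected` argument (which KV blocks each query attends to), and the online‑softmax merge of partial results are all assumptions made for illustration, since the article does not say how blocks are selected or how partial results are combined. It also assumes every query selects at least one block.

```python
import math

def sparse_attention_kv_outer(q, k, v, selected, block_size):
    """Block-sparse attention with KV blocks in the outer loop.

    q: list of query vectors; k, v: lists of key/value vectors.
    selected[i]: set of KV block indices that query i attends to
    (hypothetical input; the selection rule is not given in the article).
    """
    n_q, d_v = len(q), len(v[0])
    scale = 1.0 / math.sqrt(len(q[0]))
    n_blocks = len(k) // block_size

    # Per-query running state for a numerically stable softmax.
    run_max = [-math.inf] * n_q
    run_sum = [0.0] * n_q
    acc = [[0.0] * d_v for _ in range(n_q)]

    for b in range(n_blocks):  # outer loop: each KV block is sliced once
        kb = k[b * block_size:(b + 1) * block_size]  # contiguous slice
        vb = v[b * block_size:(b + 1) * block_size]
        for i in range(n_q):  # inner loop: queries that use this block
            if b not in selected[i]:
                continue
            scores = [scale * sum(a * c for a, c in zip(q[i], key)) for key in kb]
            new_max = max(run_max[i], max(scores))
            rescale = math.exp(run_max[i] - new_max)  # 0.0 for the first block
            weights = [math.exp(s - new_max) for s in scores]
            run_sum[i] = run_sum[i] * rescale + sum(weights)
            for j in range(d_v):
                partial = sum(w * val[j] for w, val in zip(weights, vb))
                acc[i][j] = acc[i][j] * rescale + partial
            run_max[i] = new_max

    return [[a / run_sum[i] for a in acc[i]] for i in range(n_q)]

# Tiny usage example: 2 queries, 4 keys/values, 2 blocks of size 2.
q = [[1.0, 0.0], [0.0, 1.0]]
k = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]]
v = [[1.0], [2.0], [3.0], [4.0]]
selected = [{0}, {0, 1}]
out = sparse_attention_kv_outer(q, k, v, selected, block_size=2)
```

Because each query touches only its selected blocks, work grows with the number of selected blocks per query rather than with the full sequence length, which is the general idea behind sparse attention. The real speedups reported above depend on GPU memory behavior that this sketch does not model.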
Native Multimodal Training from Day 0
Instead of training on text first and adding images later, M3 mixes text, image, and video data from the first training step. The team rebuilt the data pipeline to reach 100 trillion tokens, and the model accepts images, video, and desktop screen actions as input.
Three Real‑World Continuous‑Operation Tasks
Task 1: Reproducing the ICLR 2025 award‑winning paper “Learning Dynamics of LLM Finetuning”. M3 completed the experiment autonomously in ~12 hours, producing 18 Git commits and 23 experiment figures with no human intervention.
Task 2: Optimizing a Hopper FP8 GEMM kernel without a reference implementation. Over ~24 hours, M3 made 147 benchmark submissions and 1,959 tool calls across six major optimization rounds, raising FP8 utilization from 7.6 % to 71.3 % (a 9.4× improvement), with the best result appearing at submission 145.
Task 3: Fully automating the training pipeline of four base models across AIME2025, BFCL, GPQA, GSM8K, and HumanEval. M3 ran for 12 hours without human input and scored 0.37, trailing Opus 4.7 (0.42) and GPT‑5.5 (0.39) but beating all other competitors.
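As a quick sanity check on the Task 2 figures, the short calculation below recomputes the utilization gain and the average number of tool calls per benchmark submission from the numbers quoted above. The function name `improvement_ratio` is just an illustrative helper.

```python
def improvement_ratio(before_pct: float, after_pct: float) -> float:
    """Ratio of final to initial utilization (both in percent)."""
    return after_pct / before_pct

# Figures quoted in Task 2.
ratio = improvement_ratio(7.6, 71.3)
calls_per_submission = 1959 / 147

print(f"utilization gain: {ratio:.1f}x")                           # 9.4x
print(f"tool calls per submission: {calls_per_submission:.1f}")    # 13.3
```

The ratio matches the 9.4× figure in the text, and it works out to roughly 13 tool calls for every benchmark submission.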
Pricing and API
Plus: $20/month for ~1.7 billion tokens
Max: $50/month for ~5.1 billion tokens
Ultra: $120/month for ~9.8 billion tokens
All modalities share a single token pool, and the “Thinking” mode can be toggled per request at the same price.
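To compare the tiers, it can help to convert each plan into an effective price per million tokens. The sketch below uses only the approximate monthly prices and token allowances listed above; the `PLANS` table and the function name are illustrative, and real costs depend on how much of the allowance is actually used.

```python
# Approximate plan figures from the list above: (USD per month, tokens per month).
PLANS = {
    "Plus": (20, 1.7e9),
    "Max": (50, 5.1e9),
    "Ultra": (120, 9.8e9),
}

def usd_per_million_tokens(price_usd: float, tokens: float) -> float:
    """Effective price per one million tokens if the allowance is fully used."""
    return price_usd / (tokens / 1e6)

effective = {name: usd_per_million_tokens(p, t) for name, (p, t) in PLANS.items()}

for name, cost in effective.items():
    print(f"{name}: ${cost:.4f} per million tokens")
```

On these figures every tier comes out at roughly one cent per million tokens, with Max the cheapest per token and Ultra slightly more expensive per token than Plus.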
MiniMax Code: An Agent Product Powered by M3
MiniMax also launched MiniMax Code, an agent platform that runs multiple agents concurrently, uses a producer‑verifier loop for self‑correction, and leverages M3’s multimodal capabilities for cross‑application desktop automation. Example: a user asks to open a local ERP system and batch‑enter invoice data from an Excel file; MiniMax Code executes the entire workflow on the desktop.
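The producer‑verifier idea can be sketched in a few lines. This is a generic illustration of the pattern, not MiniMax Code's implementation: `producer_verifier_loop`, `toy_producer`, and `toy_verifier` are hypothetical names, and the toy task (checking that all invoice rows were captured) merely stands in for a real agent step.

```python
from typing import Callable, Optional, Tuple, TypeVar

T = TypeVar("T")

def producer_verifier_loop(
    produce: Callable[[Optional[str]], T],
    verify: Callable[[T], Tuple[bool, Optional[str]]],
    max_rounds: int = 3,
) -> Tuple[T, int]:
    """Call produce, check with verify, and feed failures back until accepted."""
    feedback: Optional[str] = None
    for round_no in range(1, max_rounds + 1):
        candidate = produce(feedback)
        ok, feedback = verify(candidate)
        if ok:
            return candidate, round_no
    raise RuntimeError(f"no accepted result after {max_rounds} rounds: {feedback}")

# Toy stand-ins: the producer drops a row on its first attempt and
# corrects itself once it receives feedback from the verifier.
INVOICE_ROWS = [120.0, 80.5, 42.0]

def toy_producer(feedback: Optional[str]) -> list:
    return INVOICE_ROWS[:-1] if feedback is None else list(INVOICE_ROWS)

def toy_verifier(candidate: list) -> Tuple[bool, Optional[str]]:
    if len(candidate) == len(INVOICE_ROWS):
        return True, None
    return False, f"expected {len(INVOICE_ROWS)} rows, got {len(candidate)}"

result, rounds = producer_verifier_loop(toy_producer, toy_verifier)
print(result, rounds)  # [120.0, 80.5, 42.0] 2
```

Separating the two roles means a failed check produces concrete feedback for the next attempt, and the `max_rounds` cap keeps a stubborn failure from looping forever.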
Why It Matters
First open‑source model from China to combine top‑tier coding, 1 M context, and native multimodality.
MSA architecture offers a cost‑effective path for long‑context models, likely to be adopted by the community.
Pricing establishes a new low‑cost baseline for SOTA models, making AI‑driven productivity tools affordable for individuals and small teams.
Key information: Release date June 1, 2026; API at platform.minimax.io; weights and technical report to be open‑sourced within 10 days; core architecture MSA; 1 M token context; multimodal inputs (text, image, video, desktop); benchmark highlights SWE‑Bench Pro 59 % and OSWorld 70.06 %; official blog minimaxi.com/blog/minimax-m3.
Code Mala Tang
Read source code together, write articles together, and enjoy spicy hot pot together.
