
MiniMax M3 Sets New Benchmarks: 1M Context, 59% SWE‑Bench, 9‑15× Faster Multimodal Model

MiniMax unveiled its open‑source M3 model, delivering 1 million‑token context, 59 % SWE‑Bench Pro accuracy that outperforms GPT‑5.5 and Gemini 3.1 Pro, native multimodal desktop interaction, and a 9‑15× speed boost via MiniMax Sparse Attention, with pricing as low as $20 per month.

Code Mala Tang

Jun 6, 2026


On June 1, 2026, MiniMax released M3, the first open‑source model from China to offer all of the following at once: a 1 M‑token context, top‑tier coding ability (59 % on SWE‑Bench Pro), and native multimodal support for images, video, and desktop interaction.

Benchmark results show M3 surpassing GPT‑5.5 and Gemini 3.1 Pro on several tasks:

SWE‑Bench Pro: 59.0 % (close to Opus 4.7)

Terminal‑Bench 2.1: 66.0 %

SWE‑efficiency: 34.8 %

KernelBench Hard: 28.8 % on NVIDIA Blackwell GPUs

MCP Atlas: 74.2 %

Claw‑Eval: first place across 161 tasks

SVG‑Bench: ahead of Opus 4.7

In multimodal evaluation, M3 scores higher than Gemini 3.1 Pro on OmniDocBench and reaches a 70.06 % success rate on OSWorld‑Verified (361 samples), showing that it can operate desktop workflows.

VentureBeat noted that M3’s performance matches GPT‑5.5 and Gemini 3.1 Pro while costing only 5‑10 % of their price.

MSA: Accelerating 1 M‑token Context

Traditional full attention scales as O(n²) in the sequence length n. MiniMax instead introduces MiniMax Sparse Attention (MSA), which partitions the KV cache into blocks and puts KV blocks in the outer loop and queries in the inner loop, so each KV block is read from memory only once, with contiguous access. According to MiniMax, this operator‑level optimization makes M3's attention kernel more than 4× faster than Flash‑Sparse‑Attention and flash‑moba.
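The loop order described above can be sketched in plain Python. This is a minimal sketch, not MSA itself: the article does not say how MSA decides which KV blocks each query attends to, so `select_blocks` uses a hypothetical rule (every query block reads KV block 0 plus its own block), and all function names here are illustrative. What it does show is the KV-outer, query-inner structure: each KV block is sliced once and then consumed by every query block that selected it, with per-query online-softmax state carried across blocks so the result equals an ordinary softmax over the same keys.

```python
import math
from collections import defaultdict


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def select_blocks(num_blocks):
    """Hypothetical selection rule: every query block attends to KV block 0
    (a "sink" block) and to its own block. Returns {kv_block: [query_blocks]}."""
    kv_to_queries = defaultdict(list)
    for qb in range(num_blocks):
        for kb in sorted({0, qb}):
            kv_to_queries[kb].append(qb)
    return kv_to_queries


def sparse_attention_kv_outer(q, k, v, block_size):
    """Causal block-sparse attention with KV blocks in the outer loop.

    Each KV block is sliced once and reused by every query block that
    selected it. Per-query online-softmax state (running max, running sum,
    weighted accumulator) carries over from one KV block to the next.
    """
    n, d = len(q), len(q[0])
    scale = 1.0 / math.sqrt(d)
    num_blocks = (n + block_size - 1) // block_size
    run_max = [-math.inf] * n
    run_sum = [0.0] * n
    acc = [[0.0] * d for _ in range(n)]

    for kb, query_blocks in sorted(select_blocks(num_blocks).items()):
        lo, hi = kb * block_size, min((kb + 1) * block_size, n)
        k_blk, v_blk = k[lo:hi], v[lo:hi]  # read once per KV block
        for qb in query_blocks:  # inner loop: queries that chose this block
            for i in range(qb * block_size, min((qb + 1) * block_size, n)):
                # Causal mask: keep keys at positions j <= i. Valid keys form a
                # prefix of the block, so zip() with v_blk below lines up.
                scores = [scale * dot(q[i], key)
                          for j, key in enumerate(k_blk, start=lo) if j <= i]
                if not scores:
                    continue
                new_max = max(run_max[i], max(scores))
                rescale = math.exp(run_max[i] - new_max)  # exp(-inf) == 0.0
                weights = [math.exp(s - new_max) for s in scores]
                run_sum[i] = run_sum[i] * rescale + sum(weights)
                acc[i] = [a * rescale + sum(w * val[t] for w, val in zip(weights, v_blk))
                          for t, a in enumerate(acc[i])]
                run_max[i] = new_max
    return [[a / run_sum[i] for a in acc[i]] for i in range(n)]


def dense_reference(q, k, v, block_size):
    """Same sparsity pattern, computed query by query with an ordinary softmax."""
    n, d = len(q), len(q[0])
    scale = 1.0 / math.sqrt(d)
    out = []
    for i in range(n):
        qb = i // block_size
        allowed = [j for j in range(i + 1) if j // block_size in (0, qb)]
        scores = [scale * dot(q[i], k[j]) for j in allowed]
        top = max(scores)
        w = [math.exp(s - top) for s in scores]
        z = sum(w)
        out.append([sum(wj * v[j][t] for wj, j in zip(w, allowed)) / z
                    for t in range(d)])
    return out
```

On random inputs the two functions agree to within floating-point rounding. A production kernel would do the same work on GPU tiles; the point of the outer KV loop is that memory traffic for keys and values is paid once per block rather than once per query.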

1 M token prefill speed: >9× acceleration

Decode speed: >15× acceleration

Multi‑task ablation: quality comparable to full attention

Per‑token compute at 1 M context: 1/20 of M2's
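To see why sparsity matters most at long context, compare how many query-key scores each approach computes. This sketch counts attention scores only, and the block size and per-query block budget are hypothetical values, not MSA's published settings.

```python
def causal_full_pairs(n: int) -> int:
    """Query-key scores computed by causal full attention over n tokens."""
    return n * (n + 1) // 2


def block_sparse_pairs(n: int, block_size: int, blocks_per_query: int) -> int:
    """Upper bound when each query scores at most `blocks_per_query`
    KV blocks of `block_size` keys (hypothetical budget)."""
    return n * min(n, block_size * blocks_per_query)


n = 1_000_000
full = causal_full_pairs(n)
sparse = block_sparse_pairs(n, block_size=128, blocks_per_query=64)
print(f"full: {full:.3e} pairs, sparse: {sparse:.3e} pairs, ratio: {full / sparse:.1f}x")
```

With a fixed per-query budget the sparse count grows linearly in n while the full count grows quadratically, so the ratio itself grows with context length. The article's 1/20 figure compares M3's total per-token compute with M2's and cannot be derived from this arithmetic alone.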

Native Multimodal Training from Day 0

Instead of training on text first and adding images later, M3 mixes text, image, and video data from the very first training step. The team rebuilt the data pipeline to reach 100 trillion tokens, and the model accepts images, video, and desktop screen actions as input.
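The difference between the two training schedules can be made concrete with a toy batch sampler. This is a hypothetical sketch: the modality names, the mixture weights, and the function names are assumptions, since the article only says that text, image, and video are mixed from the first step.

```python
import random

# Hypothetical mixture weights; the article does not publish M3's data mix.
MIXTURE = {"text": 0.6, "image": 0.2, "video": 0.15, "desktop_action": 0.05}


def native_multimodal_batch(batch_size: int, rng: random.Random) -> list[str]:
    """Native multimodal schedule: the same mixture applies from step 0."""
    names = list(MIXTURE)
    return rng.choices(names, weights=[MIXTURE[m] for m in names], k=batch_size)


def staged_batch(step: int, batch_size: int, rng: random.Random,
                 text_only_steps: int) -> list[str]:
    """Conventional "text first, images later" schedule, for contrast."""
    if step < text_only_steps:
        return ["text"] * batch_size
    return native_multimodal_batch(batch_size, rng)
```

With `staged_batch`, image and video examples appear only after `text_only_steps`; with `native_multimodal_batch`, every batch from step 0 can contain all modalities.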

Three Real‑World Continuous‑Operation Tasks

Task 1: Reproducing the ICLR 2025 award‑winning paper "Learning Dynamics of LLM Finetuning". M3 reproduced the experiments autonomously in about 12 hours with no human intervention, producing 18 Git commits and 23 experiment figures.

Task 2: Optimizing a Hopper FP8 GEMM kernel without a reference implementation. Over ~24 hours, M3 performed 147 benchmark submissions, 1 959 tool calls, and six major optimization rounds, raising FP8 utilization from 7.6 % to 71.3 % (9.4× speedup), with the best result appearing at submission 145.

Task 3: Fully automating the training pipelines of four base models, evaluated on AIME2025, BFCL, GPQA, GSM8K, and HumanEval. M3 ran for 12 hours without human input and scored 0.37, behind Opus 4.7 (0.42) and GPT‑5.5 (0.39) but ahead of every other competitor.
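The Task 2 figures can be checked with a little arithmetic. Utilization is achieved throughput divided by the hardware peak, and a GEMM of shape (M×K)·(K×N) performs 2·M·N·K floating-point operations. At a fixed problem size, runtime is inversely proportional to utilization, so going from 7.6 % to 71.3 % implies a speedup of 71.3 / 7.6 ≈ 9.4×, matching the reported figure. The peak value below is an assumption (NVIDIA's dense FP8 Tensor Core rating for an H100 SXM, about 1,979 TFLOPS); substitute the figure for the actual part. The article does not give the problem shapes used.

```python
# Assumption: dense FP8 Tensor Core peak of an H100 SXM (~1,979 TFLOPS, no sparsity).
PEAK_FP8_TFLOPS = 1979.0


def gemm_tflops(m: int, n: int, k: int, seconds: float) -> float:
    """Achieved TFLOPS for one (M x K) @ (K x N) GEMM: 2*M*N*K flops."""
    return 2 * m * n * k / seconds / 1e12


def utilization(m: int, n: int, k: int, seconds: float,
                peak_tflops: float = PEAK_FP8_TFLOPS) -> float:
    """Fraction of the peak actually achieved."""
    return gemm_tflops(m, n, k, seconds) / peak_tflops


def speedup(util_before: float, util_after: float) -> float:
    """Same problem, same peak: runtime scales with 1 / utilization."""
    return util_after / util_before


print(f"{speedup(0.076, 0.713):.1f}x")  # 9.4x
```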

Pricing and API

Plus: $20/month for ~1.7 billion tokens

Max: $50/month for ~5.1 billion tokens

Ultra: $120/month for ~9.8 billion tokens

All modalities share a single token pool, and the “Thinking” mode can be toggled per request at the same price.
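Dividing each plan's price by its token allowance gives the effective rate per million tokens. The `TokenPool` class is a hypothetical illustration of the shared-pool rule described above, not MiniMax's billing API; its name and methods are assumptions.

```python
# Plan figures as listed in the article: (USD per month, approx. tokens per month).
PLANS = {"Plus": (20, 1.7e9), "Max": (50, 5.1e9), "Ultra": (120, 9.8e9)}


def usd_per_million(price_usd: float, tokens: float) -> float:
    return price_usd / (tokens / 1e6)


class TokenPool:
    """Hypothetical model of one monthly allowance shared by all modalities."""

    def __init__(self, tokens: int):
        self.remaining = tokens

    def charge(self, tokens: int, modality: str, thinking: bool = False) -> None:
        # Per the article, neither the modality nor the Thinking toggle changes
        # the rate, so both arguments are accepted but do not affect the deduction.
        if tokens > self.remaining:
            raise ValueError("monthly token pool exhausted")
        self.remaining -= tokens


for name, (price, tokens) in PLANS.items():
    print(f"{name}: ${usd_per_million(price, tokens):.4f} per 1M tokens")
```

On these figures, Max works out cheapest per token (about $0.0098 per million), while Ultra (about $0.0122) costs slightly more per token than Plus (about $0.0118).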

MiniMax Code: An Agent Product Powered by M3

MiniMax also launched MiniMax Code, an agent platform that runs multiple agents concurrently, uses a producer‑verifier loop for self‑correction, and leverages M3’s multimodal capabilities for cross‑application desktop automation. Example: a user asks to open a local ERP system and batch‑enter invoice data from an Excel file; MiniMax Code executes the entire workflow on the desktop.
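The article does not describe how MiniMax Code implements its producer-verifier loop, so the following is only a generic sketch of the pattern: a producer drafts a result, a verifier either accepts it or returns feedback, and the feedback drives the next attempt. The function names, the toy invoice rows, and the retry limit are all hypothetical.

```python
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def produce_verify(produce: Callable[[Optional[str]], T],
                   verify: Callable[[T], Optional[str]],
                   max_rounds: int = 3) -> tuple[T, int]:
    """Generic producer-verifier loop: the producer drafts a result, the
    verifier returns None on success or a feedback string, and the feedback
    is passed to the next production round."""
    feedback: Optional[str] = None
    for round_no in range(1, max_rounds + 1):
        candidate = produce(feedback)
        feedback = verify(candidate)
        if feedback is None:
            return candidate, round_no
    raise RuntimeError(f"not verified after {max_rounds} rounds: {feedback}")


# Toy stand-in for the invoice example: rows from the spreadsheet that
# must all be entered. The data and both callbacks are hypothetical.
ROWS = [("INV-001", 120.0), ("INV-002", 75.5), ("INV-003", 310.25)]


def enter_rows(feedback: Optional[str]) -> list[tuple[str, float]]:
    # Deliberately drops the last row on the first attempt.
    return ROWS[:-1] if feedback is None else list(ROWS)


def check_rows(entered: list[tuple[str, float]]) -> Optional[str]:
    missing = [row[0] for row in ROWS if row not in entered]
    return f"missing rows: {missing}" if missing else None


result, rounds = produce_verify(enter_rows, check_rows)
print(rounds)  # 2
```

Keeping the verifier independent of the producer is what makes the loop self-correcting: the producer never judges its own output, it only reacts to the verifier's feedback.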

Why It Matters

First open‑source model from China to combine top‑tier coding, 1 M context, and native multimodality.

MSA architecture offers a cost‑effective path for long‑context models, likely to be adopted by the community.

Pricing establishes a new low‑cost baseline for SOTA models, making AI‑driven productivity tools affordable for individuals and small teams.

Key information:

Release date: June 1, 2026

API: platform.minimax.io

Weights and technical report: open‑sourced within 10 days

Core architecture: MSA

Context length: 1 M tokens

Multimodal inputs: text, image, video, desktop

Benchmark highlights: SWE‑Bench Pro 59 %, OSWorld 70.06 %

Official blog: minimaxi.com/blog/minimax-m3


Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.
