MiniMax-M1 Revealed: Hybrid Attention, RL Training, and 1M Token Context

MiniMax’s latest M1 model, unveiled after a nearly $300 million funding round, is a 456‑billion‑parameter mixture‑of‑experts model with lightning (hybrid linear) attention. It supports a context window of up to one million tokens and leverages reinforcement‑learning techniques to enhance long‑context handling, inference efficiency, and system‑2 reasoning.


MiniMax Funding and Product Launch

MiniMax raised nearly $300 million in a new financing round, valuing the company at over $4 billion, and announced a series of products including the reasoning model MiniMax‑M1, the video model Hailuo 02, and the MiniMax Agent.

Technical Highlights from the M1 Closed‑Door Meeting

The M1 model is a 456‑billion‑parameter open‑source mixture‑of‑experts LLM that supports up to 1 million tokens of input context and an 80K‑token inference output, matching the input length of closed‑source models such as Google Gemini 2.5 Pro.

Key innovations discussed:

Reinforcement‑learning (RL) training can endow limited‑context models with new capabilities and broaden their knowledge scope.

RL focused solely on mathematics and code tends to increase hallucinations.

Latent reasoning and visual‑latent reasoning are explored as ways for models to “think” with images.

Reward modeling, multi‑agent systems, and AI‑driven automation are identified as major open challenges for RL.

Hybrid linear attention dramatically reduces FLOPs per token, enabling efficient long‑context processing, though it may require more tokens to match full‑attention performance.

Long‑context handling is crucial for agents, allowing them to ingest entire codebases, API docs, and interaction histories in a single pass.

Hybrid architectures are projected to become mainstream, but current bottlenecks lie in infrastructure and GPU utilization.

System‑2 reasoning and self‑reflection emerge when models are given ample computation budget.
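The FLOPs claim above comes down to a reordering of matrix multiplications: softmax attention materializes an n×n score matrix (O(n²·d) per layer), while kernelized linear attention computes a d×d summary first (O(n·d²)). Below is a minimal NumPy sketch of that reordering; the feature map `phi`, the non‑causal formulation, and all variable names here are illustrative assumptions — the article does not describe MiniMax’s actual lightning‑attention kernel.

```python
import numpy as np

def softmax_attention(Q, K, V):
    # Standard attention: materializes an (n, n) score matrix -> O(n^2 * d).
    scores = Q @ K.T / np.sqrt(Q.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V

def linear_attention(Q, K, V, phi=lambda x: np.maximum(x, 0.0) + 1e-6):
    # Kernelized "linear" attention: reorders the matmuls so the cost is
    # O(n * d^2). phi is a positive feature map (a ReLU-based placeholder
    # here); the real lightning-attention design is not specified in this
    # article, and this sketch omits causal masking and blocking.
    Qp, Kp = phi(Q), phi(K)
    KV = Kp.T @ V                    # (d, d) summary, built once over all keys
    Z = Qp @ Kp.sum(axis=0)          # per-query normalizer, shape (n,)
    return (Qp @ KV) / Z[:, None]

n, d = 1024, 64
rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((n, d)) * 0.1 for _ in range(3))
out = linear_attention(Q, K, V)
print(out.shape)  # (1024, 64)
```

For n ≫ d (e.g. a million-token context with d = 128 per head), the n·d² form is dramatically cheaper per token, which is consistent with the trade-off noted above: the model may need more tokens to match full-attention quality, but each token costs far fewer FLOPs.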

Performance and Cost

MiniMax‑M1 outperforms open‑source rivals such as DeepSeek‑R1 and Qwen3‑235B on complex software‑engineering and long‑context benchmarks. The RL training phase ran on 512 H800 GPUs for three weeks at a reported cost of $537,400, an order of magnitude cheaper than the team had initially projected.
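As a sanity check on those figures, the arithmetic works out as follows. The GPU count, duration, and total cost are from the article; the implied hourly rental rate is my own derivation, not an official number.

```python
# Back-of-the-envelope check of the reported RL budget.
gpus = 512             # H800 accelerators (reported)
weeks = 3              # RL phase duration (reported)
total_cost = 537_400   # USD for the RL phase (reported)

gpu_hours = gpus * weeks * 7 * 24
implied_rate = total_cost / gpu_hours  # derived, not an official figure

print(gpu_hours)                # 258048 GPU-hours
print(round(implied_rate, 2))   # ~2.08 USD per GPU-hour
```

An implied rate of roughly $2 per H800 GPU‑hour is in the range of typical cloud rental pricing, which makes the headline cost figure plausible rather than anomalous.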

Implications for the LLM Industry

The advances demonstrate that efficient hybrid attention combined with targeted RL can achieve performance comparable to full‑attention models while keeping inference costs low, a direction likely to shape future large‑model research and deployment.

Tags: large language models, long context, reinforcement learning, model architecture, AI scaling, hybrid attention
Written by

DataFunTalk

Dedicated to sharing and discussing big data and AI technology applications, aiming to empower a million data scientists. Regularly hosts live tech talks and curates articles on big data, recommendation/search algorithms, advertising algorithms, NLP, intelligent risk control, autonomous driving, and machine learning/deep learning.
