MiniMax M3: How a 1M‑Token, Multimodal Agent Reproduces ICLR Research and Automates Kaggle Competitions
The MiniMax M3 model combines a 1-million-token context window, native multimodal training, and a new MiniMax Sparse Attention architecture that cuts per-token compute to one-twentieth of its predecessor's and delivers up to 15× faster decoding. Trained interactively against a user simulator, it can drive fully autonomous agents that reproduce ICLR 2025 research and tackle Auto-Kaggle competitions at a fraction of the cost of Western models.
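
To put the headline numbers in proportion, here is a rough back-of-the-envelope sketch. It is not a description of how MiniMax Sparse Attention works internally. It assumes, purely for illustration, that the one-twentieth compute figure corresponds to each new token attending to one-twentieth of the keys in the context, and that end-to-end decoding speed follows a simple Amdahl-style model. The names `attended_keys` and `overall_speedup` are hypothetical helpers.

```python
# Illustrative arithmetic only. The mapping from "1/20 compute" to
# "1/20 of the keys attended" is an assumption, not a documented detail.

CONTEXT_LEN = 1_000_000   # 1M-token context window (from the article)
REDUCTION_FACTOR = 20     # per-token compute cut to one-twentieth (from the article)


def attended_keys(context_len: int, reduction_factor: int) -> int:
    """Keys a new token would score if only 1/reduction_factor of the context is attended."""
    return max(1, context_len // reduction_factor)


def overall_speedup(affected_fraction: float, local_speedup: float) -> float:
    """Amdahl-style estimate: only `affected_fraction` of decode time gets `local_speedup`."""
    return 1.0 / ((1.0 - affected_fraction) + affected_fraction / local_speedup)


dense_keys = attended_keys(CONTEXT_LEN, 1)                   # full attention: 1,000,000 keys
sparse_keys = attended_keys(CONTEXT_LEN, REDUCTION_FACTOR)   # assumed sparse budget: 50,000 keys

print(f"dense keys per token:  {dense_keys:,}")
print(f"sparse keys per token: {sparse_keys:,}")

# If only part of decoding time benefits from the 20x reduction,
# the end-to-end speedup falls below 20x.
for fraction in (0.90, 0.95, 0.98, 1.00):
    print(f"affected fraction {fraction:.2f} -> {overall_speedup(fraction, REDUCTION_FACTOR):.1f}x")
```

Under these assumptions, an end-to-end speedup below 20× is unsurprising: any part of decoding that the sparse attention does not accelerate caps the overall gain. This is one possible reading of how a 20× compute reduction and an "up to 15×" decoding figure could coexist, not a confirmed explanation.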
