Testing MiniMax M3: Reproducing a Deep‑Learning Paper and Building an End‑to‑End Medical Image Segmentation App
The author evaluates MiniMax M3 by reproducing the 2021 TransUNet medical image segmentation paper, troubleshooting data and training issues, achieving near‑paper Dice scores, and then engineering a full‑stack React‑FastAPI application to demonstrate the model’s practical capabilities and cost.
MiniMax announced the M3 model, claiming it is the first open‑source model to combine three frontier capabilities: a coding agent, a 1‑million‑token context window, and native multimodal support. According to the announcement, the new MSA (MiniMax Sparse Attention) architecture reduces per‑token computation to one‑twentieth of the previous generation's, speeding up pre‑fill ninefold and decoding fifteenfold.
To test these claims, the author chose to reproduce the award‑winning 2021 TransUNet paper, a hybrid CNN‑Transformer architecture for abdominal CT organ segmentation. The target dataset is Synapse (8‑organ CT), with the paper reporting an average Dice of 77.48% for TransUNet.
In the reproduction workflow, Claude Code ran locally for interaction while training ran on a remote server with two RTX 3090 GPUs. Getting the original repository to run required upgrading from PyTorch 1.4 to 2.4, adding a missing __init__.py, and fixing a path‑concatenation bug. After these fixes, training proceeded in a tmux session.
Initial results (V1) were far below expectations: Dice of 11.44% for ViT‑None, 15.71% for R50‑ViT‑CUP, and 13.70% for TransUNet, versus the paper’s 61.5%, 71.29%, and 77.48%, respectively. Although the loss appeared to converge to 0.02, MiniMax flagged this as suspicious. The model then ran a diagnostic on single slices, which yielded plausible Dice scores for liver (85.5%) and right kidney (73.8%). This indicated that training was sound and that the failure stemmed from the test data pipeline.
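A single-slice diagnostic of this kind can be approximated with a small per-class Dice function. The sketch below is illustrative only and is not the article's or the TransUNet repository's evaluation code; the function name `dice_per_class`, the label convention (0 as background), and the choice to skip classes absent from both masks are assumptions.

```python
import numpy as np

def dice_per_class(pred, target, num_classes):
    """Return {class_id: Dice} for each foreground class found in pred or target.

    Hypothetical helper: assumes integer label maps where 0 is background.
    """
    pred = np.asarray(pred)
    target = np.asarray(target)
    # A shape mismatch often points at a swapped-axis bug in the data pipeline.
    if pred.shape != target.shape:
        raise ValueError(f"shape mismatch: {pred.shape} vs {target.shape}")

    scores = {}
    for c in range(1, num_classes):  # skip background (assumed label 0)
        p = pred == c
        t = target == c
        denom = p.sum() + t.sum()
        if denom == 0:
            continue  # class absent from both masks; leave it unscored
        intersection = np.logical_and(p, t).sum()
        scores[c] = float(2.0 * intersection / denom)
    return scores

# Toy 2x3 "slice" with two foreground classes.
pred = [[1, 1, 0], [0, 2, 2]]
target = [[1, 0, 0], [0, 2, 2]]
scores = dice_per_class(pred, target, num_classes=4)
```

If a check like this gives sensible scores on individual slices while the full test run does not, the model weights are probably fine and the test-time loading, normalization, or axis order deserves scrutiny.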
After identifying data type errors, missing normalization, and dimension swaps, MiniMax entered a V2/V3 refinement stage: it switched to the official preprocessing, added class‑weighting to the loss, reduced the base learning rate from 0.01 to 0.001, and added a 500‑step warm‑up. These changes restored performance, raising TransUNet Dice to 74.26% (within 3.22 points of the paper), R50‑ViT‑CUP to 66.46%, and even improving ViT‑None to 63.36% (above its reported 61.5%).
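The learning-rate change can be sketched as a simple schedule function. The base rate of 0.001 and the 500-step warm-up come from the text; the linear warm-up shape, the polynomial decay afterwards, and the values `max_steps=10000` and `power=0.9` are assumptions made for illustration, not details confirmed by the article.

```python
def learning_rate(step, base_lr=0.001, warmup_steps=500,
                  max_steps=10000, power=0.9):
    """Linear warm-up to base_lr, then polynomial decay to zero.

    Hypothetical schedule: only base_lr and warmup_steps are taken from
    the article; the decay shape and max_steps are assumed.
    """
    if step < warmup_steps:
        # Ramp linearly so the last warm-up step reaches base_lr.
        return base_lr * (step + 1) / warmup_steps
    # Fraction of the post-warm-up phase completed, clamped to [0, 1].
    progress = (step - warmup_steps) / max(1, max_steps - warmup_steps)
    return base_lr * (1.0 - min(progress, 1.0)) ** power

# Sample the schedule at a few steps.
samples = {s: learning_rate(s) for s in (0, 249, 499, 500, 5250, 10000)}
```

In a PyTorch training loop, a function like this would typically be applied by assigning its return value to each optimizer parameter group's `lr` before every step. Warm-up keeps early updates small while the randomly initialized decoder is still unstable, which is one common reason for adding it alongside a lower base rate.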
The entire reproduction took just over a day, dominated by remote training time. The author then leveraged MiniMax to build a complete full‑stack medical‑image segmentation demo: a React + TypeScript front‑end built with Vite and Cornerstone3D for loading CT volumes, windowing, slice navigation, and overlay visualization; a FastAPI back‑end exposing upload and inference endpoints; and PyTorch code loading the best TransUNet weights for segmentation.
The deployed web app lets users upload a CT series and obtain instant segmentation results with a single click. Two minor issues arose at first: the image orientation was reversed and the color overlay failed. After two iterative refinements the app functioned as intended. Cost analysis showed the reproduction consumed ¥258 in API tokens and the deployment ¥127, for ¥385 in total.
Overall, MiniMax M3 demonstrates solid long‑context encoding and agent collaboration, and its ability to diagnose, locate, and resolve a broken experiment provides convincing evidence of practical utility, even if it still trails top‑tier closed‑source models on the most complex tasks.
