NVIDIA Nemotron 3 Super: 7× Faster Than Qwen3.5 – Inside Hybrid Mamba‑Attention, LatentMoE, and MTP
NVIDIA’s Nemotron 3 Super, a 120.6 B‑parameter flagship model supporting 1 M‑token context, combines Hybrid Mamba‑Attention, LatentMoE, and Multi‑Token Prediction to achieve up to 7.5× higher inference throughput than Qwen3.5 while matching or surpassing its accuracy across a range of benchmarks.
