Nvidia’s Nemotron 3 Super Enters OpenClaw, Rivalling Opus 4.6
Nvidia unveiled the 120‑billion‑parameter Nemotron 3 Super, whose Mamba‑MoE hybrid architecture, LatentMoE routing, and Multi‑Token Prediction together deliver up to 5× higher throughput and 3× faster inference. The model achieves 85.6% success on OpenClaw, matching Claude Opus 4.6 and GPT‑5.4, sets new records across Pinchbench, MMLU, SWE‑Bench, and other benchmarks, and is fully open‑sourced along with its training data and RL pipelines.
Nvidia announced Nemotron 3 Super, a 120‑billion‑parameter open‑source language model with 120 billion activation parameters and a 1‑million‑token context window. The model claims a three‑fold inference speed boost and a five‑fold throughput increase over its predecessor.
Architecture Innovations
The core of Nemotron 3 Super is a Mamba‑MoE hybrid architecture. Eighty‑eight layers alternate between Mamba‑2 blocks, which provide linear‑time sequence modeling, and a small number of Transformer attention layers that act as global anchors for long‑range routing. A new LatentMoE design projects tokens from the full hidden dimension d into a much smaller latent dimension ℓ before routing and expert computation, reducing expert‑parameter loading and inter‑GPU communication by a factor of d/ℓ. This efficiency enables a proportional increase in the number of experts and activated experts, effectively improving accuracy without raising inference cost.
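The latent‑projection idea can be sketched in a few lines. The snippet below is a toy illustration, not Nvidia's implementation: all dimensions, weight names, and the top‑k gating scheme are assumptions chosen only to show how routing and expert computation move into a small latent space, shrinking expert matmuls by roughly d/ℓ.

```python
import numpy as np

def latent_moe(x, W_down, W_router, experts, W_up, k=2):
    """Toy LatentMoE layer (illustrative, not Nvidia's implementation).

    Tokens are projected from the model dimension d into a latent
    dimension l << d, so routing, expert matmuls, and any expert
    communication all shrink by roughly d/l.
    """
    z = x @ W_down                          # (T, l) latent tokens
    logits = z @ W_router                   # (T, E) router scores in latent space
    topk = np.argsort(logits, axis=-1)[:, -k:]
    out = np.zeros_like(z)
    for t in range(z.shape[0]):
        sel = topk[t]
        g = np.exp(logits[t, sel] - logits[t, sel].max())
        g /= g.sum()                        # softmax gate over the chosen experts
        for gi, e in zip(g, sel):
            out[t] += gi * (z[t] @ experts[e])
    return out @ W_up                       # project back to the model dimension d

rng = np.random.default_rng(0)
d, l, E, T = 64, 8, 16, 4                   # sizes are illustrative only
y = latent_moe(rng.normal(size=(T, d)),
               rng.normal(size=(d, l)),
               rng.normal(size=(l, E)),
               [rng.normal(size=(l, l)) for _ in range(E)],
               rng.normal(size=(l, d)))
print(y.shape)  # (4, 64)
```

Because every per‑expert matrix is ℓ×ℓ rather than d×d, adding more experts (or activating more per token) becomes cheap, which is the trade‑off the article describes.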
Nemotron 3 Super also introduces Multi‑Token Prediction (MTP). Instead of the traditional next‑token objective, MTP forces the model to predict several future tokens in a single forward pass, encouraging the model to learn multi‑step causal relationships. The additional prediction heads act as an internal “draft model”; during inference they generate candidate token sequences that the main model validates in one pass, dramatically lowering generation latency while adding negligible FLOPs.
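The draft‑and‑verify loop behind MTP‑style speculative decoding can be shown with stand‑in functions. Everything here is a toy sketch under assumed rules: `main_next` and `draft_next` are hypothetical placeholders for the full model and the cheap MTP heads, not real APIs. The key property the sketch demonstrates is that accepted output is identical to plain greedy decoding, regardless of draft quality.

```python
def main_next(ctx):
    # Stand-in for the full model's greedy next token (toy rule).
    return (sum(ctx) + 1) % 7

def draft_next(ctx):
    # Stand-in for a cheap MTP draft head (toy rule, sometimes wrong).
    return (2 * sum(ctx)) % 7

def speculative_step(ctx, k=4):
    """One pass: draft k tokens cheaply, verify them against the main
    model, keep the matching prefix plus one correction (>=1 token/pass)."""
    draft, d_ctx = [], list(ctx)
    for _ in range(k):                       # cheap draft phase
        draft.append(draft_next(d_ctx))
        d_ctx.append(draft[-1])
    accepted = []
    for g in draft:                          # single verification pass
        t = main_next(ctx + accepted)
        accepted.append(t)                   # always the main model's token
        if t != g:                           # first mismatch ends the pass
            break
    return accepted

# Speculative output matches plain greedy decoding, just with fewer passes.
ctx, n, spec = [1, 2, 3], 12, []
while len(spec) < n:
    spec += speculative_step(ctx + spec)
spec = spec[:n]
greedy = []
for _ in range(n):
    greedy.append(main_next(ctx + greedy))
print(spec == greedy)  # True
```

When the draft heads guess well, each verification pass accepts several tokens, which is where the claimed latency reduction comes from.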
Training Methodology
Pre‑training was performed on Nvidia’s Blackwell platform using native NVFP4 precision, which cuts memory usage with no reported loss of accuracy. The dataset comprises over 25 trillion tokens, split into two stages: the first stage consumes 80 % of the data (≈20 trillion tokens) to ensure broad knowledge coverage across 16 domains, and the second stage consumes the remaining 20 % (≈5 trillion high‑quality tokens) with increased weighting for Wikipedia, high‑quality PDFs, and STEM reasoning data to boost accuracy.
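The two‑stage split and the stage‑2 reweighting can be sketched as follows. The 80/20 budgets come from the article; the domain names, base mixture weights, and boost factor in the sketch are hypothetical, chosen only to show how upweighting selected domains and renormalizing works.

```python
TOTAL = 25_000_000_000_000          # 25T pre-training tokens, per the article

def stage_budgets(total, stage1_frac=0.80):
    """Split the corpus into the two reported stages (80% / 20%)."""
    s1 = int(total * stage1_frac)
    return s1, total - s1

def reweight(weights, boosted, factor):
    """Illustrative stage-2 mixture: multiply selected domains by `factor`,
    then renormalize so the weights form a sampling distribution."""
    w = {k: v * (factor if k in boosted else 1.0) for k, v in weights.items()}
    z = sum(w.values())
    return {k: v / z for k, v in w.items()}

s1, s2 = stage_budgets(TOTAL)
print(s1, s2)                        # 20T and 5T tokens, matching the article

# Hypothetical base mixture and a 3x boost for the high-quality domains.
base = {"web": 0.70, "wikipedia": 0.05, "pdf": 0.05, "stem": 0.10, "code": 0.10}
stage2 = reweight(base, {"wikipedia", "pdf", "stem"}, 3.0)
```

The renormalization step matters: boosting some domains necessarily shrinks the sampling share of the others while keeping the token budget fixed.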
Supervised fine‑tuning (SFT) used more than 7 million samples (≈800 billion tokens), with agent‑related tasks accounting for 36 % of the data—significantly higher than dialogue (23 %) and reasoning (31 %).
The reinforcement‑learning phase consists of four steps:
1. Multi‑environment RLVR: training on 21 environments and 37 datasets covering mathematics, code, STEM, safety, dialogue, instruction following, long‑context, puzzles, and various agent tasks, sampling 256 prompts per step with 16 responses each.
2. SWE‑RL: a software‑engineering‑focused RL stage that consumes 20 billion tokens, launching containers to run agents on real code repositories, generating patches, and validating them against actual test cases.
3. RLHF: 18 billion tokens used to train a Qwen‑3‑235B‑based reward model for identity‑aware and safety‑aware behavior alignment.
4. MTP recovery: freezing the backbone and fine‑tuning only the MTP heads to re‑align speculative decoding accuracy.
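The final recovery step boils down to selective parameter freezing. The sketch below is a minimal toy, not Nvidia's training pipeline: the parameter names, the `mtp_head` prefix, and the SGD update are all illustrative assumptions showing how an optimizer skips frozen backbone weights while the MTP heads keep learning.

```python
# Toy parameter store: each parameter records whether gradients update it.
params = {
    "backbone.layer_0.w": {"value": 0.5, "trainable": True},
    "backbone.layer_1.w": {"value": -0.2, "trainable": True},
    "mtp_head.0.w": {"value": 0.1, "trainable": True},
    "mtp_head.1.w": {"value": 0.3, "trainable": True},
}

def freeze_backbone(params):
    """Mark every non-MTP parameter frozen; only the MTP heads keep training."""
    for name, p in params.items():
        p["trainable"] = name.startswith("mtp_head")

def sgd_step(params, grads, lr=0.1):
    """Apply gradients only to trainable parameters."""
    for name, p in params.items():
        if p["trainable"]:
            p["value"] -= lr * grads.get(name, 0.0)

freeze_backbone(params)
sgd_step(params, {name: 1.0 for name in params})
print(params["backbone.layer_0.w"]["value"])  # 0.5 — backbone untouched
print(params["mtp_head.0.w"]["value"])        # 0.0 — head updated (0.1 - 0.1)
```

Freezing the backbone keeps the main model's post‑RL behavior intact while re‑aligning the draft heads' predictions to it, which is what speculative decoding accuracy depends on.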
Benchmark Results
On the Pinchbench suite Nemotron 3 Super leads the open‑source leaderboard. In the OpenClaw task it achieves an 85.6 % success rate, directly comparable to Claude Opus 4.6 and GPT‑5.4. The model also tops the Artificial Analysis benchmark, surpassing all same‑scale open models in both efficiency and accuracy.
Compared with GPT‑OSS‑120B and Qwen‑3.5‑122B, Nemotron 3 Super delivers up to 5× higher throughput and up to 2× higher accuracy. For an 8k‑input / 64k‑output sequence, its throughput is 2.2× that of GPT‑OSS‑120B and 7.5× that of Qwen‑3.5‑122B.
Evaluation on standard language‑model suites shows strong results: MMLU 86.01, MMLU‑Pro 75.65, MATH 84.84; RULER@1M long‑context test 91.75 % (vs 22.3 % for GPT‑OSS‑120B); SWE‑Bench (OpenHands) 60.47 % (vs 41.9 % for GPT‑OSS‑120B); AIME‑25 math reasoning 90.21 % (nearly matching Qwen‑3.5‑122B’s 90.36 %).
Open‑Source Release and Ecosystem
All model weights, the full 25‑trillion‑token dataset, the complete training pipeline, and 15 reinforcement‑learning environments are released on HuggingFace (https://huggingface.co/collections/nvidia/nvidia-nemotron-v3). The accompanying technical report (https://research.nvidia.com/labs/nemotron/files/NVIDIA-Nemotron-3-Super-Technical-Report.pdf) details the architecture and training procedures.
In parallel, Nvidia is developing NemoClaw, an open‑source AI‑agent platform built on Nemotron 3 Super. NemoClaw bundles security and privacy tools for enterprise use, runs on any hardware (not limited to Nvidia GPUs), and aims to provide a turnkey “open‑source AI agent stack” for businesses.
Implications
The combination of a 1‑million‑token context, LatentMoE efficiency, and MTP‑driven speculative decoding addresses two major challenges of multi‑agent systems: context explosion and the “thinking tax” of repeatedly invoking LLMs for sub‑tasks. By keeping the entire workflow state in memory, Nemotron 3 Super enables end‑to‑end code generation, vulnerability fixing, and automated debugging without costly context re‑sending, and it can load thousands of pages of financial reports or entire codebases for seamless analysis.
Overall, Nemotron 3 Super represents a significant step toward high‑performance, open‑source, agent‑centric AI, and its open release is positioned to accelerate research and commercial adoption across AI‑driven software development, finance, life‑science literature mining, and other domains.
Machine Learning Algorithms & Natural Language Processing
Focused on frontier AI technologies, empowering AI researchers' progress.