How OpenAI’s MRC Protocol Redesigns Communication for 100,000‑GPU Clusters
OpenAI, together with AMD, Broadcom, Intel, Microsoft and Nvidia, has introduced the Multipath Reliable Connection (MRC) protocol. By splitting a single 800 Gb/s link into eight 100 Gb/s planes, MRC enables full‑mesh connectivity for more than 100,000 GPUs with fewer switches, lower cost and higher resilience, while dynamic load‑balancing sharply reduces the impact of congestion and hardware failures during large‑scale AI training.
OpenAI, in collaboration with AMD, Broadcom, Intel, Microsoft and Nvidia, has released the Multipath Reliable Connection (MRC) supercomputer networking protocol. Built on the RoCE (RDMA over Converged Ethernet) standard and incorporating technologies from the Ultra Ethernet Consortium, MRC targets the extreme bandwidth and reliability demands of modern AI model training.
Breaking Up the Single Massive Lane
Training frontier AI models generates millions of data transfers per compute step. Even a single delayed packet can stall the entire training job, and as clusters grow, network congestion and hardware failures become more frequent. Traditional designs treat each 800 Gb/s network interface as a single ultra‑wide lane, so one faulty cable can take down the whole training run, forcing engineers to restart from a checkpoint or to tolerate multi‑second network pauses.
To support "Stargate"‑class supercomputers, OpenAI’s engineering team spent two years redesigning the network architecture. MRC replaces the single lane with eight independent 100 Gb/s planes per interface, so a switch that previously offered 64 × 800 Gb/s ports can instead offer 512 × 100 Gb/s ports. That higher radix cuts the required switch hierarchy from three or four layers to just two when interconnecting roughly 130,000 GPUs.
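The arithmetic behind the two‑layer claim is easy to check. Here is a back‑of‑the‑envelope sketch, assuming an idealized non‑blocking leaf/spine (Clos) fabric rather than MRC’s actual topology; only the port counts come from the article:

```python
# Back-of-the-envelope scale of a two-tier leaf/spine fabric. This is
# an idealized non-blocking Clos approximation, not MRC's real design.

def two_tier_endpoints(ports_per_switch: int) -> int:
    # Each leaf splits its ports: half down to GPUs, half up to spines.
    # Every spine port serves one leaf, so a p-port spine supports p
    # leaves, giving p leaves * (p / 2) GPU-facing ports each.
    return ports_per_switch * (ports_per_switch // 2)

print(two_tier_endpoints(512))  # 131072 -- matches the ~130,000-GPU figure
print(two_tier_endpoints(64))   # 2048   -- why a 64-port radix needs more tiers
```

With 512‑port switches, a two‑tier fabric tops out at 512 × 256 = 131,072 endpoints, which lines up with the roughly 130,000‑GPU figure; a 64‑port radix would need extra tiers long before that scale.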
Spraying Data Across Hundreds of Paths
MRC’s multipath network provides a rich set of paths, but traditional AI training protocols force each packet along a fixed route to preserve order. In a large‑scale multipath topology, this rigid rule leads to hotspots and under‑utilized planes, so simply adding physical links does not improve performance.
Instead, MRC fragments each transfer and sprays its packets across hundreds of paths and all eight planes, the way a watering can scatters droplets. Packets arrive out of order, but each carries its final memory address, so the receiver writes every packet directly to its destination as it lands, reconstructing the data without a reordering step.
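A minimal sketch of this address‑carrying delivery model, assuming a simplified (offset, payload) packet layout for illustration; MRC’s real wire format is defined in the published specification:

```python
import random

# Sender: tag every chunk with its destination offset, then "spray" --
# modeled here by shuffling, since wire order no longer matters.
MESSAGE = b"gradient shard for one all-reduce step, split across planes"
CHUNK = 8

packets = [(off, MESSAGE[off:off + CHUNK])
           for off in range(0, len(MESSAGE), CHUNK)]
random.shuffle(packets)

# Receiver: write each chunk straight to its offset, no reorder buffer.
buf = bytearray(len(MESSAGE))
for offset, payload in packets:
    buf[offset:offset + len(payload)] = payload

assert bytes(buf) == MESSAGE  # data reconstructed despite arrival order
```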
This approach eliminates local congestion hotspots, equalizes completion times across flows, and removes the weakest‑link effect in which the slowest flow paces every synchronized training step. If a path slows down, MRC quickly shifts traffic to a faster alternative, keeping the load balanced.
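The article does not specify MRC’s load‑balancing algorithm; the sketch below is one plausible shape, assuming a smoothed per‑path delay estimate drives the choice:

```python
# Illustrative per-path selection. The scoring is an assumption; the
# article only says MRC "quickly switches to a faster alternative".

class PathBalancer:
    def __init__(self, n_paths: int):
        self.delay = [1.0] * n_paths  # smoothed delay estimate per path

    def record(self, path: int, sample_ms: float, alpha: float = 0.2) -> None:
        # Exponentially weighted moving average of observed delay.
        self.delay[path] = (1 - alpha) * self.delay[path] + alpha * sample_ms

    def pick(self) -> int:
        # Route the next packet over the currently fastest path.
        return min(range(len(self.delay)), key=lambda p: self.delay[p])

lb = PathBalancer(n_paths=8)
lb.record(path=3, sample_ms=50.0)  # path 3 degrades...
assert lb.pick() != 3              # ...so traffic shifts off it at once
```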
Eliminating Switch Intelligence
MRC combines multipath topology, packet spraying, load‑balancing and packet trimming to detect and bypass faults within microseconds. (In packet trimming, a congested switch forwards a truncated header rather than silently dropping the packet, so the endpoints learn of the loss immediately.) Traditional networks rely on dynamic routing protocols such as BGP, which require switches to compute routes and can take seconds to recover from failures.
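A toy model of the trimming idea, which the article names but does not detail; the queue mechanics here are an assumption in the spirit of the Ultra Ethernet work:

```python
# When a queue is full, keep the header and drop only the payload, so
# the receiver can request a fast retransmit instead of timing out.

HEADER_BYTES = 64

def enqueue(queue: list, packet: bytes, capacity: int) -> None:
    if len(queue) < capacity:
        queue.append(packet)                 # normal forwarding
    else:
        queue.append(packet[:HEADER_BYTES])  # trim: the header survives

queue: list = []
for _ in range(6):
    enqueue(queue, bytes(1500), capacity=4)

trimmed = sum(1 for p in queue if len(p) == HEADER_BYTES)
print(f"{trimmed} trimmed headers signal the loss immediately")
```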
MRC disables dynamic routing entirely, adopting SRv6 (IPv6 Segment Routing). The sender encodes the identifiers of each hop directly into the packet’s destination address. Switches simply verify the presence of their identifier, strip it, and forward the packet according to a static routing table configured at boot time. No per‑packet route computation is needed.
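A minimal sketch of this source‑routed forwarding model, with invented switch and port names; real SRv6 carries the segment list in an IPv6 extension header, with the active segment in the destination address:

```python
# The sender picks the whole path; each switch just pops its own ID
# and forwards via a table that was fixed at boot time.

BOOT_TIME_TABLE = {             # static next-hop ports, set once at boot
    ("leaf1", "spine2"): "uplink-2",
    ("spine2", "leaf7"): "downlink-7",
    ("leaf7", "gpu-42"): "host-port-3",
}

def forward(switch: str, segments: list) -> list:
    # A switch only checks that it is the current segment, pops itself,
    # and looks up the next hop statically -- no per-packet route compute.
    assert segments[0] == switch, "packet arrived at the wrong hop"
    nxt = segments[1]
    print(f"{switch} -> {nxt} via {BOOT_TIME_TABLE[(switch, nxt)]}")
    return segments[1:]

segments = ["leaf1", "spine2", "leaf7", "gpu-42"]  # path chosen by sender
for switch in ("leaf1", "spine2", "leaf7"):
    segments = forward(switch, segments)
```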
This static‑forwarding model dramatically reduces switch CPU load and, in production, has shown that even with millions of cables experiencing frequent link flaps, training jobs see no measurable slowdown. Engineers can restart upper‑level switches without coordinating with training teams, and faulty cables can be hot‑repaired while traffic automatically reroutes.
Open Standards and Hardware Deployment
No single company can solve AI hardware challenges alone. MRC has been deployed on OpenAI’s largest Nvidia GB200 supercomputer, on Oracle Cloud Infrastructure’s OCI site in Abilene, Texas, and on Microsoft’s Fairwater supercomputer.
AMD co‑authored the MRC specification and contributed advanced congestion‑control techniques. Using its Pensando Pollara 400 AI NIC, AMD validated an improved RoCEv2 implementation before the MRC standard was finalized, and is now transitioning the technology to its next‑generation Vulcano 800G AI NIC.
The protocol specifications are publicly available through the Open Compute Project, enabling a unified infrastructure standard that bridges partner ecosystems, improves resilience against hardware faults, and smooths the path toward artificial general intelligence.