Tagged articles
8 articles
SuanNi
May 8, 2026 · Artificial Intelligence

How OpenAI’s MRC Protocol Redesigns Communication for 100,000‑GPU Clusters

OpenAI, together with AMD, Broadcom, Intel, Microsoft and Nvidia, introduced the Multipath Reliable Connection (MRC) protocol, which splits a single 800 Gb/s link into eight 100 Gb/s planes. The design enables full‑mesh connectivity for more than 100,000 GPUs with fewer switches, lower cost, and higher resilience, plus dynamic load balancing that keeps congestion and hardware failures from disrupting large‑scale AI training.

AI networking · GPU clusters · MRC
12 min read
Architects' Tech Alliance
Apr 19, 2026 · Industry Insights

Why AI Training Hits a Network Wall and the Five Protocols Fighting for the Next‑Gen AI Interconnect

As AI models scale from billions to trillions of parameters and GPU clusters grow from dozens to hundreds of thousands of cards, traditional data‑center networking can no longer handle exabyte‑level traffic, prompting a fierce battle among five open‑source scale‑up protocols—ESUN, SUE, ETH‑X, OISA, and ETH+—each offering different trade‑offs in latency, compatibility, performance, and scalability.

AI networking · GPU cluster interconnect · future AI infrastructure
11 min read
Architects' Tech Alliance
Oct 9, 2025 · Artificial Intelligence

Unlocking AI Scale‑Up: Inside SUE, OISA, ALS and ETH+ High‑Performance Interconnects

This article introduces four cutting‑edge AI networking technologies—SUE, OISA, ALS, and ETH+—detailing their backgrounds, architectural designs, and performance enhancements that enable ultra‑high bandwidth, low‑latency, and scalable interconnects for modern AI compute clusters.

AI networking · High‑performance computing · Scale‑Up
13 min read
Alibaba Cloud Infrastructure
Oct 2, 2025 · Cloud Computing

How Alibaba Cloud Extended VCSEL Reach to 500 m: Breakthroughs from ECOC 2025

At ECOC 2025 in Copenhagen, Alibaba Cloud showcased groundbreaking research that pushed VCSEL transmission over multimode fiber to 500 meters, explored hollow‑core fiber challenges such as CO₂ absorption, and promoted 800ZR coherent modules for data‑center interconnect, highlighting AI‑driven optical networking advances.

AI networking · VCSEL · data center interconnect
6 min read
Architects' Tech Alliance
Sep 29, 2025 · Artificial Intelligence

How NVLink and NVSwitch Power AI’s Next‑Gen High‑Performance Networks

This article, part of the 2025 AI Network Technology Whitepaper, classifies AI high‑performance networking into Scale‑Up, Scale‑Out, and frontier breakthroughs, then dives deep into NVLink’s evolution, technical features, NVSwitch’s full‑mesh architecture, and the newly opened NVLink Fusion ecosystem.

AI networking · GPU interconnect · High‑performance computing
8 min read
Architects' Tech Alliance
Sep 28, 2025 · Artificial Intelligence

How AI Workloads Are Redefining Network Architecture: Key Requirements and Topologies

The article examines how the rapid growth of AI models and workloads is reshaping network design, highlighting the need for ultra‑high bandwidth, sub‑millisecond latency, reliability, scalable topologies like Fat‑Tree and Dragonfly, and robust security and QoS mechanisms across data‑center, cloud, and edge environments.

AI networking · Data center · Distributed Training
11 min read
Architects' Tech Alliance
Aug 20, 2025 · Artificial Intelligence

Dual ToR and Dual‑Plane Designs: Boosting AI Training Performance in Large‑Scale Data Centers

The article explains how non‑stacked dual‑ToR and dual‑plane network architectures, combined with single‑chip high‑performance switches and multi‑rail host networking, dramatically improve reliability, load balance, and end‑to‑end training speed for massive AI models such as GPT‑3 175B.

AI networking · Data center · GPU training
11 min read
Alibaba Cloud Infrastructure
Oct 25, 2024 · Artificial Intelligence

Highlights of Chinese Enterprises at the 2024 OCP Global Summit: AI Network Architecture, High‑Performance Cooling, and WAN Innovations

At the 2024 OCP Global Summit in San Jose, Chinese tech leaders such as Alibaba Cloud and ByteDance presented AI network architectures, liquid‑cooling solutions, SRv6 deployments, high‑performance data‑center designs, and future WAN routing innovations, underscoring China's growing influence in AI infrastructure worldwide.

AI networking · High-performance computing · OCP Summit
8 min read