Why GPU Scale‑Up Interconnects Need a New Protocol – Inside UALink and Alibaba’s Alink

The article analyzes the growing demand for high‑bandwidth, low‑latency GPU Scale‑Up interconnects in AI clusters, explains why existing Ethernet and RDMA solutions fall short, and examines the industry‑wide UALink alliance and Alibaba's Alink System as a new open‑ecosystem solution.

Alibaba Cloud Infrastructure

Nov 13, 2024

Interconnect Architecture of Intelligent Computing Clusters

GPU Scale‑Up interconnect became a hot topic in 2024 and drew wide industry discussion. From Alibaba Cloud's perspective, the technology and ecosystem behind intelligent computing clusters must meet the demands of large‑model training and inference, both of which require massive memory capacity and memory bandwidth.

Traditional 8‑GPU single‑node setups can no longer satisfy these needs; supernodes with many GPUs, large shared memory, and low‑latency communication are required. In September 2024, Alibaba Cloud released the Alink System open ecosystem and the AI Infra 2.0 server system, whose underlying interconnect protocol complies with the international UALink standard.

UALink Alliance Formation

On October 29, the UALink Alliance was officially launched with members including AMD, AWS, Astera Labs, Cisco, Google, HPE, Intel, Meta, and Microsoft. Notably, AWS joined even though it usually keeps a low profile in standards bodies, which signals strong interest in GPU Scale‑Up interconnect solutions.

Types of Interconnect in Intelligent Computing Clusters

Business Network Interconnect: Handles input data, output results, model parameters, and checkpoints; it requires wide‑area connectivity and typically uses Ethernet with RDMA support.

Scale‑Out Network Interconnect: Carries data‑parallel (DP) and pipeline‑parallel (PP) training traffic across many GPU cabinets; the emerging standard is the protocol defined by the Ultra Ethernet Consortium (UEC).

Scale‑Up Network Interconnect: Carries large‑model inference traffic and tensor‑parallel and mixture‑of‑experts (MoE) training traffic, which require high bandwidth and low latency within a single cabinet of 72‑80 GPUs.
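
To make the Scale‑Up traffic class concrete, the following Python sketch estimates how much data tensor parallelism all‑reduces for one micro‑batch. It assumes a Megatron‑LM‑style layout (two all‑reduces per transformer layer in the forward pass and two in the backward pass, each over a batch × sequence × hidden activation tensor in bf16). The function name and model dimensions are hypothetical, and the result is the logical tensor size, not the bytes each GPU puts on the wire.

```python
# Rough per-micro-batch tensor-parallel (TP) all-reduce volume, assuming a
# Megatron-LM-style layout: per transformer layer, 2 all-reduces in the forward
# pass and 2 in the backward pass, each over a (batch, seq_len, hidden) tensor.

def tp_allreduce_bytes(batch: int, seq_len: int, hidden: int,
                       num_layers: int, bytes_per_elem: int = 2) -> int:
    """Logical size of all tensors all-reduced for one micro-batch (fwd + bwd)."""
    tensor_bytes = batch * seq_len * hidden * bytes_per_elem  # 2 bytes = bf16
    allreduces_per_layer = 4  # 2 forward + 2 backward (assumed layout)
    return tensor_bytes * allreduces_per_layer * num_layers


if __name__ == "__main__":
    # Hypothetical model shape, for illustration only.
    total = tp_allreduce_bytes(batch=1, seq_len=4096, hidden=8192, num_layers=80)
    print(f"{total / 2**30:.1f} GiB all-reduced per micro-batch")
```

With these illustrative dimensions the estimate comes to 20.0 GiB per micro‑batch. Because these all‑reduces sit between consecutive layers and are largely on the critical path, this is the traffic the article assigns to the Scale‑Up tier.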

Why a New Scale‑Up Protocol Is Needed

GPU‑centric workloads expose the limitations of CPU‑oriented Ethernet and RDMA. Existing protocols such as PCIe and CXL either add unnecessary overhead or cannot meet the extreme bandwidth (about 10× that of Scale‑Out) and latency requirements. A direct GPU‑to‑GPU interconnect with a lightweight protocol saves die area for compute cores and reduces power consumption.
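
A back‑of‑the‑envelope calculation shows what a roughly 10× bandwidth gap means for a single collective. The sketch below uses the standard bandwidth term of a ring all‑reduce, in which each GPU sends about 2(N-1)/N times the tensor size; latency and protocol overhead are ignored, and the 64 MiB tensor, 8‑GPU group, and 50 GB/s per‑GPU Scale‑Out bandwidth are hypothetical values.

```python
# Bandwidth-only estimate of a ring all-reduce: each GPU sends (and receives)
# 2 * (N - 1) / N * S bytes, so time scales inversely with per-GPU bandwidth.
# Link latency and protocol overhead are ignored.

def ring_allreduce_seconds(size_bytes: float, num_gpus: int,
                           bw_bytes_per_s: float) -> float:
    if num_gpus < 2:
        return 0.0  # a single GPU has nothing to exchange
    per_gpu_traffic = 2 * (num_gpus - 1) / num_gpus * size_bytes
    return per_gpu_traffic / bw_bytes_per_s


if __name__ == "__main__":
    size = 64 * 2**20                 # hypothetical 64 MiB activation tensor
    gpus = 8                          # hypothetical tensor-parallel group size
    scale_out_bw = 50e9               # assumed ~400 Gb/s NIC per GPU, in bytes/s
    scale_up_bw = 10 * scale_out_bw   # the roughly 10x Scale-Up ratio
    for name, bw in [("Scale-Out", scale_out_bw), ("Scale-Up", scale_up_bw)]:
        t = ring_allreduce_seconds(size, gpus, bw)
        print(f"{name}: {t * 1e3:.2f} ms")
```

Under these assumptions the all‑reduce takes about 2.35 ms over Scale‑Out and 0.23 ms over Scale‑Up. The model is linear in bandwidth, so it only restates the 10× ratio in time units; real collectives also pay per‑step latency, which grows in importance as groups get larger.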

Feature Dimensions

GPU AI workloads depend heavily on memory semantics (load/store) and need high cross‑chip bandwidth. Scale‑Up interconnect must provide an order of magnitude more bandwidth than Scale‑Out while keeping latency ultra‑low, which is achievable only with a dedicated GPU‑direct link.
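
Memory semantics make latency matter as much as bandwidth, because individual loads and stores are small. A simple latency‑plus‑bandwidth (alpha‑beta) cost model makes this visible. The 1 µs access latency and 500 GB/s link below are hypothetical, and the model treats each access as unpipelined, whereas real GPUs hide latency by keeping many requests outstanding.

```python
import math

# Latency-plus-bandwidth ("alpha-beta") model of a single access:
#   t(n) = alpha + n / beta
# alpha = per-access latency in seconds (for a load, the full round trip),
# beta = link bandwidth in bytes/s, n = bytes transferred.

def transfer_seconds(n_bytes: float, alpha_s: float, beta_bps: float) -> float:
    return alpha_s + n_bytes / beta_bps


def effective_bandwidth(n_bytes: float, alpha_s: float, beta_bps: float) -> float:
    """Bytes/s actually achieved by one unpipelined access of n_bytes."""
    return n_bytes / transfer_seconds(n_bytes, alpha_s, beta_bps)


def requests_in_flight(req_bytes: int, alpha_s: float, beta_bps: float) -> int:
    """Outstanding requests needed to keep the link busy (bandwidth-delay product)."""
    return math.ceil(beta_bps * alpha_s / req_bytes)


if __name__ == "__main__":
    alpha, beta = 1e-6, 500e9  # hypothetical: 1 us latency, 500 GB/s link
    for n in (256, 2**20, 2**30):
        print(f"{n:>11} B -> {effective_bandwidth(n, alpha, beta) / 1e9:6.1f} GB/s")
    print("256 B requests in flight to saturate:",
          requests_in_flight(256, alpha, beta))
```

A single 256‑byte access achieves only about 0.26 GB/s of the 500 GB/s link, while a 1 GiB transfer comes close to the full rate. Keeping the link busy with 256‑byte accesses needs on the order of 2,000 requests in flight, and that number falls in proportion to any reduction in latency.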

Interconnect Scope

Scale‑Up interconnect targets large‑model applications, offering an independent, high‑performance, low‑latency shared‑memory domain within a cabinet (72‑80 GPUs) with potential expansion to hundreds of GPUs, a capability Ethernet cannot provide.
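
One way to picture a shared‑memory domain is a single flat address space in which each GPU's HBM occupies a slice, so any GPU can resolve a load or store to the owning peer by address alone. The toy model below assumes a simple contiguous layout and a hypothetical 192 GiB per GPU in a 72‑GPU cabinet; real hardware uses its own address mapping and interleaving, which this sketch does not attempt to reproduce.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SharedMemoryDomain:
    """Toy model of a Scale-Up shared-memory domain: every GPU exposes a
    fixed-size window of its HBM in one flat global address space, laid out
    contiguously by GPU index (a hypothetical, non-interleaved layout)."""
    num_gpus: int
    bytes_per_gpu: int

    @property
    def total_bytes(self) -> int:
        return self.num_gpus * self.bytes_per_gpu

    def locate(self, global_addr: int) -> tuple[int, int]:
        """Map a global address to (owning GPU index, offset in that GPU's HBM)."""
        if not 0 <= global_addr < self.total_bytes:
            raise ValueError(f"address {global_addr:#x} is outside the domain")
        gpu, offset = divmod(global_addr, self.bytes_per_gpu)
        return gpu, offset


if __name__ == "__main__":
    GiB = 2**30
    domain = SharedMemoryDomain(num_gpus=72, bytes_per_gpu=192 * GiB)  # hypothetical sizes
    print(f"domain capacity: {domain.total_bytes / 2**40:.1f} TiB")
    addr = 5 * 192 * GiB + 4096
    gpu, offset = domain.locate(addr)
    print(f"a load/store to {addr:#x} targets GPU {gpu}, offset {offset}")
```

With these sizes the domain exposes 13.5 TiB, and an address 4 KiB past the start of GPU 5's slice resolves to GPU 5 at offset 4096.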

Current Industry Consensus and UALink Alliance Development

The earliest and most mature solution is NVIDIA's NVLink, but it is not open. Other vendors also have proprietary solutions (e.g., Google’s OCS+ICI and AWS NeuronLink). AMD contributed its Infinity Fabric technology to help form the UALink alliance, which aims to create an open standard designed specifically for GPU interconnect.

Since its launch, UALink has attracted many cloud and hardware vendors seeking competitive advantages and supply‑chain benefits; by November 11 it had more than thirty members.

Alibaba’s Alink System: Native Support for AI Scale‑Up Open Ecosystem

Alink System (ALS) is Alibaba Cloud's open ecosystem addressing the industry’s need for a standardized Scale‑Up interconnect. ALS consists of a data plane (ALS‑D) and a control plane (ALS‑M). ALS‑D extends UALink with in‑network computing features and supports switch‑based topologies, scaling to hundreds or thousands of nodes with a 1:1 bandwidth convergence ratio (no oversubscription) and petabyte‑scale memory sharing.
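
The 1:1 convergence ratio and the jump from a single cabinet to hundreds or thousands of nodes can be illustrated with generic switch arithmetic. The sketch below is a textbook non‑blocking folded‑Clos calculation, not a description of ALS‑D's actual topology. The 64‑port switch radix and 800 Gb/s port speed are hypothetical, and it assumes each GPU spreads its Scale‑Up ports across parallel switch planes, one port per plane, so the per‑plane endpoint count equals the GPU count.

```python
def convergence_ratio(downlink_bw: float, uplink_bw: float) -> float:
    """Ratio of endpoint-facing to network-facing bandwidth on a switch.
    1.0 (1:1) means no oversubscription; 3.0 means 3:1 oversubscribed."""
    return downlink_bw / uplink_bw


def max_endpoints_per_plane(radix: int, tiers: int) -> int:
    """Endpoint ports in a non-blocking (1:1) folded Clos of radix-port switches.
    Switches below the top tier use half their ports up and half down; the top
    tier uses all ports down. Result: 2 * (radix / 2) ** tiers."""
    if radix < 2 or radix % 2 or tiers < 1:
        raise ValueError("radix must be an even number >= 2 and tiers >= 1")
    return 2 * (radix // 2) ** tiers


if __name__ == "__main__":
    port_gbps = 800  # hypothetical per-port speed
    print("32 down / 32 up ratio:", convergence_ratio(32 * port_gbps, 32 * port_gbps))
    print("48 down / 16 up ratio:", convergence_ratio(48 * port_gbps, 16 * port_gbps))
    for tiers in (1, 2):
        print(f"{tiers} tier(s), radix 64: "
              f"{max_endpoints_per_plane(64, tiers)} GPUs per plane")
```

With 64‑port switches, one tier reaches 64 GPUs and two tiers reach 2,048, which falls in the range the article describes; a leaf with 48 ports down and 16 up, by contrast, would be 3:1 oversubscribed.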

ALS‑M provides standardized device onboarding, multi‑tenant configurations, and flexible management for both open (UALink) and proprietary solutions, aiming to boost the competitiveness of intelligent computing supernodes.
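
The article does not describe ALS‑M's interfaces, but the isolation rule behind multi‑tenant configuration can be sketched. In the toy model below, every class, method, and tenant name is hypothetical: each tenant receives a disjoint set of GPUs in one supernode, and peer access is permitted only when both GPUs belong to the same tenant.

```python
class PartitionTable:
    """Hypothetical multi-tenant view of one supernode: each tenant owns a
    disjoint set of GPUs, and Scale-Up peer traffic is allowed only between
    GPUs that belong to the same tenant."""

    def __init__(self, num_gpus: int) -> None:
        self.num_gpus = num_gpus
        self._owner: dict[int, str] = {}  # GPU index -> tenant name

    def allocate(self, tenant: str, count: int) -> list[int]:
        """Give `tenant` the lowest-numbered `count` free GPUs."""
        if count < 0:
            raise ValueError("count must be non-negative")
        free = [g for g in range(self.num_gpus) if g not in self._owner]
        if count > len(free):
            raise RuntimeError(f"only {len(free)} free GPUs, {count} requested")
        chosen = free[:count]
        for g in chosen:
            self._owner[g] = tenant
        return chosen

    def release(self, tenant: str) -> None:
        """Return all of `tenant`'s GPUs to the free pool."""
        for g in [g for g, t in self._owner.items() if t == tenant]:
            del self._owner[g]

    def can_access(self, src_gpu: int, dst_gpu: int) -> bool:
        """True if a load/store from src_gpu to dst_gpu stays inside one tenant."""
        owner = self._owner.get(src_gpu)
        return owner is not None and owner == self._owner.get(dst_gpu)


if __name__ == "__main__":
    table = PartitionTable(num_gpus=72)
    a = table.allocate("tenant-a", 32)
    b = table.allocate("tenant-b", 32)
    print(table.can_access(a[0], a[-1]), table.can_access(a[0], b[0]))
    table.release("tenant-a")
    print(len(table.allocate("tenant-c", 40)))
```

A production control plane would presumably also have to enforce the partition in the switches or address‑translation hardware; this sketch only tracks ownership and answers access checks.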

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.

