Why NPO Beats CPO for AI Data Center Scale‑up: Alibaba Cloud’s Dual‑Network Blueprint
Alibaba Cloud argues that while CPO (Co‑Packaged Optics) looks perfect on paper, its closed ecosystem and production delays make it impractical for today’s 100k‑GPU AI clusters, and proposes an open, dual‑network architecture—HPN for scale‑out and UPN for ultra‑low‑latency scale‑up—driving a realistic near‑term roadmap for optical interconnect.
1. Dual‑Network Architecture for 100k‑GPU AI Clusters
Alibaba Cloud warns that a single network cannot sustain a 100,000‑GPU AI cluster; the bandwidth and latency requirements exceed the capabilities of a monolithic design. The company therefore defines a next‑generation architecture that separates the network into two layers: HPN (Horizontal Packaged Network) for massive scale‑out across racks, and UPN (Ultra‑Packaged Network) for high‑speed, low‑latency interconnect within a super‑node.
HPN: handles horizontal expansion (scale‑out), connecting rack to rack and supporting the jump from ten‑thousand‑ to hundred‑thousand‑GPU scale.
UPN: provides ultra‑fast links within a super‑node (scale‑up), delivering nearly nine times the bandwidth of a typical GPU Ethernet link with nanosecond‑range latency, tailored for trillion‑parameter model training.
In simple terms, an outer network carries large‑scale traffic, while an inner network delivers ultra‑high‑speed communication for the most demanding workloads.
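The split described above can be illustrated with a back‑of‑the‑envelope model: a ring all‑reduce over a high‑bandwidth, low‑latency scale‑up domain completes far faster than the same collective over rack‑to‑rack links. This is an illustrative sketch only; the bandwidth, latency, and payload figures below are assumptions for demonstration, not Alibaba Cloud specifications.

```python
# Illustrative estimate of why scale-up (UPN-style) links need far more
# bandwidth than scale-out (HPN-style) links for collective communication.
# All numeric figures are assumptions, not published specifications.

def allreduce_time_s(payload_bytes: float, peers: int,
                     bw_bytes_per_s: float, latency_s: float) -> float:
    """Ring all-reduce estimate: 2*(p-1)/p of the payload crosses each link,
    plus a fixed latency cost for each of the 2*(p-1) steps."""
    steps = 2 * (peers - 1)
    volume = 2 * (peers - 1) / peers * payload_bytes
    return volume / bw_bytes_per_s + steps * latency_s

GRADIENT_BYTES = 4e9  # gradients of ~1B fp32 parameters (assumed)

# Scale-up domain: 64 GPUs on an assumed 800 GB/s link, ~100 ns latency.
scale_up = allreduce_time_s(GRADIENT_BYTES, 64, 800e9, 100e-9)

# Scale-out domain: 64 nodes on an assumed 50 GB/s link, ~10 us latency.
scale_out = allreduce_time_s(GRADIENT_BYTES, 64, 50e9, 10e-6)

print(f"intra-super-node all-reduce: {scale_up * 1e3:.1f} ms")
print(f"rack-to-rack all-reduce:     {scale_out * 1e3:.1f} ms")
```

Under these assumed numbers the intra‑super‑node collective is more than an order of magnitude faster, which is why the most latency‑sensitive traffic is confined to the inner network.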
2. An Open Ethernet Backbone Instead of Closed Protocols
Many vendors lock customers into proprietary high‑performance interconnects. Alibaba Cloud instead adopts an open, Ethernet‑based UPN that can be white‑boxed and decoupled from any single chip or switch vendor, giving users the performance they need while retaining control and avoiding lock‑in.
3. Copper vs. Optical: A System‑Level Perspective
While copper cables are cheap, reliable, and power‑efficient for short distances, optical modules excel at long reach, high density, and flexible topology—making them ideal for large clusters. Alibaba’s stance is pragmatic: optical is not inherently superior, but it becomes necessary when scaling up the system to overcome physical bandwidth limits.
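This system‑level trade‑off can be captured as a simple rule of thumb: passive copper (DAC) reach shrinks as per‑lane signaling rates rise, which is what eventually forces optics at scale. The reach thresholds below are illustrative assumptions, not vendor specifications.

```python
# Rule-of-thumb media selector for the copper-vs-optical trade-off.
# Reach figures per lane rate are illustrative assumptions only.

def pick_medium(distance_m: float, lane_gbps: int) -> str:
    # Assumed passive-copper (DAC) reach by per-lane signaling rate.
    dac_reach_m = {100: 3.0, 200: 2.0}.get(lane_gbps, 1.0)
    if distance_m <= dac_reach_m:
        return "copper (DAC): cheapest, lowest power, most reliable"
    return "optical: required beyond copper reach or for flexible topology"

print(pick_medium(2.0, 100))   # short in-rack link at 100G/lane -> copper
print(pick_medium(2.5, 200))   # same class of link at 200G/lane -> optical
print(pick_medium(30.0, 200))  # rack-to-rack distance -> optical
```

The middle case is the key point: the same physical distance that copper handles today can fall outside copper's reach at the next lane rate, matching the article's claim that optics become necessary as the system scales rather than being inherently superior.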
4. Choosing Between LPO, LRO, and FRO
FRO/LRO: strong interoperability and generality, suitable for scale‑out deployments.
LPO: ultra‑low latency and low power, a fit for scale‑up, but highly customized with a narrow ecosystem.
The recommendation is to match the technology to the specific scenario rather than following hype.
5. Key Conclusion: Deploy NPO Now, CPO Later
Alibaba Cloud bluntly states that CPO, despite its excellent signal quality, requires extensive co‑design, has a closed ecosystem, and is far from mass production. In contrast, NPO (Near‑Packaged Optics) provides sufficient signal quality, easy integration, an open ecosystem, and can reuse existing supply chains, making it the realistic choice for 2026‑2028 deployments.
6. New Standards Defined by Alibaba Cloud
Alibaba Cloud is moving from a buyer to a standards‑defining player. It announces a roadmap:
3.2 Tbps NPO: targeted for mid‑2026 volume production, compliant with OIF standards.
6.4 Tbps Ultra‑NPO (UPO): introduced by Alibaba Cloud and split into XD/HD/SD grades covering high‑density through regular scenarios, with defined package size, power, and PCB requirements.
This signals that the industry will soon have a concrete, open specification for high‑density optical interconnect.
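A grade‑based specification like the XD/HD/SD split lends itself to machine‑checkable compliance rules. The sketch below shows one way such a spec might be expressed; the numeric power and package limits are hypothetical placeholders, not the actual Alibaba Cloud or OIF figures.

```python
# Sketch of the XD/HD/SD grading as a machine-checkable spec.
# All numeric limits below are hypothetical placeholders.

from dataclasses import dataclass

@dataclass(frozen=True)
class UpoGrade:
    name: str
    bandwidth_tbps: float
    max_power_w: float     # hypothetical module power budget
    max_package_mm: float  # hypothetical package edge length

GRADES = {
    "XD": UpoGrade("XD", 6.4, 25.0, 18.0),  # highest density
    "HD": UpoGrade("HD", 6.4, 30.0, 22.0),
    "SD": UpoGrade("SD", 6.4, 35.0, 26.0),  # regular scenarios
}

def compliant(grade: str, power_w: float, package_mm: float) -> bool:
    """Check a candidate module against its grade's limits."""
    g = GRADES[grade]
    return power_w <= g.max_power_w and package_mm <= g.max_package_mm

print(compliant("XD", 24.0, 17.5))  # within the assumed XD envelope
print(compliant("XD", 28.0, 17.5))  # over the assumed XD power budget
```

The value of a concrete open spec is exactly this: suppliers and buyers can validate parts against the same published envelope instead of negotiating vendor by vendor.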
7. A Final, Pointed Demand to the Industry
Alibaba Cloud ends with a stark warning: the optical‑communication industry must keep pace with the massive scale‑up of AI data centers, or risk being left behind. The message is not a suggestion but a decisive ultimatum from a dominant customer‑side stakeholder.
Architects' Tech Alliance
Sharing project experiences, insights into cutting-edge architectures, focusing on cloud computing, microservices, big data, hyper-convergence, storage, data protection, artificial intelligence, industry practices and solutions.