Why Hyper‑Converged Data Center Networks Are the Future of AI‑Driven Infrastructure

This article analyzes how AI‑driven workloads, rapidly growing storage and compute capabilities, and distributed architectures expose the limits of the traditional three‑network data‑center design, and explains why a hyper‑converged Ethernet network with zero packet loss, low latency, and high throughput is becoming essential.

IT Architects Alliance

Sep 13, 2021

Background

Modern data‑center workloads fall into three categories: general compute, high‑performance computing (HPC), and storage. Historically, they have been served by three separate fabrics (Ethernet for general compute, InfiniBand for HPC, and Fibre Channel for storage), each with distinct latency, loss‑tolerance, and scalability characteristics.

AI‑Era Drivers that Push Network Design

1. Storage and compute outpace network bandwidth

NVMe SSDs deliver >100× lower latency than HDDs, and GPUs and AI ASICs provide >100× higher compute throughput. Because storage and compute time shrink while network time does not, network communication now accounts for 10%–60% of end‑to‑end latency and becomes the primary performance bottleneck.
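The shift in the bottleneck is simple arithmetic. The sketch below uses hypothetical per‑request timings (the 30 µs, 5000 µs, and 1000 µs values are illustrative assumptions, not figures from the article) to show how the network's share grows when storage and compute each become 100× faster and the network stays the same.

```python
def network_share(network_us: float, storage_us: float, compute_us: float) -> float:
    """Return the fraction of end-to-end time spent in the network."""
    total = network_us + storage_us + compute_us
    return network_us / total


# Hypothetical per-request timings in microseconds (assumptions for illustration).
NETWORK_US = 30.0

# Before: HDD-class storage and CPU-class compute dominate the request.
before = network_share(NETWORK_US, storage_us=5000.0, compute_us=1000.0)

# After: storage and compute are each 100x faster; the network is unchanged.
after = network_share(NETWORK_US, storage_us=50.0, compute_us=10.0)

print(f"network share before: {before:.1%}")
print(f"network share after:  {after:.1%}")
```

With these assumed inputs the network goes from well under 1% of the request time to about one third, without the network itself getting any slower.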

2. RDMA replaces TCP/IP but needs a lossless Ethernet fabric

Remote Direct Memory Access (RDMA) reduces intra‑server transfer latency to ~1 µs and offloads the CPU, but when carried over Ethernet it requires a fabric with a packet‑loss rate below 10⁻⁵. Conventional Ethernet loses packets at rates of 10⁻³–10⁻², at which point RDMA throughput collapses.
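To see why small loss rates matter so much, consider a minimal sketch that assumes go‑back‑N style loss recovery, which many RoCE implementations have used. It applies the textbook go‑back‑N efficiency formula; the window of 1000 in‑flight packets is a hypothetical value, and the model is a simplification rather than a measurement from the article.

```python
def go_back_n_efficiency(loss_rate: float, window_packets: int) -> float:
    """Textbook go-back-N efficiency: (1 - p) / (1 + (N - 1) * p).

    Each lost packet forces retransmission of the whole in-flight window,
    so the cost of a loss grows with the window size N.
    """
    if not 0.0 <= loss_rate < 1.0:
        raise ValueError("loss_rate must be in [0, 1)")
    if window_packets < 1:
        raise ValueError("window_packets must be >= 1")
    return (1.0 - loss_rate) / (1.0 + (window_packets - 1) * loss_rate)


WINDOW = 1000  # hypothetical number of packets in flight (assumption)

for p in (1e-5, 1e-3, 1e-2):
    eff = go_back_n_efficiency(p, WINDOW)
    print(f"loss rate {p:.0e}: efficiency {eff:.1%}")
```

Under these assumptions, efficiency stays near 99% at a loss rate of 10⁻⁵, falls to roughly half at 10⁻³, and drops below 10% at 10⁻².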

3. Distributed architectures increase congestion

Large‑scale micro‑service and financial‑service deployments generate incast (many‑to‑one) bursts and “large‑packet” flows that fill switch buffers and trigger congestion‑induced loss.
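A rough sketch of the incast arithmetic follows. It assumes all senders and the receiver use links of the same speed, that the bursts start at the same instant, and that the switch tail‑drops when its buffer is full; the sender count, burst size, and buffer size are hypothetical.

```python
def incast_peak_queue_bytes(senders: int, burst_bytes: int) -> int:
    """Peak egress queue if the buffer were unlimited.

    While the bursts arrive, the egress port drains exactly one burst's
    worth of data, so the other (senders - 1) bursts must be queued.
    """
    return max(0, senders - 1) * burst_bytes


def incast_dropped_bytes(senders: int, burst_bytes: int, buffer_bytes: int) -> int:
    """Bytes tail-dropped when the peak queue exceeds the available buffer."""
    return max(0, incast_peak_queue_bytes(senders, burst_bytes) - buffer_bytes)


# Hypothetical scenario: 32 senders each burst 64 KiB into one port
# that has 1 MiB of buffer available.
SENDERS = 32
BURST = 64 * 1024
BUFFER = 1024 * 1024

peak = incast_peak_queue_bytes(SENDERS, BURST)
dropped = incast_dropped_bytes(SENDERS, BURST, BUFFER)
print(f"peak queue demand: {peak} bytes, dropped: {dropped} bytes")
```

In this scenario the queue demand is nearly twice the buffer, so almost half of the queued data is lost unless the senders are slowed down or paused first.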

Core Performance Metrics

To support AI and distributed workloads, a next‑generation data‑center network must simultaneously achieve:

Zero packet loss (loss rate below 10⁻⁵) to preserve RDMA throughput.

Ultra‑low latency (sub‑microsecond per hop, end‑to‑end ≤ 1 µs for RDMA).

High sustained throughput (25 Gbps, 100 Gbps, or 400 Gbps per port) without congestion‑induced throttling.

Hyper‑Converged Network vs. Traditional HCI

Hyper‑Converged Infrastructure (HCI) bundles compute, storage, networking, and virtualization into a single appliance, which requires redesigning the entire stack. A hyper‑converged data‑center network, by contrast, converges only the network layer: it leaves the existing compute and storage stacks unchanged while delivering lossless Ethernet at commodity cost.

Technical Architecture

The solution transports RDMA traffic using RoCEv2 over a lossless Ethernet fabric built with Huawei’s iLossless algorithm. The following cooperating technology blocks are employed:

Traffic control: End‑to‑end rate limiting and Priority Flow Control (PFC) deadlock detection/prevention to avoid congestion‑induced loss.

Congestion control: AI‑driven ECN generation, iQCN, ECN Overlay, and NPCC replace the legacy DCQCN mechanism, providing adaptive, network‑wide flow regulation.

Intelligent Lossless Storage Network (iNOF): Host‑side rapid control loops that coordinate storage servers with the lossless fabric.
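The sketch below illustrates the two generic mechanisms these blocks build on: threshold‑based ECN marking, as used on the switch side of DCQCN‑style schemes, and PFC pause/resume with hysteresis. It is not Huawei's AI ECN, iQCN, or NPCC logic, which the article does not describe; every threshold value and name here is a hypothetical assumption.

```python
from dataclasses import dataclass


def ecn_mark_probability(queue_kb: float, kmin_kb: float,
                         kmax_kb: float, pmax: float) -> float:
    """Static ECN marking curve: 0 below kmin, linear up to pmax, 1 at kmax."""
    if queue_kb <= kmin_kb:
        return 0.0
    if queue_kb >= kmax_kb:
        return 1.0
    return pmax * (queue_kb - kmin_kb) / (kmax_kb - kmin_kb)


@dataclass
class PfcIngressQueue:
    """PFC pause state for one priority, with XOFF/XON hysteresis."""
    xoff_kb: float
    xon_kb: float
    paused: bool = False

    def update(self, queue_kb: float) -> bool:
        """Return True while the upstream sender should be paused."""
        if not self.paused and queue_kb >= self.xoff_kb:
            self.paused = True    # send PAUSE before the buffer overflows
        elif self.paused and queue_kb <= self.xon_kb:
            self.paused = False   # queue has drained; resume the sender
        return self.paused


# Hypothetical thresholds: ECN reacts first, PFC is the last resort.
KMIN, KMAX, PMAX = 100.0, 400.0, 0.2
pfc = PfcIngressQueue(xoff_kb=600.0, xon_kb=500.0)

for depth in (50.0, 250.0, 450.0, 650.0, 550.0, 450.0):
    prob = ecn_mark_probability(depth, KMIN, KMAX, PMAX)
    paused = pfc.update(depth)
    print(f"queue {depth:5.0f} KB: ECN mark prob {prob:.2f}, PFC paused {paused}")
```

Placing the ECN thresholds below the PFC XOFF threshold lets senders slow down before any pause frame is needed. The adaptive schemes named above tune this kind of threshold dynamically rather than leaving it static.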

Reference: https://support.huawei.com/info-finder/encyclopedia/zh/index.html

Key Operational Benefits

Independent third‑party testing (EANTC) shows up to a 44% reduction in compute latency for HPC and a 25% increase in IOPS for distributed storage, with zero packet loss.

Network CAPEX is typically ~10% of total data‑center spend, so improvements in network efficiency act with roughly 10× leverage on the much larger remaining spend.

Full‑life‑cycle SDN automation reduces OPEX by >60% and allows existing Ethernet teams to manage the fabric.
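One way to read the leverage claim is as a ratio between the spend the network influences and the spend on the network itself. The sketch below works through that reading; the 10% network share comes from the text, while the 5% efficiency gain and the function names are hypothetical.

```python
def leverage_ratio(network_share: float) -> float:
    """Units of non-network spend per unit of network spend."""
    if not 0.0 < network_share < 1.0:
        raise ValueError("network_share must be in (0, 1)")
    return (1.0 - network_share) / network_share


def saving_as_fraction_of_total(network_share: float, other_spend_reduction: float) -> float:
    """Total-cost saving if a better network trims the non-network spend."""
    return (1.0 - network_share) * other_spend_reduction


NETWORK_SHARE = 0.10          # from the text: network is ~10% of spend
OTHER_SPEND_REDUCTION = 0.05  # hypothetical: 5% less server/storage needed

ratio = leverage_ratio(NETWORK_SHARE)
saving = saving_as_fraction_of_total(NETWORK_SHARE, OTHER_SPEND_REDUCTION)
print(f"leverage ratio: about {ratio:.0f}x")
print(f"saving: {saving:.1%} of total spend, {saving / NETWORK_SHARE:.0%} of the network budget")
```

Under this reading, a modest 5% gain in server and storage efficiency already offsets close to half of the entire network budget.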
