Why Modern AI Demands New Switch Architectures: From OSI to RDMA and Leaf‑Spine
This article explains how AI and high‑performance computing drive the evolution of network protocols and data‑center switch designs, covering OSI basics, TCP/IP limitations, RDMA technologies, leaf‑spine topologies, Nvidia Spectrum platforms, and current market trends.
A network protocol is a set of rules that governs how devices exchange data over a network; the OSI seven‑layer model, standardized by ISO, is the internationally recognized reference model for describing such protocols.
Because HPC and AI require high throughput and low latency, data centers are gradually moving from TCP/IP to RDMA. RDMA comes in several forms: InfiniBand, which was designed for RDMA from the start and provides hardware‑level reliability but at high cost, and RoCE and iWARP, which carry RDMA over Ethernet.
Key questions addressed:
What is a protocol?
What role do switches play in data‑center architecture?
Are Nvidia switches equivalent to InfiniBand switches?
What is Nvidia SuperPOD?
What is the current state of the switch market?
The OSI model defines seven layers:
Physical layer: defines hardware standards such as interface types and transmission rates to transmit bit streams.
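Data link layer: packages data from the network layer into frames, addresses them by MAC address, and detects (and on some links corrects) transmission errors before passing bits to the physical layer.
Network layer: provides logical addressing such as IP addresses, chooses routes between networks, and transmits data in packets.
Transport layer: provides end‑to‑end delivery between hosts, including flow control and retransmission of lost packets.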
Session layer: manages session connections between network devices.
Presentation layer: handles data format conversion and encryption.
Application layer: provides interfaces for user‑level network services.
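The stacking of these layers can be pictured as encapsulation: on the sender, each lower layer wraps what it receives from the layer above with its own header, and the receiver strips the headers in reverse order. The following is a toy sketch only. The bracketed header strings and the function names encapsulate and decapsulate are placeholders invented for illustration, not real protocol formats, and only three of the seven layers are modelled.

```python
# Toy illustration of layered encapsulation. The header strings are
# placeholders (assumptions), not real protocol header formats.
LAYERS_TOP_DOWN = ["transport", "network", "data-link"]


def encapsulate(payload: bytes) -> bytes:
    """Wrap application data with one placeholder header per lower layer."""
    frame = payload
    for layer in LAYERS_TOP_DOWN:
        header = f"[{layer}]".encode()
        frame = header + frame  # each lower layer prepends its own header
    return frame


def decapsulate(frame: bytes) -> bytes:
    """Strip the headers outermost first, as a receiver would."""
    for layer in reversed(LAYERS_TOP_DOWN):
        header = f"[{layer}]".encode()
        if not frame.startswith(header):
            raise ValueError(f"missing {layer} header")
        frame = frame[len(header):]
    return frame


wire = encapsulate(b"hello")
print(wire)               # b'[data-link][network][transport]hello'
print(decapsulate(wire))  # b'hello'
```

The outermost header belongs to the lowest layer, which is why a device that works at the data‑link layer only needs to read the first header to do its job.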
TCP/IP groups its protocols into four layers (application, transport, internet, and link) and can be seen as a simplified, practical counterpart to the OSI model. However, TCP/IP has drawbacks for HPC:
Latency of tens of microseconds due to multiple context switches and CPU‑based packet processing.
High CPU load because the TCP/IP stack requires frequent memory copies, making CPU usage proportional to bandwidth.
RDMA (Remote Direct Memory Access) allows direct memory access over the network without OS kernel intervention, enabling high‑throughput, low‑latency communication, especially in large parallel clusters.
Switches classically operate at the data‑link layer, using MAC addresses to forward frames, while routers work at the network layer, using IP addresses to route between subnets. Many data‑center switches also route at the network layer.
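How a data‑link‑layer switch uses MAC addresses can be sketched with the standard learning behaviour of a transparent bridge: remember which port each source address arrived on, forward to the known port, and flood when the destination is unknown. This is a minimal sketch under stated assumptions. The class name LearningSwitch, the shortened MAC strings, and the four‑port size are hypothetical, and real switches also age out entries and handle VLANs and broadcast addresses, which are omitted here.

```python
class LearningSwitch:
    """Toy layer-2 switch: learns source MACs, forwards or floods frames."""

    def __init__(self, num_ports: int):
        self.num_ports = num_ports
        self.mac_table = {}  # MAC address -> port it was last seen on

    def receive(self, in_port: int, src_mac: str, dst_mac: str) -> list:
        """Return the list of ports the frame is sent out of."""
        self.mac_table[src_mac] = in_port  # learn where the sender lives
        out_port = self.mac_table.get(dst_mac)
        if out_port is None:
            # Unknown destination: flood to every port except the ingress port
            return [p for p in range(self.num_ports) if p != in_port]
        if out_port == in_port:
            return []  # destination sits on the ingress port; drop the frame
        return [out_port]


sw = LearningSwitch(4)
print(sw.receive(0, "aa:aa", "bb:bb"))  # [1, 2, 3]  (bb:bb unknown, flood)
print(sw.receive(1, "bb:bb", "aa:aa"))  # [0]        (aa:aa learned on port 0)
print(sw.receive(0, "aa:aa", "bb:bb"))  # [1]        (bb:bb learned on port 1)
```

A router, by contrast, would look past the frame header at the IP destination and consult a routing table keyed by subnet rather than by individual MAC address.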
Traditional three‑tier data‑center networks (access, aggregation, core) have limitations such as bandwidth waste, large fault domains, and increased latency due to multiple hops. The leaf‑spine architecture addresses these issues with a flat, non‑blocking design, ECMP for multi‑path routing, and high fault tolerance.
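Two ideas in the leaf‑spine design lend themselves to a short sketch: ECMP picks one of several equal‑cost spine paths per flow by hashing the flow's header fields, and a leaf is non‑blocking when its server‑facing bandwidth does not exceed its spine‑facing bandwidth. The code below is an illustration under stated assumptions, not any vendor's algorithm. The function names, the use of SHA‑256 as the hash, the example addresses, and the port counts and speeds are all hypothetical; real switches use hardware hash functions.

```python
import hashlib


def ecmp_pick_spine(flow: tuple, num_spines: int) -> int:
    """Map a flow 5-tuple to a spine index with a stable hash.

    Hashing per flow keeps all packets of one flow on one path,
    which avoids reordering within that flow.
    """
    key = "|".join(str(field) for field in flow).encode()
    digest = hashlib.sha256(key).digest()
    return int.from_bytes(digest[:4], "big") % num_spines


def oversubscription(downlinks: int, downlink_gbps: float,
                     uplinks: int, uplink_gbps: float) -> float:
    """Ratio of server-facing to spine-facing bandwidth on one leaf.

    A ratio of 1.0 or less means the leaf is non-blocking.
    """
    return (downlinks * downlink_gbps) / (uplinks * uplink_gbps)


# Hypothetical flow: (src IP, dst IP, protocol, src port, dst port).
# Protocol 17 is UDP; 4791 is the RoCEv2 UDP destination port.
flow = ("10.0.0.1", "10.0.1.7", 17, 49152, 4791)
spine = ecmp_pick_spine(flow, num_spines=4)
assert spine == ecmp_pick_spine(flow, num_spines=4)  # same flow, same path

print(oversubscription(32, 100, 8, 400))  # 1.0 (non-blocking leaf)
print(oversubscription(48, 25, 4, 100))   # 3.0 (3:1 oversubscribed leaf)
```

One consequence of per‑flow hashing is that a few very large flows can land on the same spine and congest it while other paths sit idle, which matters for AI training traffic made up of a small number of long‑lived, high‑bandwidth flows.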
Nvidia switches are not all InfiniBand switches: the Spectrum platform is Ethernet, while the Quantum platform is InfiniBand. Nvidia Spectrum‑X, designed for generative AI, extends RoCE and pairs Spectrum Ethernet switches with the BlueField‑3 DPU to achieve up to 95% effective bandwidth in large‑scale systems, providing performance isolation, fault resilience, and consistent AI workload performance.
The switch market is currently strong, driven by AI demand, with a shift toward high‑end products. Cisco holds the largest share, while Arista is growing rapidly. In Q1 2023, global Ethernet switch revenue reached $10.021 billion, with revenue from 200G/400G switches up 41.3% year over year. Port shipments grew 14.8% to 229 million units.
Pricing varies: Nvidia’s QM9700 switch has a list price of about $38,000, compared with roughly $17,000–$23,000 for the older QM8700/8790 models, or about 1.7 to 2.2 times as much.
Open Source Linux
Focused on sharing Linux/Unix content, covering fundamentals, system development, network programming, automation/operations, cloud computing, and related professional knowledge.
