
Why InfiniBand Beats TCP/IP: Deep Dive into Architecture and Socket Direct

This article explains how InfiniBand’s RDMA‑based architecture, layered protocol stack, and Mellanox Socket Direct technology deliver far higher bandwidth, lower latency, and better CPU efficiency than traditional TCP/IP networks, and it presents performance test results that show up to an 80% latency reduction.

Architects' Tech Alliance

Background and Motivation

The multi-layered design of traditional TCP/IP incurs significant buffering, latency, and operating-system overhead, which limits performance in large-scale clusters. As demand grew for open, high-bandwidth, low-latency, and highly reliable interconnects, InfiniBand (IB) emerged as a switched-fabric architecture designed to address these limits.

Key Features of InfiniBand

InfiniBand uses RDMA (Remote Direct Memory Access) to let a server read or write memory on a remote node without involving the operating-system kernel on the data path. This preserves the high bandwidth and low latency of a local bus while reducing CPU load, which makes IB especially suitable for storage-heavy clusters and high-performance computing.

InfiniBand Protocol Stack

Physical Layer : Carries serial data streams over links built from 1, 4, 8, or 12 lanes. A common 4x link at the FDR rate, for example, signals at roughly 14 Gb/s per lane, for about 56 Gb/s per link.

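Link Layer : Credit-based flow control ensures that the sender transmits only when the receiver has advertised enough free buffer space, so packets are not dropped for lack of buffers. QoS is provided by Virtual Lanes (VL0-VL15, with VL15 reserved for subnet management traffic) and by Service Levels (SL), which are mapped to VLs to prioritize traffic.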

Network Layer : Uses a Global Route Header (GRH) carrying 128-bit, IPv6-style addresses (Global Identifiers, GIDs) to route packets across subnets.

Transport Layer : Delivers packets to the correct Queue Pair (QP) and handles channel multiplexing. When a message is larger than the path MTU, the sender segments it into MTU-sized packets and the receiver reassembles them.
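The link-layer and transport-layer behavior described above can be illustrated with a toy model. The following is a minimal sketch, not InfiniBand's actual wire protocol: the `Receiver` class, `send_message`, and the one-credit-per-packet accounting are assumptions made for illustration (real links track credits per virtual lane in finer-grained units). It shows a message being segmented into MTU-sized packets and a sender that transmits only while the receiver has advertised free buffers.

```python
from collections import deque


def segment(payload: bytes, mtu: int) -> list:
    """Split a message into MTU-sized packets; the last one may be shorter."""
    if mtu <= 0:
        raise ValueError("mtu must be positive")
    return [payload[i:i + mtu] for i in range(0, len(payload), mtu)]


def reassemble(packets) -> bytes:
    """Concatenate in-order packets back into the original message."""
    return b"".join(packets)


class Receiver:
    """Toy receiver with a fixed number of packet buffers (hypothetical model)."""

    def __init__(self, buffers: int):
        self.free = buffers        # credits the sender may still use
        self.queue = deque()       # packets held in receive buffers
        self.delivered = []        # packets handed to the upper layer

    def advertise_credits(self) -> int:
        return self.free

    def accept(self, packet: bytes) -> None:
        if self.free == 0:
            raise RuntimeError("sender transmitted without a credit")
        self.free -= 1
        self.queue.append(packet)

    def drain(self, count: int) -> None:
        """Consume buffered packets, returning one credit per packet."""
        for _ in range(min(count, len(self.queue))):
            self.delivered.append(self.queue.popleft())
            self.free += 1


def send_message(payload: bytes, mtu: int, rx: Receiver) -> int:
    """Send a message under credit-based flow control.

    Returns how many times the sender had to wait for a credit.
    """
    stalls = 0
    pending = deque(segment(payload, mtu))
    while pending:
        credits = rx.advertise_credits()
        if credits == 0:
            stalls += 1
            rx.drain(1)            # receiver frees one buffer
            continue
        for _ in range(min(credits, len(pending))):
            rx.accept(pending.popleft())
    rx.drain(len(rx.queue))        # deliver whatever is still buffered
    return stalls


if __name__ == "__main__":
    message = bytes(range(10)) * 1000          # 10,000-byte message
    rx = Receiver(buffers=2)
    stalls = send_message(message, mtu=4096, rx=rx)
    print(len(rx.delivered), "packets,", stalls, "stall(s)")
    print("intact:", reassemble(rx.delivered) == message)
```

Because the sender checks credits before every transmission, the receiver never has to drop a packet; the cost of a slow receiver shows up as sender stalls instead of retransmissions.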

Fabric Architecture

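IB devices include Channel Adapters (CA), Switches, and Routers. A CA is either a Host Channel Adapter (HCA) in a compute node or a Target Channel Adapter (TCA) in a storage or I/O device. Each subnet is managed by a Subnet Manager, which assigns a 16-bit Local Identifier (LID) to every port and configures devices through the Subnet Management Agent running on each node. The 16-bit LID gives a subnet 65,536 possible addresses, of which about 48,000 are available for unicast endpoints; the rest are reserved or used for multicast.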
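To make the LID assignment step concrete, here is a minimal sketch of how a Subnet Manager might hand out unicast LIDs. The sequential, sorted-by-GUID policy and the `assign_lids` function are assumptions for illustration only; production subnet managers use their own policies and also handle features this sketch ignores, such as multiple LIDs per port. The unicast range used here (0x0001 to 0xBFFF) is the range defined by the InfiniBand architecture.

```python
UNICAST_LID_MIN = 0x0001   # LID 0x0000 is reserved
UNICAST_LID_MAX = 0xBFFF   # 0xC000 and above are multicast or reserved
UNICAST_CAPACITY = UNICAST_LID_MAX - UNICAST_LID_MIN + 1


def assign_lids(port_guids):
    """Assign one unicast LID per port, in ascending GUID order.

    `port_guids` is an iterable of 64-bit port GUIDs (integers).
    Returns a dict mapping each GUID to its LID.
    """
    guids = sorted(set(port_guids))
    if len(guids) > UNICAST_CAPACITY:
        raise ValueError("subnet has more ports than unicast LIDs")
    return {guid: UNICAST_LID_MIN + i for i, guid in enumerate(guids)}


if __name__ == "__main__":
    # Hypothetical port GUIDs, chosen only for the example.
    lids = assign_lids([0x0002C90300A1B2C3, 0x0002C90300A1B2C4])
    for guid, lid in lids.items():
        print(f"GUID {guid:#018x} -> LID {lid:#06x}")
    print("unicast LIDs available:", UNICAST_CAPACITY)
```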

Switches forward traffic within a subnet based on the destination LID carried in the Local Route Header (LRH), while Routers connect different subnets using the IPv6-style address in the GRH. Because every endpoint attaches to the switched fabric rather than to a shared bus, any two endpoints can communicate over a direct, high-speed path.
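The split between LID-based switching and GID-based routing can be sketched in a few lines. This is an illustrative model under stated assumptions, not switch or router firmware: the function names, the dictionary-based forwarding table, and the example GUIDs and prefixes are hypothetical. It relies on the fact that a 128-bit GID consists of a 64-bit subnet prefix followed by a 64-bit port GUID, so two endpoints are in the same subnet when their prefixes match.

```python
import ipaddress

DEFAULT_SUBNET_PREFIX = 0xFE80000000000000  # link-local style default prefix


def make_gid(subnet_prefix: int, port_guid: int) -> ipaddress.IPv6Address:
    """Build a 128-bit GID from a 64-bit subnet prefix and a 64-bit port GUID."""
    for value in (subnet_prefix, port_guid):
        if not 0 <= value < 2**64:
            raise ValueError("prefix and GUID must each fit in 64 bits")
    return ipaddress.IPv6Address((subnet_prefix << 64) | port_guid)


def subnet_prefix_of(gid: ipaddress.IPv6Address) -> int:
    """Return the upper 64 bits of a GID."""
    return int(gid) >> 64


def needs_router(src_gid, dst_gid) -> bool:
    """A packet must cross a router when the subnet prefixes differ."""
    return subnet_prefix_of(src_gid) != subnet_prefix_of(dst_gid)


def switch_forward(forwarding_table: dict, dlid: int) -> int:
    """Look up the output port for a destination LID (KeyError if unknown)."""
    return forwarding_table[dlid]


if __name__ == "__main__":
    # Hypothetical GUIDs, prefixes, and forwarding entries.
    a = make_gid(DEFAULT_SUBNET_PREFIX, 0x0002C90300A1B2C3)
    b = make_gid(DEFAULT_SUBNET_PREFIX, 0x0002C90300A1B2C4)
    c = make_gid(0xFEC0000000000001, 0x0002C90300FFFF01)
    print(a, "->", b, "router needed:", needs_router(a, b))
    print(a, "->", c, "router needed:", needs_router(a, c))
    table = {0x0001: 3, 0x0002: 7}   # DLID -> output port
    print("DLID 0x0002 leaves on port", switch_forward(table, 0x0002))
```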

Mellanox Socket Direct Technology

Mellanox's Socket Direct splits a PCIe x16 HCA into two PCIe x8 cards (Main and Auxiliary) and attaches each to a separate CPU socket in a dual-socket server. Each CPU then reaches the network through its own PCIe x8 interface instead of crossing the inter-processor bus, which reduces inter-CPU traffic, lowers latency, and improves overall system throughput.
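On the software side, benefiting from a per-socket path means choosing the RDMA device that is local to the CPU a process runs on. The sketch below is one way to do that on Linux, assuming each half of the adapter appears to the operating system as its own RDMA device and that the kernel exposes the device's NUMA node under `/sys/class/infiniband/<device>/device/numa_node`. The function names and the device names in the example (`mlx5_0`, `mlx5_1`) are hypothetical, and the fallback policy is an arbitrary choice.

```python
from pathlib import Path
from typing import Dict, Optional


def read_numa_nodes(sysfs_root: str = "/sys/class/infiniband") -> Dict[str, int]:
    """Map each RDMA device name to the NUMA node reported by sysfs.

    Returns an empty dict when the directory does not exist
    (for example, on a machine without RDMA hardware).
    """
    nodes: Dict[str, int] = {}
    root = Path(sysfs_root)
    if not root.is_dir():
        return nodes
    for dev in sorted(root.iterdir()):
        try:
            nodes[dev.name] = int((dev / "device" / "numa_node").read_text().strip())
        except (OSError, ValueError):
            continue  # skip devices without a readable NUMA node
    return nodes


def pick_local_device(numa_nodes: Dict[str, int], cpu_node: int) -> Optional[str]:
    """Prefer a device on the caller's NUMA node; otherwise fall back
    to the first device by name, or None if there are no devices."""
    if not numa_nodes:
        return None
    local = sorted(name for name, node in numa_nodes.items() if node == cpu_node)
    return local[0] if local else sorted(numa_nodes)[0]


if __name__ == "__main__":
    discovered = read_numa_nodes()
    # Fall back to a made-up layout so the example runs anywhere.
    layout = discovered or {"mlx5_0": 0, "mlx5_1": 1}
    for node in (0, 1):
        print(f"CPU on NUMA node {node} should use", pick_local_device(layout, node))
```

Pinning a process to one socket and opening the device that `pick_local_device` returns keeps its network traffic off the inter-processor link, which is the effect Socket Direct is designed to provide.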

The solution also includes a dedicated SAS cable that links the two PCIe cards, so the pair operates as a single adapter and traffic from either socket stays off the CPU interconnect.

Performance Evaluation

The tests compared a ConnectX-based Socket Direct adapter in a dual-socket server with a standard PCIe x16 100 Gb/s adapter attached to a single socket, measuring TCP throughput, latency, CPU utilization, and RDMA benchmarks. The Socket Direct configuration showed an average latency reduction of about 80%, along with higher throughput and lower CPU usage, which the tests attribute to the direct PCIe-to-CPU path.

OpenFabrics Software Stack

The OpenFabrics Enterprise Distribution (OFED) provides kernel drivers, RDMA APIs, and support for various transports (iWARP, RoCE, InfiniBand). It enables high‑performance messaging (MPI), storage protocols (iSER, NFS‑RDMA, SRP), and integrates with Ethernet‑based fabrics, making it a versatile foundation for modern data‑center and HPC environments.
