Understanding InfiniBand and RDMA: Concepts and Configuration Guide
This article provides an overview of InfiniBand and Remote Direct Memory Access (RDMA), explains their underlying protocols and hardware, and offers detailed step‑by‑step guidance for configuring InfiniBand, RDMA, RoCE, and related services on Red Hat Enterprise Linux systems.
The document introduces InfiniBand and RDMA. InfiniBand refers both to a physical link-layer protocol and to the InfiniBand Verbs API, a programming interface that implements RDMA. RDMA gives one machine high-throughput, low-latency access to another machine's memory without involving either machine's operating system.
It explains how RDMA bypasses the kernel's networking stack, which reduces CPU load, and details the role of the RDMA Connection Manager (RDMA_CM) in setting up the connections that carry reliable data transfers.
Supported hardware and software on Red Hat Enterprise Linux are listed, including Mellanox, Broadcom, and QLogic adapters, as well as the InfiniBand Verbs API and various RDMA protocols such as iWARP, RoCE v1, and RoCE v2.
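To see which of these adapter types a given host actually exposes, one option is to read the Linux sysfs tree. The sketch below is illustrative only: it assumes the kernel publishes RDMA devices under /sys/class/infiniband with a link_layer file per port (reporting "InfiniBand" or "Ethernet"), and the helper name list_rdma_ports is hypothetical, not something from the referenced guide.

```python
from pathlib import Path


def list_rdma_ports(sysfs_root="/sys/class/infiniband"):
    """Return (device, port, link_layer) tuples found under sysfs_root.

    Assumption: each RDMA device directory has a ports/<n>/link_layer file.
    Returns an empty list when sysfs_root does not exist, for example on a
    machine with no RDMA driver loaded.
    """
    root = Path(sysfs_root)
    if not root.is_dir():
        return []
    found = []
    for dev in sorted(root.iterdir()):
        ports_dir = dev / "ports"
        if not ports_dir.is_dir():
            continue
        for port in sorted(ports_dir.iterdir()):
            link_file = port / "link_layer"
            if link_file.is_file():
                found.append((dev.name, port.name, link_file.read_text().strip()))
    return found


if __name__ == "__main__":
    for dev, port, link in list_rdma_ports():
        print(f"{dev} port {port}: link layer {link}")
```

A port that reports an Ethernet link layer points to RoCE or another Ethernet-based RDMA device, whereas a port that reports InfiniBand sits on a native InfiniBand fabric.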
Configuration guidance covers:
Setting up RoCE (v1 and v2), including the EtherType (0x8915) that identifies RoCE v1 frames and the UDP destination port (4791) used by RoCE v2.
Using Soft‑RoCE (RXE) as a software‑only RDMA implementation.
Configuring the RDMA core, the limit on how much memory users may pin (lock), and the rdma service via systemd and udev rules.
Managing the InfiniBand subnet manager, with the option of using either the manager embedded in the InfiniBand switch or the OpenSM subnet manager shipped with Red Hat Enterprise Linux.
Configuring IP over InfiniBand (IPoIB) in Datagram and Connected modes, MTU considerations, and the impact on performance and kernel memory usage.
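The two RoCE identifiers in the list above (EtherType 0x8915 for v1, UDP port 4791 for v2) are enough to tell the variants apart on the wire. The following is a minimal sketch rather than a full parser: it assumes untagged Ethernet frames (no VLAN header) and IPv4 only, although RoCE v2 can also run over IPv6. The function name classify_roce is hypothetical.

```python
import struct

ETHERTYPE_ROCE_V1 = 0x8915   # RoCE v1 is carried directly in Ethernet
ETHERTYPE_IPV4 = 0x0800
IPPROTO_UDP = 17
ROCE_V2_UDP_PORT = 4791      # RoCE v2 destination port

ETH_HEADER_LEN = 14          # assumption: no VLAN tag present


def classify_roce(frame: bytes) -> str:
    """Return 'RoCE v1', 'RoCE v2', or 'other' for a raw Ethernet frame."""
    if len(frame) < ETH_HEADER_LEN:
        return "other"
    (ethertype,) = struct.unpack("!H", frame[12:14])
    if ethertype == ETHERTYPE_ROCE_V1:
        return "RoCE v1"
    # Anything else must be IPv4 + UDP to qualify as RoCE v2 in this sketch.
    if ethertype != ETHERTYPE_IPV4 or len(frame) < ETH_HEADER_LEN + 20:
        return "other"
    ihl = (frame[ETH_HEADER_LEN] & 0x0F) * 4      # IPv4 header length in bytes
    if ihl < 20 or frame[ETH_HEADER_LEN + 9] != IPPROTO_UDP:
        return "other"
    udp_start = ETH_HEADER_LEN + ihl
    if len(frame) < udp_start + 4:
        return "other"
    _src_port, dst_port = struct.unpack("!HH", frame[udp_start:udp_start + 4])
    return "RoCE v2" if dst_port == ROCE_V2_UDP_PORT else "other"
```

Because RoCE v2 is ordinary UDP/IP traffic it can be routed across subnets, while RoCE v1 frames stay within a single Ethernet broadcast domain. The same two values are what a firewall or switch ACL would need to match.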
The guide also notes compatibility constraints, such as the inability to create IPoIB devices on iWARP or RoCE hardware, and recommends using the appropriate mode (Datagram or Connected) based on traffic patterns.
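The MTU difference between the two IPoIB modes can be made concrete with a small calculation. This sketch relies on figures that are not stated in the summary above and should be read as assumptions: a 4-byte IPoIB encapsulation header in Datagram mode, a typical InfiniBand link-layer MTU of 2048 bytes, and a commonly cited Connected-mode ceiling of 65520 bytes. The function name max_ipoib_mtu is hypothetical.

```python
IPOIB_HEADER_BYTES = 4           # assumed IPoIB encapsulation header size
CONNECTED_MODE_MAX_MTU = 65520   # assumed ceiling for Connected mode


def max_ipoib_mtu(mode: str, ib_link_mtu: int = 2048) -> int:
    """Return the largest IPoIB MTU for the given mode.

    Datagram mode is bounded by the InfiniBand link-layer MTU minus the
    IPoIB header; Connected mode is not tied to the link-layer MTU.
    """
    normalized = mode.strip().lower()
    if normalized == "datagram":
        return ib_link_mtu - IPOIB_HEADER_BYTES
    if normalized == "connected":
        return CONNECTED_MODE_MAX_MTU
    raise ValueError(f"unknown IPoIB mode: {mode!r}")


if __name__ == "__main__":
    for mode in ("datagram", "connected"):
        print(mode, max_ipoib_mtu(mode))
```

On Linux, the mode currently in use can usually be read from /sys/class/net/<interface>/mode, which is why the function accepts and normalizes a raw string. The larger Connected-mode MTU is the source of the trade-off mentioned above: fewer, larger packets, at the cost of more kernel memory per connection.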
Throughout, the article references a detailed external guide titled “Configuring InfiniBand and RDMA Networks” (original title: “配置InfiniBand和RDMA网络”) for step‑by‑step configuration instructions.
Architects' Tech Alliance