What Is InfiniBand RDMA and How to Configure It on RHEL 8?
This guide explains the fundamentals of InfiniBand and RDMA, introduces the InfiniBand Verbs API, contrasts conventional kernel data handling with the RDMA data path, and provides practical configuration guidance for RoCE, IPoIB, and the subnet manager on Red Hat Enterprise Linux 8.
Overview
InfiniBand is a high‑performance network technology that supports Remote Direct Memory Access (RDMA). RDMA lets one host read from or write to another host’s memory without involving the operating system kernel in the data transfer, which reduces both latency and CPU usage.
Key Components
InfiniBand physical link protocol – defines the low‑level wire protocol.
InfiniBand Verbs API – the programming interface that implements RDMA operations.
Conventional Data Transfer (Without RDMA)
In a conventional IP data transfer, when an application on one host sends data to an application on another host, the kernel on the receiving host must:
Receive the incoming data.
Determine that the data belongs to the target application.
Wake up that application.
Wait for the application to perform a system call into the kernel.
Copy the data from the kernel’s internal buffers into the buffer provided by the application.
As a result, most network traffic is copied across system memory even when the host adapter uses DMA, and the system performs repeated context switches between kernel and application context. At high traffic rates, these switches increase CPU load and slow down other tasks.
RDMA Communication Model
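RDMA removes the kernel from the data path, which reduces CPU overhead. When a packet arrives from the network, the host adapter itself determines which application should receive it and where in that application’s memory to store it. Instead of handing the packet to the kernel for processing and then copying it into user memory, the adapter places the packet contents directly in the application buffer.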
RHEL 8 Support
Red Hat Enterprise Linux 8 supports both InfiniBand hardware and the InfiniBand Verbs API. It also supports the following technologies, which allow the Verbs API to be used on non‑InfiniBand hardware:
iWARP (RDMA over TCP/IP)
RoCE (RDMA over Converged Ethernet, also called InfiniBand over Ethernet)
RoCE Versions
RoCE v1 uses Ethernet ethertype 0x8915 and allows communication between two hosts in the same broadcast domain.
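RoCE v2 runs on top of UDP over IPv4 or IPv6, uses UDP destination port 4791, and can therefore be routed between IP subnets. It is supported by Mellanox ConnectX‑3 Pro, ConnectX‑4 Lx, and ConnectX‑5 adapters. The client and server must use the same RoCE version: a RoCE v2 client communicating with a RoCE v1 server is not supported, so if the server hardware supports only RoCE v1, configure the clients for RoCE v1 as well.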
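To make the two encapsulations concrete, here is a minimal Python sketch that classifies a frame as RoCE v1 or RoCE v2 from header fields that a packet capture tool would already have parsed. The function name and its inputs are illustrative assumptions, not part of any RDMA library; only the ethertype 0x8915 and UDP port 4791 come from the text above.

```python
# Hypothetical helper: header fields are assumed to be parsed already.
ETHERTYPE_ROCE_V1 = 0x8915   # RoCE v1: raw Ethernet link-layer frames
ETHERTYPE_IPV4 = 0x0800
ETHERTYPE_IPV6 = 0x86DD
IPPROTO_UDP = 17             # same number for the IPv4 protocol / IPv6 next header field
ROCE_V2_UDP_PORT = 4791      # RoCE v2 UDP destination port


def roce_version(ethertype, ip_protocol=None, udp_dst_port=None):
    """Return 1 or 2 for a RoCE frame, or None if the frame is not RoCE."""
    if ethertype == ETHERTYPE_ROCE_V1:
        # No IP header at all, so RoCE v1 stays inside one broadcast domain.
        return 1
    if (
        ethertype in (ETHERTYPE_IPV4, ETHERTYPE_IPV6)
        and ip_protocol == IPPROTO_UDP
        and udp_dst_port == ROCE_V2_UDP_PORT
    ):
        # UDP over IPv4/IPv6 is what makes RoCE v2 routable.
        return 2
    return None


if __name__ == "__main__":
    print(roce_version(0x8915))            # 1
    print(roce_version(0x86DD, 17, 4791))  # 2
    print(roce_version(0x0800, 6, 4791))   # None: TCP, not UDP
```

A real classifier would also have to skip VLAN tags and IPv6 extension headers, which this sketch ignores.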
RDMA Connection Manager (RDMA_CM)
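RDMA_CM sets up a reliable connection between a client and a server for transferring data. It provides an RDMA transport‑neutral interface for establishing connections, while the communication itself uses a specific RDMA device and message‑based data transfers.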
IP over InfiniBand (IPoIB)
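IPoIB provides an IP network emulation layer on top of InfiniBand, which lets unmodified IP applications use the fabric, although with lower performance than native RDMA. It can operate in two modes:
Datagram mode – opens an unreliable, connectionless queue pair and cannot carry packets larger than the InfiniBand link‑layer MTU. Because IPoIB adds a 4‑byte header to each IP packet, the IPoIB MTU is 4 bytes smaller than the link‑layer MTU (e.g., 2044 bytes with the common 2048‑byte link‑layer MTU).
Connected mode – opens a reliable, connection‑oriented queue pair in which the host adapter handles segmentation and reassembly, so messages can exceed the link‑layer MTU. The IPoIB MTU is still capped at 65520 bytes by the limits on IP packet size (data and TCP/IP header fields). Connected mode performs better but consumes more kernel memory.
Even when a system is configured for connected mode, it sends multicast traffic in datagram mode, because InfiniBand switches and the fabric cannot pass multicast traffic in connected mode.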
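The MTU arithmetic for the two IPoIB modes can be sketched as a small Python function. It assumes that IPoIB adds a 4‑byte header to each IP packet in datagram mode and that connected mode is capped at 65520 bytes; the function name and the default 2048‑byte link‑layer MTU are illustrative choices.

```python
IPOIB_HEADER_BYTES = 4           # IPoIB header prepended to every IP packet
CONNECTED_MODE_MAX_MTU = 65520   # IPoIB MTU ceiling in connected mode


def ipoib_mtu(mode, link_layer_mtu=2048):
    """Return the largest IPoIB interface MTU for the given mode."""
    if mode == "datagram":
        # The IP packet plus the IPoIB header must fit in one link-layer frame.
        return link_layer_mtu - IPOIB_HEADER_BYTES
    if mode == "connected":
        # The adapter segments and reassembles, so the link-layer MTU
        # no longer limits the message size.
        return CONNECTED_MODE_MAX_MTU
    raise ValueError(f"unknown IPoIB mode: {mode!r}")


if __name__ == "__main__":
    print(ipoib_mtu("datagram"))        # 2044
    print(ipoib_mtu("datagram", 4096))  # 4092
    print(ipoib_mtu("connected"))       # 65520
```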
Kernel Memory Considerations
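RDMA operations require pinned physical memory, which the kernel cannot write to swap space. If a user pins too much memory, the system can run out of memory, and the kernel then terminates processes to free memory. For this reason, memory pinning is a privileged operation: before non‑root users can run large RDMA applications, an administrator must raise the amount of memory they are allowed to keep pinned.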
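Before raising limits, it helps to check the current locked‑memory limit of the account that will run the RDMA workload. The Python sketch below reads RLIMIT_MEMLOCK with the standard resource module (Linux/Unix only) and prints pam_limits lines that lift the limit for a group. The group name rdma and a drop‑in file such as /etc/security/limits.d/rdma.conf are assumptions to adapt to your site; new limits apply only to sessions started after the change.

```python
import resource


def memlock_limits():
    """Return the (soft, hard) RLIMIT_MEMLOCK values in bytes; None means unlimited."""
    soft, hard = resource.getrlimit(resource.RLIMIT_MEMLOCK)

    def normalize(value):
        return None if value == resource.RLIM_INFINITY else value

    return normalize(soft), normalize(hard)


def limits_conf_lines(group="rdma"):
    """Render pam_limits entries (<domain> <type> <item> <value>) for a group.

    The group name is an assumption; "@name" targets a group in limits.conf.
    """
    return [f"@{group}    {kind}    memlock    unlimited" for kind in ("soft", "hard")]


if __name__ == "__main__":
    soft, hard = memlock_limits()
    for label, value in (("soft", soft), ("hard", hard)):
        print(f"{label} memlock:", "unlimited" if value is None else f"{value} bytes")
    # Candidate content for a file such as /etc/security/limits.d/rdma.conf
    print("\n".join(limits_conf_lines()))
```

In a new login shell, `ulimit -l` reports the same soft limit, but in KiB rather than bytes.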
Subnet Manager Configuration
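Every InfiniBand network needs a running subnet manager (SM) to function, even when two hosts are connected directly without a switch. You can run more than one SM: one acts as the master, and a standby SM takes over if the master fails. Red Hat Enterprise Linux provides OpenSM, an implementation of an InfiniBand subnet manager. However, OpenSM has limited features and no active upstream development, so the subnet managers embedded in InfiniBand switches typically offer more features and better support for current hardware.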
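The master/standby behavior can be illustrated with a simplified Python model of SM election. It assumes the rule from the InfiniBand specification that the live SM with the highest priority (0 to 15) becomes master and that the lowest port GUID breaks ties. It does not model the real SM‑to‑SM handover protocol, and the SM names and GUIDs are made up.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class SubnetManager:
    name: str        # hypothetical label, for display only
    guid: int        # port GUID the SM runs on
    priority: int    # SM priority, 0 (lowest) to 15 (highest)
    alive: bool = True


def elect_master(sms):
    """Return the master SM: highest priority wins, lowest GUID breaks ties."""
    live = [sm for sm in sms if sm.alive]
    if not live:
        return None  # no subnet manager: the fabric cannot come up
    return min(live, key=lambda sm: (-sm.priority, sm.guid))


if __name__ == "__main__":
    fabric = [
        SubnetManager("switch-sm", 0x0002C90300001111, 14),
        SubnetManager("opensm-a", 0x0002C90300002222, 14),
        SubnetManager("opensm-b", 0x0002C90300003333, 10),
    ]
    print(elect_master(fabric).name)  # switch-sm: same priority, lower GUID
    fabric[0] = replace(fabric[0], alive=False)  # simulate master failure
    print(elect_master(fabric).name)  # opensm-a takes over as master
```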
IPoIB Device Naming
By default, the kernel names IPoIB devices ib0, ib1, and so on. To avoid naming conflicts, create a udev rule that assigns persistent, meaningful names (for example, mlx4_ib0).
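The following is a minimal Python sketch that builds such a udev rule from the 20‑byte hardware address shown by `ip link show ib0`. It matches only the last 8 bytes (the port GUID), because the first 4 bytes of an IPoIB address hold flags and the queue pair number and are not stable. In the match, `?*` is a udev wildcard and `ATTR{type}=="32"` selects InfiniBand interfaces. The rule layout and the example address are assumptions modeled on common RHEL practice. The output would go in a file under /etc/udev/rules.d/; a name such as 70-persistent-ipoib.rules is a hypothetical choice.

```python
HEX_DIGITS = set("0123456789abcdef")


def ipoib_udev_rule(hw_address, new_name):
    """Build a udev rule that renames an IPoIB interface by its port GUID."""
    octets = hw_address.lower().split(":")
    if len(octets) != 20 or not all(len(o) == 2 and set(o) <= HEX_DIGITS for o in octets):
        raise ValueError("expected a 20-byte colon-separated IPoIB hardware address")
    # Bytes 0-3 hold flags and the queue pair number; bytes 12-19 are the port GUID.
    guid = ":".join(octets[-8:])
    return (
        'ACTION=="add", SUBSYSTEM=="net", DRIVERS=="?*", '
        'ATTR{type}=="32", '  # 32 == ARPHRD_INFINIBAND
        f'ATTR{{address}}=="?*{guid}", NAME="{new_name}"'
    )


if __name__ == "__main__":
    example = "80:00:02:00:fe:80:00:00:00:00:00:00:00:02:c9:03:00:31:78:f2"  # made-up
    print(ipoib_udev_rule(example, "mlx4_ib0"))
```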
Practical Configuration Steps
1. Install the rdma service package; systemd will start it when InfiniBand, iWARP, or RoCE hardware is detected.
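2. Choose a RoCE version that your hardware supports and configure the client and server to use the same version; if the server supports only RoCE v1, configure the clients for RoCE v1.
3. Set the IPoIB MTU according to the chosen mode (datagram: the link‑layer MTU minus 4 bytes, typically 2044; connected: up to 65520).
4. Raise the locked‑memory (memlock) limit for non‑root users who run large RDMA applications, since those applications need pinned memory.
5. Verify that a subnet manager (for example, OpenSM or an SM embedded in a switch) is running on the fabric and that the IPoIB devices have unique persistent names.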
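Step 2 can be carried out at runtime through configfs. In that case, RDMA_CM's default RoCE version for a port is exposed at /sys/kernel/config/rdma_cm/<device>/ports/<port>/default_roce_mode. The Python sketch below makes several assumptions: configfs is mounted at /sys/kernel/config, the rdma_cm module is loaded, the script runs as root, and the device is named mlx5_0. The change is temporary and is lost at reboot. The configfs_root parameter exists only so that the function can be exercised against an ordinary directory.

```python
import os

VALID_ROCE_MODES = ("IB/RoCE v1", "RoCE v2")


def set_default_roce_mode(device, port, mode, configfs_root="/sys/kernel/config"):
    """Write the default RoCE mode for one port and return the value read back."""
    if mode not in VALID_ROCE_MODES:
        raise ValueError(f"unsupported RoCE mode: {mode!r}")
    device_dir = os.path.join(configfs_root, "rdma_cm", device)
    # On real configfs, creating the device directory makes the kernel populate ports/.
    os.makedirs(device_dir, exist_ok=True)
    mode_file = os.path.join(device_dir, "ports", str(port), "default_roce_mode")
    with open(mode_file, "w") as f:
        f.write(mode + "\n")
    with open(mode_file) as f:
        return f.read().strip()


if __name__ == "__main__":
    if os.path.isdir("/sys/kernel/config/rdma_cm"):
        # Example: pin port 1 of the (assumed) device mlx5_0 to RoCE v1 so it can
        # talk to a server that supports only RoCE v1.
        print(set_default_roce_mode("mlx5_0", 1, "IB/RoCE v1"))
    else:
        print("rdma_cm configfs not found; is configfs mounted and rdma_cm loaded?")
```

Reading the same default_roce_mode file on both hosts is a quick way to confirm that client and server agree on the version.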
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.
Architects' Tech Alliance
Sharing project experiences, insights into cutting-edge architectures, focusing on cloud computing, microservices, big data, hyper-convergence, storage, data protection, artificial intelligence, industry practices and solutions.
