An Introduction to Remote Direct Memory Access (RDMA) and Its Supporting Protocols
Remote Direct Memory Access (RDMA) is a high‑performance networking technology that moves data directly between the memories of two computers without OS involvement, offering low latency, zero‑copy transfers and reduced CPU load through protocols such as InfiniBand, RoCE and iWARP.
Remote Direct Memory Access (RDMA) is a direct‑memory‑access technology that transfers data from the memory of one computer to another without the intervention of either operating system.
Traditional TCP/IP processing requires the operating system and multiple software layers, consuming significant server resources and memory‑bus bandwidth as data is copied between system memory, CPU caches and network controller buffers, leading to high latency.
RDMA enables a computer to access another machine's memory directly, moving data to remote storage rapidly without involving either operating system; the principle, and a comparison with TCP/IP, are illustrated in the accompanying diagram.
In practice, a server’s NIC can read and write the memory of another server, achieving high bandwidth, low latency and low resource utilization; applications merely specify the memory address, start the transfer, and wait for completion.
Network protocols that support RDMA
InfiniBand (IB): a network protocol designed with RDMA support from the outset; it requires InfiniBand‑capable NICs and switches.
RDMA over Converged Ethernet (RoCE): enables RDMA over standard Ethernet infrastructure by encapsulating the InfiniBand transport packet inside an Ethernet frame; it requires NICs that support RoCE.
iWARP: allows RDMA to be performed over TCP; it can be implemented in software on standard NICs but loses many performance advantages of native RDMA.
RDMA Advantages
Zero‑copy: the NIC transfers data directly between application memory and the network, eliminating copies between user space and kernel space and reducing latency.
Kernel bypass: applications issue commands to the NIC directly from user space, avoiding system calls and minimizing context switches.
No CPU involvement: remote memory can be accessed without consuming CPU cycles on the remote host.
Message‑based transactions: data is handled as discrete messages rather than streams, simplifying processing.
Scatter/gather support: multiple memory buffers can be read or written as a single operation.
During a remote memory read or write, the RDMA message carries the remote virtual address; the remote application has registered the corresponding memory buffer with its NIC beforehand, so the remote CPU is involved only in connection setup and memory registration.
RDMA Implementations
Common implementations include the Virtual Interface Architecture (VIA), RoCE, InfiniBand, and iWARP. InfiniBand was the earliest RDMA protocol; it is widely used in high‑performance computing but requires dedicated, relatively expensive hardware. This article focuses on RoCEv2.
RoCEv2’s protocol stack consists of the InfiniBand transport layer, UDP, IP and Ethernet. The packet format shows the UDP destination port 4791 identifying RoCEv2 frames, followed by the InfiniBand Base Transport Header (BTH), payload, ICRC and FCS.
The IB BTH format defines fields such as Opcode, Solicited Event (S), Migration Request (M), Pad, Transport Header Version (TVer), Partition Key, Destination QP, Acknowledge Request, and Packet Sequence Number, which together describe the packet’s purpose and routing.
RDMA NICs encapsulate the entire stack in hardware. Applications use verbs APIs (e.g., WRITE, READ, SEND) to create IB payloads, which are then handed directly to the NIC for hardware‑level packet construction and transmission.
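For orientation, the typical verbs call sequence for a one-sided RDMA WRITE looks roughly as follows. This is an abbreviated outline using libibverbs names, not runnable code: error handling, queue-pair state transitions, and connection establishment (e.g. via rdma_cm) are omitted, and variable names are illustrative.

```
/* One-sided RDMA WRITE flow, abbreviated: */
pd = ibv_alloc_pd(ctx);                          /* protection domain    */
mr = ibv_reg_mr(pd, buf, len,                    /* pin + register buf   */
                IBV_ACCESS_LOCAL_WRITE | IBV_ACCESS_REMOTE_WRITE);
cq = ibv_create_cq(ctx, depth, NULL, NULL, 0);   /* completion queue     */
qp = ibv_create_qp(pd, &qp_init_attr);           /* queue pair           */
/* ... exchange mr->rkey and buffer address with the peer out of band */
wr.opcode              = IBV_WR_RDMA_WRITE;      /* one-sided WRITE      */
wr.wr.rdma.remote_addr = peer_addr;              /* peer virtual address */
wr.wr.rdma.rkey        = peer_rkey;              /* peer's remote key    */
ibv_post_send(qp, &wr, &bad_wr);                 /* hand WR to the NIC   */
while (ibv_poll_cq(cq, 1, &wc) == 0)             /* poll for completion  */
    ;
```

After `ibv_post_send`, the NIC builds and transmits the BTH-framed packets entirely in hardware; the CPU's only remaining job is to poll the completion queue.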
References: RDMA Consortium specifications. The article is provided by the SSD PK community; please retain the original source when sharing.
Architects' Tech Alliance
Sharing project experiences, insights into cutting-edge architectures, focusing on cloud computing, microservices, big data, hyper-convergence, storage, data protection, artificial intelligence, industry practices and solutions.