
Step-by-Step Guide to RDMA Programming with the ibverbs API

This tutorial walks through the complete RDMA programming workflow using the ibverbs API, covering device initialization, memory registration, completion queue and queue pair creation, state transitions, send/receive operations, completion handling, and resource cleanup with concrete C code examples.

BirdNest Tech Talk

Initialization Phase

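The process begins by obtaining the list of RDMA devices with ibv_get_device_list(NULL), selecting the first device, and opening it via ibv_open_device(dev_list[0]). The list is NULL-terminated, so dev_list[0] should be checked before use, and the list can be released with ibv_free_device_list(dev_list) once the device is open. A protection domain (PD) is then allocated using ibv_alloc_pd(context). QPs and memory regions created in the same PD can work with each other, while resources in different PDs cannot.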

Memory Registration

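Application buffers must be registered with the NIC before they can be used in work requests. Registration pins the memory and yields a local key and a remote key. The code calls

ibv_reg_mr(pd, buffer, buffer_size, IBV_ACCESS_LOCAL_WRITE | IBV_ACCESS_REMOTE_READ | IBV_ACCESS_REMOTE_WRITE)

The arguments specify the PD, the buffer address, its size, and the required access permissions. The returned ibv_mr structure provides the lkey, used in local scatter/gather elements, and the rkey, which a peer needs for RDMA read and write operations on this buffer.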

Creating the Completion Queue (CQ)

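A completion queue is created with ibv_create_cq(context, cq_size, NULL, NULL, 0). The cq_size argument is the minimum number of completion entries the CQ must be able to hold, so it should cover every work request that can be outstanding at once; a CQ that overflows because the application polls too slowly enters an error state. The two NULL arguments leave out the user context and the completion channel, which is enough for a polling design, and the final 0 selects the completion vector. The CQ will later be used for both send and receive completions.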

Creating the Queue Pair (QP)

The QP attributes are defined in an ibv_qp_init_attr structure. The example sets .qp_type = IBV_QPT_RC for a reliable connection, assigns the previously created CQ as both send_cq and recv_cq, and configures capabilities such as max_send_wr = 16, max_recv_wr = 16, and a single scatter/gather element per request (max_send_sge = 1, max_recv_sge = 1). The QP is instantiated with ibv_create_qp(pd, &qp_init_attr).

QP State Transitions

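Before data transfer, the QP must move through three states: INIT, RTR, and RTS. First,

ibv_modify_qp(qp, &attr, IBV_QP_STATE | IBV_QP_PKEY_INDEX | IBV_QP_PORT | IBV_QP_ACCESS_FLAGS)

sets the QP to IBV_QPS_INIT with port 1 and pkey index 0; for an RC QP the access flags must be part of this mask. Next, the state changes to IBV_QPS_RTR (Ready-to-Receive) and finally to IBV_QPS_RTS (Ready-to-Send) using further ibv_modify_qp calls. These calls set more than the qp_state field: the RTR transition needs the peer's address and QP number, the path MTU, and the receive PSN, while the RTS transition needs the timeout, the retry counts, and the send PSN. The peer's values have to be exchanged out of band, for example over a TCP socket.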

Send Operation

To send data, a scatter/gather element (ibv_sge) is prepared with the local buffer address, length, and the memory region's lkey. A send work request (ibv_send_wr) is then populated with an identifier, the SGE list, operation code IBV_WR_SEND, and the flag IBV_SEND_SIGNALED to request a completion notification. The request is posted using ibv_post_send(qp, &wr, &bad_wr).

Receive Operation

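Receiving mirrors the send setup: an ibv_sge points to the receive buffer, and an ibv_recv_wr is filled with a unique wr_id and the SGE. The receive request is posted with ibv_post_recv(qp, &recv_wr, &bad_recv_wr), allowing the NIC to place incoming data into the registered buffer. The receive should be posted before the peer issues its send; on an RC QP, a send that arrives with no receive posted triggers receiver-not-ready (RNR) retries and eventually fails.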

Completion Handling

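After posting send and receive requests, the application polls the CQ with num_comp = ibv_poll_cq(cq, 1, &wc). The return value is the number of completions written to wc: 0 means nothing has completed yet, and a negative value means polling itself failed. If num_comp > 0, the status field of the returned ibv_wc is examined; a status of IBV_WC_SUCCESS indicates the operation completed successfully, and the wr_id field identifies which work request finished.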

Resource Cleanup

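When communication is finished, all resources are released in the reverse order of their creation: the QP and CQ are destroyed with ibv_destroy_qp and ibv_destroy_cq, the memory region is deregistered via ibv_dereg_mr, the protection domain is deallocated with ibv_dealloc_pd, and finally the device context is closed with ibv_close_device. The order matters because a resource cannot be released while another object still depends on it; a PD, for example, cannot be deallocated while a QP or memory region belonging to it exists.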

Written by

BirdNest Tech Talk

Author of the rpcx microservice framework, original book author, and chair of Baidu's Go CMC committee.
