How to Exchange RDMA Connection Parameters: Methods, Pros, and Pitfalls
Establishing an RDMA connection requires exchanging key parameters such as the LID, QP number, and memory keys. This article outlines the information that must be exchanged, compares six exchange methods, from static configuration to distributed services, and weighs their advantages, drawbacks, and suitable scenarios.
RDMA communication begins with a "connection establishment" phase during which both ends must exchange critical information. The required data can be grouped into three categories:
Basic connection information
Local Identifier (LID)
Port number
Queue Pair (QP) number
Global Identifier (GID)
Memory access information
Remote Virtual Address
Remote Key (R_Key)
Memory region size
Exchange mechanism
Traditional method: use a TCP socket
Modern methods: IB management tools, OFED, or dedicated service‑discovery mechanisms
A typical C‑style structure that holds these fields is shown below:
struct connection_info {
    uint16_t      lid;          // Local Identifier
    union ibv_gid gid;          // Global Identifier
    uint32_t      qp_num;       // Queue Pair Number
    uint64_t      remote_addr;  // Remote memory address
    uint32_t      rkey;         // Remote Key
    uint32_t      size;         // Memory region size
};

The generic exchange workflow consists of four steps:
One side acts as a server and listens.
The other side initiates a connection.
Both sides send the parameters required for the RDMA connection.
The RDMA link is finally established.
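Step 3 glosses over a practical detail: the two hosts may differ in endianness, so multi-byte fields should be put into a defined byte order before they cross the wire. A minimal, self-contained sketch of packing and unpacking such a parameter block (the struct mirrors connection_info but uses a raw 16-byte array for the GID so it compiles without the verbs headers; names like conn_info and conn_info_pack are illustrative):

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define WIRE_SIZE (2 + 16 + 4 + 8 + 4 + 4)  /* 38 bytes on the wire */

/* Illustrative mirror of connection_info with an opaque GID. */
struct conn_info {
    uint16_t lid;
    uint8_t  gid[16];
    uint32_t qp_num;
    uint64_t remote_addr;
    uint32_t rkey;
    uint32_t size;
};

/* Write the low `bytes` bytes of v in big-endian order. */
static void put_be(uint8_t *p, uint64_t v, int bytes)
{
    for (int i = 0; i < bytes; i++)
        p[i] = (uint8_t)(v >> (8 * (bytes - 1 - i)));
}

/* Read `bytes` big-endian bytes into an integer. */
static uint64_t get_be(const uint8_t *p, int bytes)
{
    uint64_t v = 0;
    for (int i = 0; i < bytes; i++)
        v = (v << 8) | p[i];
    return v;
}

void conn_info_pack(const struct conn_info *in, uint8_t buf[WIRE_SIZE])
{
    put_be(buf + 0, in->lid, 2);
    memcpy(buf + 2, in->gid, 16);          /* GID is already raw bytes */
    put_be(buf + 18, in->qp_num, 4);
    put_be(buf + 22, in->remote_addr, 8);
    put_be(buf + 30, in->rkey, 4);
    put_be(buf + 34, in->size, 4);
}

void conn_info_unpack(const uint8_t buf[WIRE_SIZE], struct conn_info *out)
{
    out->lid         = (uint16_t)get_be(buf + 0, 2);
    memcpy(out->gid, buf + 2, 16);
    out->qp_num      = (uint32_t)get_be(buf + 18, 4);
    out->remote_addr = get_be(buf + 22, 8);
    out->rkey        = (uint32_t)get_be(buf + 30, 4);
    out->size        = (uint32_t)get_be(buf + 34, 4);
}
```

Byte-wise packing sidesteps both endianness and struct-padding differences between the two hosts, which a raw memcpy of the struct would not.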
Common Parameter‑Exchange Methods
1. Static Configuration
Characteristics: Parameters are hard‑coded or placed in a configuration file before deployment.
Advantages: Simple, no extra communication channel needed.
Disadvantages: Inflexible, error‑prone for large or dynamic topologies, requires manual updates.
Suitable scenarios: Small, fixed‑topology systems.
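In practice this usually takes the shape of a small per-peer file read at startup; the format and field names below are purely illustrative:

```
# peer.conf (illustrative): RDMA parameters fixed at deployment time
peer_lid     = 12
peer_qp_num  = 1034
peer_rkey    = 0x1ee7
peer_addr    = 0x7f0000001000
peer_size    = 1048576
```

Note that the QP number in particular changes whenever the QP is recreated, which is exactly why static configuration breaks down outside fixed, long-lived setups.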
2. Exchange via TCP/IP
Principle: Before RDMA traffic, the peers open a regular TCP socket and exchange QP number, LID, GID, memory keys, etc.
Steps:
Establish a TCP/IP connection.
Send the RDMA parameters over the socket.
Close the TCP channel and switch to RDMA.
Advantages: Works in any network environment, no special configuration.
Disadvantages: Adds one extra round of TCP traffic.
Suitable scenarios: Point‑to‑point or small‑scale distributed systems.
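The hand-off above is a symmetric swap: each side writes its own parameter block, then reads the peer's. A sketch of that core, assuming the fds come from connect()/accept() in a real deployment (struct and function names here are illustrative, and the struct is sent raw for brevity, so endianness handling is omitted):

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>

/* Illustrative parameter block; a real one would match connection_info. */
struct conn_params {
    uint16_t lid;
    uint32_t qp_num;
    uint32_t rkey;
    uint64_t remote_addr;
};

/* Write exactly len bytes, retrying on short writes. */
static int write_all(int fd, const void *buf, size_t len)
{
    const char *p = buf;
    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n <= 0) return -1;
        p += n; len -= (size_t)n;
    }
    return 0;
}

/* Read exactly len bytes, retrying on short reads. */
static int read_all(int fd, void *buf, size_t len)
{
    char *p = buf;
    while (len > 0) {
        ssize_t n = read(fd, p, len);
        if (n <= 0) return -1;
        p += n; len -= (size_t)n;
    }
    return 0;
}

/* Symmetric exchange: send local params, then receive the peer's.
 * Both sides may write first without deadlocking because the block
 * is far smaller than the kernel socket buffer. */
int exchange_params(int fd, const struct conn_params *local,
                    struct conn_params *remote)
{
    if (write_all(fd, local, sizeof *local)) return -1;
    return read_all(fd, remote, sizeof *remote);
}
```

Because both ends run the same exchange_params() call, the only asymmetry left in the program is who listens and who connects.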
3. Distributed Key‑Value Store (e.g., etcd or Consul)
Principle: Store RDMA parameters in a distributed KV store; each side reads the counterpart’s entry.
Advantages: Scalable, supports dynamic topologies, provides high availability and consistency.
Disadvantages: Introduces a dependency on the KV service and may increase latency.
Suitable scenarios: Large‑scale distributed environments such as cloud platforms or HPC clusters.
4. Dedicated Parameter‑Exchange Server
Principle: A central server collects parameters from each participant and returns the peer’s data.
Process:
Each node sends its RDMA info to the server.
The server replies with the opposite node’s info.
Advantages: Centralized management simplifies debugging and monitoring; fits complex topologies.
Disadvantages: The server is a single point of failure and must be made highly available.
Suitable scenarios: Distributed systems that require strong control over connections.
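The pairing step at the heart of such a server reduces to: read one parameter block from each participant, then forward each block to the opposite side. A sketch of that logic over plain fds, so it works with any stream transport (the function name rendezvous_pair is illustrative):

```c
#include <assert.h>
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>

static int write_all(int fd, const void *buf, size_t len)
{
    const char *p = buf;
    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n <= 0) return -1;
        p += n; len -= (size_t)n;
    }
    return 0;
}

static int read_all(int fd, void *buf, size_t len)
{
    char *p = buf;
    while (len > 0) {
        ssize_t n = read(fd, p, len);
        if (n <= 0) return -1;
        p += n; len -= (size_t)n;
    }
    return 0;
}

/* Pair two clients: receive each one's parameter block (blob_len
 * bytes, agreed in advance) and forward it to the other side. */
int rendezvous_pair(int fd_a, int fd_b,
                    void *buf_a, void *buf_b, size_t blob_len)
{
    if (read_all(fd_a, buf_a, blob_len)) return -1;
    if (read_all(fd_b, buf_b, blob_len)) return -1;
    if (write_all(fd_a, buf_b, blob_len)) return -1; /* A gets B's info */
    if (write_all(fd_b, buf_a, blob_len)) return -1; /* B gets A's info */
    return 0;
}
```

A production server would wrap this in accept() handling, match clients by a job or pair ID, and run pairs concurrently, but the cross-forwarding shown here is the essential step.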
5. Shared File System or Database
Principle: Nodes write their parameters to a shared file or database and read the peer’s file.
Advantages: Very easy to implement.
Disadvantages: File‑system access is slow and unsuitable for high‑frequency updates.
Suitable scenarios: One‑time initialization during system startup.
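A sketch of the file-based variant: each node publishes its block at an agreed path (for example on a shared NFS mount) and polls until the peer's file appears. Paths, struct layout, and function names below are illustrative:

```c
#include <assert.h>
#include <stdio.h>
#include <stdint.h>

/* Illustrative parameter block written to the shared file. */
struct conn_blob {
    uint32_t qp_num;
    uint32_t rkey;
    uint64_t remote_addr;
};

/* Publish this node's parameters at `path`. */
int publish_params(const char *path, const struct conn_blob *b)
{
    FILE *f = fopen(path, "wb");
    if (!f) return -1;
    size_t n = fwrite(b, sizeof *b, 1, f);
    if (fclose(f) != 0 || n != 1) return -1;
    return 0;
}

/* Fetch the peer's parameters; returns -1 if the file is not there
 * yet or is incomplete (callers typically poll until it appears). */
int fetch_params(const char *path, struct conn_blob *b)
{
    FILE *f = fopen(path, "rb");
    if (!f) return -1;
    size_t n = fread(b, sizeof *b, 1, f);
    fclose(f);
    return n == 1 ? 0 : -1;
}
```

In practice the writer should create the file under a temporary name and rename() it into place, so a polling reader never observes a half-written block.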
6. IP over InfiniBand (IPoIB) as an Auxiliary Channel
Principle: Use IPoIB to simulate TCP/IP on the InfiniBand fabric and exchange parameters.
Advantages: Leverages existing InfiniBand hardware without needing a separate Ethernet network.
Disadvantages: IPoIB performance is lower than native RDMA.
Suitable scenarios: Environments where the entire network is InfiniBand.
The choice among these methods depends on system scale, performance requirements, and architectural complexity. Small deployments can often get away with static configuration or TCP/IP exchange, while large, dynamic clusters benefit from distributed stores or a dedicated exchange server.
BirdNest Tech Talk
Author of the rpcx microservice framework, original book author, and chair of Baidu's Go CMC committee.