
Why and How to Configure Flow Control for RDMA and RoCE Networks

This guide explains the necessity of flow control in high‑performance storage networks, details RDMA and RoCE protocols, and provides step‑by‑step commands for configuring DSCP, PFC, ECN, and related settings on servers and switches.

Tech Stroll Journey

High‑performance storage systems demand very low latency, high bandwidth, and no packet loss. RDMA delivers the first two, and Priority Flow Control (PFC) supplies the third by making the Ethernet fabric lossless.

Background: RDMA and RoCE

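Remote Direct Memory Access (RDMA) lets a network adapter read and write the memory of a remote host directly, bypassing the remote CPU and the kernel network stack, which drastically reduces latency. InfiniBand, the original RDMA transport, is lossless by design because it uses credit‑based link‑level flow control. Ethernet gives no such guarantee, and RDMA loss recovery is costly, so RDMA over Ethernet relies on a lossless network provided by flow control technologies such as Priority Flow Control (PFC).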

RoCE (RDMA over Converged Ethernet) extends RDMA to Ethernet. Two versions exist:

RoCE v1 : Encapsulates RDMA frames in Ethernet frames (Layer 2), using VLAN PCP for priority.

RoCE v2 : Encapsulates RDMA in UDP/IP/Ethernet (Layer 3), allowing priority marking via VLAN PCP or IP DSCP.

Switch Requirements

Verify that the switch model supports DSCP, PFC, and ECN as indicated in the vendor documentation. These features enable lossless transport for RoCE traffic.

Key concepts:

DSCP : Differentiated Services Code Point, used to prioritize traffic at the IP layer.

CNP : Congestion Notification Packet, a lightweight packet that the receiver sends back to the sender when it sees congestion‑marked traffic.

ECN : Explicit Congestion Notification, marks packets in the IP header instead of dropping them, which triggers congestion control in RoCE v2.

Server‑Side Configuration (rc.local)

The settings below do not survive a reboot, so add all commands to rc.local to reapply them automatically at boot.

Set the NIC trust mode to DSCP so that packet priority is derived from the IP header:

mlnx_qos -i <interface> --trust dscp

Set TOS 106 (DSCP 26 with the ECN‑capable bits set) for all RoCE traffic:

echo 106 > /sys/class/infiniband/<mlx-device>/tc/1/traffic_class
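The relationship between DSCP 26 and TOS 106 follows from the layout of the IP TOS byte: DSCP occupies the upper six bits and ECN the lower two. The following is a minimal sketch of that arithmetic; the function name tos_from_dscp is hypothetical and is not part of any Mellanox tool.

```shell
#!/bin/sh
# Compose an IP TOS byte from a DSCP value (0-63) and an ECN codepoint (0-3).
# DSCP occupies the upper six bits of the byte, ECN the lower two.
# tos_from_dscp is a hypothetical helper used only for illustration.
tos_from_dscp() {
    dscp=$1
    ecn=$2
    echo $(( (dscp << 2) | ecn ))
}

# DSCP 26 with ECN codepoint 2 (ECT(0), binary 10): (26 << 2) | 2 = 104 + 2
tos_from_dscp 26 2   # prints 106
```

With the ECN bits left at 0, the same DSCP would give TOS 104, so the value 106 also marks the traffic as ECN‑capable.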

Configure the RDMA connection manager (RDMA‑CM) to use the same TOS value:

cma_roce_tos -d <mlx-device> -t 106

Enable ECN for TCP (ensuring ECN packets reach the peer):

sysctl -w net.ipv4.tcp_ecn=1

Enable DCQCN on priority 3, for both the notification point (roce_np) and the reaction point (roce_rp):

echo 1 > /sys/class/net/<interface>/ecn/roce_np/enable/3
echo 1 > /sys/class/net/<interface>/ecn/roce_rp/enable/3
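After writing these values it can be useful to read them back. The sketch below assumes that each enable file returns 1 when read while the feature is active; the helper name ecn_enabled is hypothetical, and the directory is passed as an argument so the check can be tried against any path.

```shell
#!/bin/sh
# Succeed only if both the notification-point (roce_np) and reaction-point
# (roce_rp) enable files for the given priority contain 1.
#   $1: ecn directory, e.g. /sys/class/net/<interface>/ecn
#   $2: priority (0-7)
# ecn_enabled is a hypothetical helper; reading back "1" is an assumption.
ecn_enabled() {
    ecn_dir=$1
    prio=$2
    for role in roce_np roce_rp; do
        file="$ecn_dir/$role/enable/$prio"
        [ -r "$file" ] || return 1
        [ "$(cat "$file")" = "1" ] || return 1
    done
    return 0
}

# Usage (replace the placeholder with the real interface name):
#   ecn_enabled /sys/class/net/<interface>/ecn 3 && echo "ECN active on priority 3"
```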

Send CNPs with DSCP 48, which maps to priority 6:

echo 48 > /sys/class/net/<interface>/ecn/roce_np/cnp_dscp
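The DSCP values in this guide line up with the priorities as follows, assuming the NIC keeps its default trust‑DSCP mapping in which the priority is the DSCP value divided by 8 (its upper three bits). If the mapping has been customized, these results no longer hold. The function name prio_from_dscp is hypothetical.

```shell
#!/bin/sh
# Map a DSCP value (0-63) to a priority (0-7).
# Assumption: the default mapping priority = DSCP / 8 is in effect.
# prio_from_dscp is a hypothetical helper used only for illustration.
prio_from_dscp() {
    echo $(( $1 >> 3 ))
}

prio_from_dscp 26   # RoCE data traffic: prints 3
prio_from_dscp 48   # CNP traffic: prints 6
```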

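Use mlnx_qos to set the trust mode and enable PFC. The --pfc list holds one flag per priority, 0 through 7, so the value below enables PFC on priorities 3 and 6: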

mlnx_qos -i <interface> --trust=dscp --pfc 0,0,0,1,0,0,1,0
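To double‑check a --pfc list before applying it, the flags can be turned back into priority numbers. This is a small sketch in plain shell; the function name pfc_enabled_priorities is hypothetical and it does not call mlnx_qos.

```shell
#!/bin/sh
# Print the priorities (0-7) whose flag is 1 in a comma-separated PFC list.
# pfc_enabled_priorities is a hypothetical helper used only for illustration.
pfc_enabled_priorities() {
    prio=0
    enabled=""
    old_ifs=$IFS
    IFS=,
    # With IFS set to a comma, the unquoted $1 splits into one flag per priority.
    for flag in $1; do
        if [ "$flag" = "1" ]; then
            enabled="${enabled:+$enabled }$prio"
        fi
        prio=$((prio + 1))
    done
    IFS=$old_ifs
    echo "$enabled"
}

pfc_enabled_priorities 0,0,0,1,0,0,1,0   # prints: 3 6
```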

Switch Configuration Reference

The complete switch configuration and verification steps are available in the referenced knowledge base (access requires membership).

[Figure: Switch configuration diagram]