Why and How to Configure Flow Control for RDMA and RoCE Networks
This guide explains why flow control is needed in high-performance storage networks, introduces the RDMA and RoCE protocols, and gives step-by-step server-side commands for configuring DSCP, PFC, ECN, and related settings, along with what to check on the switch.
High-performance storage systems demand very low latency, high bandwidth, and essentially no packet loss. RDMA delivers the first two; Priority Flow Control (PFC) supplies the third by pausing a congested traffic class instead of dropping its packets, which gives RDMA the lossless network it expects.
Background: RDMA and RoCE
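Remote Direct Memory Access (RDMA) lets a network adapter move data directly between the memory of two hosts, bypassing the remote CPU and the kernel network stack, which sharply reduces latency. Native RDMA over InfiniBand is lossless by design, because InfiniBand uses credit-based link-level flow control. RDMA transports therefore assume that packets are rarely dropped, and their loss recovery is inefficient. When RDMA runs over Ethernet, the lossless behavior has to be provided by flow control technologies such as Priority Flow Control (PFC).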
RoCE (RDMA over Converged Ethernet) extends RDMA to Ethernet. Two versions exist:
RoCE v1 : Encapsulates RDMA traffic directly in Ethernet frames (Layer 2), so it cannot be routed across IP subnets; priority is carried in the VLAN PCP field.
RoCE v2 : Encapsulates RDMA in UDP/IP over Ethernet (Layer 3), so it is routable; priority can be marked with VLAN PCP or IP DSCP.
Switch Requirements
Verify that the switch model supports DSCP, PFC, and ECN as indicated in the vendor documentation. These features enable lossless transport for RoCE traffic.
Key concepts:
DSCP : Differentiated Services Code Point, used to prioritize traffic at the IP layer.
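CNP : Congestion Notification Packet, a small packet that the receiver sends back to the sender after it receives ECN-marked traffic, telling the sender to reduce its rate.
ECN : Explicit Congestion Notification, a two-bit IP header field that a congested switch sets on packets instead of dropping them; in RoCE v2 these marks drive congestion control.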
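DSCP and ECN share one byte of the IPv4 header: DSCP occupies the upper six bits and ECN the lower two. The sketch below works through that arithmetic for the values used in this guide. The function names are illustrative and not part of any vendor tool, and the DSCP-to-priority rule (top three bits of the DSCP value) is an assumption that matches a common default mapping; a given NIC or switch may be configured differently.

```shell
#!/bin/sh
# Hypothetical helpers for illustration; not part of any vendor tool.

# TOS byte = DSCP in the upper 6 bits, ECN codepoint in the lower 2 bits.
dscp_to_tos() {
    dscp=$1
    ecn=${2:-0}
    echo $(( ($dscp << 2) | $ecn ))
}

# Assumed default mapping: priority = top 3 bits of the DSCP value.
dscp_to_prio() {
    echo $(( $1 >> 3 ))
}

dscp_to_tos 26 2    # DSCP 26 with ECN codepoint 2 (ECN-capable) -> 106
dscp_to_prio 26     # -> 3
dscp_to_prio 48     # -> 6
```

Under these assumptions, DSCP 26 with the ECN-capable codepoint gives the TOS value 106 and lands on priority 3, while DSCP 48 lands on priority 6.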
Server‑Side Configuration (rc.local)
The settings in this section do not persist across reboots, so add the commands to rc.local to reapply them automatically at boot.
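Depending on the distribution and boot order, rc.local may run before the NIC driver has created the network interface, in which case writes to its sysfs entries would fail. A minimal sketch of a guard that waits for the interface first is shown here. The function name and the SYSFS_ROOT override are hypothetical; the override exists only so the function can be tried without hardware.

```shell
#!/bin/sh
# Hypothetical helper: wait until a network interface appears in sysfs.
# Usage: wait_for_iface IFACE [TIMEOUT_SECONDS]
wait_for_iface() {
    iface=$1
    timeout=${2:-30}
    root=${SYSFS_ROOT:-/sys}   # assumption: overridable for testing only
    waited=0
    while [ ! -e "$root/class/net/$iface" ]; do
        if [ "$waited" -ge "$timeout" ]; then
            echo "interface $iface not found after ${timeout}s" >&2
            return 1
        fi
        sleep 1
        waited=$((waited + 1))
    done
    return 0
}

# In rc.local, the QoS commands could then be run only once the
# interface exists, for example:
#   wait_for_iface IFACE 30 && mlnx_qos -i IFACE --trust dscp
```

IFACE stands for the real interface name. Returning a non-zero status on timeout lets the caller skip the remaining commands instead of writing to paths that do not exist yet.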
Set the NIC to trust DSCP markings when classifying traffic:
mlnx_qos -i <interface> --trust dscp
Set DSCP value 26 for all RoCE traffic. The value written is the TOS byte, 106, which is DSCP 26 shifted left by two bits (104) plus the ECN-capable codepoint (2):
echo 106 > /sys/class/infiniband/<mlx-device>/tc/1/traffic_class
Configure the RDMA connection manager to use the same TOS value:
cma_roce_tos -d <mlx-device> -t 106
Enable ECN for TCP (ensuring ECN packets reach the peer):
sysctl -w net.ipv4.tcp_ecn=1
Activate DCQCN on priority 3, for both the notification point (roce_np) and the reaction point (roce_rp):
echo 1 > /sys/class/net/<interface>/ecn/roce_np/enable/3
echo 1 > /sys/class/net/<interface>/ecn/roce_rp/enable/3
Send CNPs with DSCP 48, which maps to priority 6 under the default DSCP-to-priority mapping:
echo 48 > /sys/class/net/<interface>/ecn/roce_np/cnp_dscp
Use mlnx_qos to set the trust mode and enable PFC on priorities 3 and 6 (the list holds one flag per priority, 0 through 7):
mlnx_qos -i <interface> --trust=dscp --pfc 0,0,0,1,0,0,1,0
Switch Configuration Reference
The complete switch configuration and verification steps are available in the referenced knowledge base (access requires membership).
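Independently of the switch-side steps, the sysfs values written on the server can be read back to confirm they took effect. The following is a minimal sketch under that assumption: the function name and the SYSFS_ROOT override are hypothetical, and the paths are the ones used in the server-side commands of this guide.

```shell
#!/bin/sh
# Hypothetical read-back check for the server-side ECN/CNP sysfs settings.
# Usage: check_roce_ecn IFACE PRIORITY EXPECTED_CNP_DSCP
check_roce_ecn() {
    iface=$1
    prio=$2
    want_cnp=$3
    base="${SYSFS_ROOT:-/sys}/class/net/$iface/ecn"   # override for testing only
    rc=0
    # Both the notification point and the reaction point should read 1.
    for role in roce_np roce_rp; do
        val=$(cat "$base/$role/enable/$prio" 2>/dev/null || true)
        if [ "$val" != "1" ]; then
            echo "$role is not enabled on priority $prio" >&2
            rc=1
        fi
    done
    # The CNP DSCP value should match what was written.
    val=$(cat "$base/roce_np/cnp_dscp" 2>/dev/null || true)
    if [ "$val" != "$want_cnp" ]; then
        echo "cnp_dscp is '$val', expected $want_cnp" >&2
        rc=1
    fi
    return $rc
}
```

For the values in this guide the call would be check_roce_ecn with the interface name, priority 3, and CNP DSCP 48. A non-zero return status means at least one setting does not match.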
