Automating RDMA and SR-IOV Configuration in Kubernetes with sriov-network-operator and Kube-OVN
This article explains how the integration of sriov-network-operator and Kube-OVN automates the complex configuration and persistence of RDMA and SR‑IOV in Kubernetes, enabling high‑availability, multi‑tenant networking for AI distributed training workloads.
In AI distributed training scenarios, using Remote Direct Memory Access (RDMA) to accelerate network data transfer between training tasks has become the preferred performance optimization. RDMA capability is provided by smart NICs and, in Kubernetes, requires virtualizing the NIC via SR‑IOV or Macvlan so that each Pod can use a virtual function (VF) for RDMA.
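In practice, a Pod attaches a VF through a secondary network annotation plus a device-plugin resource request. A minimal sketch, in which the network name `sriov-rdma`, the resource name `intel.com/sriov_rdma`, and the image are all illustrative placeholders:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: rdma-worker
  annotations:
    # Secondary network backed by an SR-IOV VF (name is illustrative)
    k8s.v1.cni.cncf.io/networks: sriov-rdma
spec:
  containers:
  - name: trainer
    image: example/training-image:latest   # placeholder image
    resources:
      requests:
        intel.com/sriov_rdma: "1"   # VF exposed by the SR-IOV device plugin
      limits:
        intel.com/sriov_rdma: "1"
```

The resource name under `requests`/`limits` must match what the SR-IOV device plugin advertises on the node; the sections below show how the operator drives that configuration.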
Working with the Kube‑OVN community, Inspur Cloud Sea identified the pain points of SR‑IOV configuration, introduced and optimized the sriov-network-operator project to automate NIC RDMA configuration, and, together with Kube‑OVN, delivered a complete production‑grade RDMA solution.
01 Challenges of RDMA and SR‑IOV configuration
Configuring RDMA and SR‑IOV involves many parameters and varies across NIC vendors: setting the maximum and desired VF counts, MTU, VLAN, and IOMMU, and loading vendor‑specific kernel modules (e.g., ice, iavf, and irdma for Intel NICs, or the OFED driver stack for Mellanox NICs). In addition, VFs do not persist across node reboots, and the device plugin must be restarted manually before Kubernetes recognizes newly created VFs, making day‑to‑day management cumbersome.
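Without automation, these steps are typically performed by hand on every node. A rough sketch of the manual workflow on an Intel NIC (the interface name `ens1f0`, VF count, and device-plugin label are illustrative, and exact commands vary by vendor):

```
# Load vendor driver modules (Intel example; Mellanox uses the OFED stack)
modprobe ice && modprobe iavf && modprobe irdma

# Create 8 VFs on the physical function ens1f0 (lost on reboot)
echo 8 > /sys/class/net/ens1f0/device/sriov_numvfs

# Per-VF and per-PF tuning: VLAN for VF 0, jumbo MTU on the PF
ip link set ens1f0 vf 0 vlan 100
ip link set ens1f0 mtu 9000

# Restart the SR-IOV device plugin so Kubernetes picks up the new VFs
kubectl -n kube-system delete pod -l app=sriov-device-plugin
```

Because none of this persists across reboots, each step must be repeated (or scripted per node), which is exactly the burden the operator removes.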
02 Automated configuration with sriov-network-operator
Inspur Cloud Sea adopted the sriov-network-operator to address these problems. Its declarative configuration makes SR‑IOV setup dynamic, automated, and highly available, reducing manual effort and improving flexibility in cloud‑native environments.
The operator provides a global SR‑IOV template that stores desired settings (NIC name, type, VF count, node labels) as Kubernetes resources in etcd, and a node‑specific template derived from the global one. A distributed SR‑IOV configurator runs as a daemon on each node, performing pre‑setup (enabling IOMMU, loading the vfio-pci module), listening for resource changes, generating and executing configuration scripts, handling pod eviction and node reboot, and updating device‑plugin metadata.
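The global template described above corresponds to the operator's SriovNetworkNodePolicy resource. A minimal sketch for an RDMA-capable Intel NIC, where the interface name, VF count, MTU, and resource name are illustrative choices rather than required values:

```yaml
apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetworkNodePolicy
metadata:
  name: rdma-policy
  namespace: sriov-network-operator
spec:
  resourceName: sriov_rdma        # advertised to pods via the device plugin
  nodeSelector:
    node-role.kubernetes.io/worker: ""
  numVfs: 8                       # desired VF count on matching NICs
  mtu: 9000
  nicSelector:
    pfNames: ["ens1f0"]           # physical function(s) to virtualize
  deviceType: netdevice
  isRdma: true                    # enable RDMA on the created VFs
```

Once applied, the per-node daemon reconciles matching nodes against this spec, so the VF layout survives reboots without manual intervention.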
Enhancements include support for Kube‑OVN OVS offload, automatic scheduling of pods to nodes labeled feature.node.kubernetes.io/network-sriov.capable=true, Intel iavf module loading, and forced pod eviction to avoid long‑lasting node unavailability.
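With node-feature-discovery labeling SR‑IOV-capable nodes, a policy or workload can target them directly. A sketch of the selector fragment using the standard NFD label:

```yaml
# Fragment of a policy or pod spec restricting placement
# to nodes that node-feature-discovery has marked SR-IOV-capable
nodeSelector:
  feature.node.kubernetes.io/network-sriov.capable: "true"
```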
03 Kube‑OVN + SR‑IOV solution for coexistence of RDMA and standard Kubernetes networking
The solution deploys separate NICs for RDMA and standard traffic, using Kube‑OVN as a global IPAM to simplify IP address management, achieve multi‑tenant isolation, and provide a unified networking experience. This architecture supports large‑scale AI compute environments, enhancing performance, security, and scalability.
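On the Kube‑OVN side, multi‑tenant IPAM is expressed as Subnet resources bound to namespaces. A minimal sketch, where the CIDR, gateway, and namespace name are illustrative:

```yaml
apiVersion: kubeovn.io/v1
kind: Subnet
metadata:
  name: tenant-a-rdma
spec:
  protocol: IPv4
  cidrBlock: 10.66.0.0/16
  gateway: 10.66.0.1
  namespaces:        # pods in these namespaces draw IPs from this subnet
    - tenant-a
```

Binding subnets to namespaces gives each tenant its own address space while Kube‑OVN acts as the global IPAM for both the RDMA and the standard network.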
Reference links: https://github.com/kubeovn/kube-ovn , https://github.com/kubeovn/sriov-network-operator .
Cloud Native Technology Community
The Cloud Native Technology Community, part of the CNBPA Cloud Native Technology Practice Alliance, focuses on evangelizing cutting‑edge cloud‑native technologies and practical implementations. It shares in‑depth content, case studies, and event/meetup information on containers, Kubernetes, DevOps, Service Mesh, and other cloud‑native tech, along with updates from the CNBPA alliance.