
Practice of Multi-NIC Container Network Acceleration for Offline Training

The talk explains how vivo uses a Kubernetes‑based solution combining Calico and RoCEv2 to migrate offline training workloads from single‑NIC to multi‑NIC. It covers integrating lossless RDMA, planning topology and IP allocation, and using Volcano, SpiderPool, Macvlan, and Multus CNI for efficient container networking.

vivo Internet Technology

Kubernetes Community Days (KCD) is a global event organized by the Cloud Native Computing Foundation (CNCF) together with local CNCF ambassadors, staff, and member organizations.

The Shenzhen edition will feature a talk by vivo, a CNCF member, presented by Zhang Rong and Ou Xipei.

Talk title: Practice of Multi‑NIC Container Network Acceleration for Offline Training

Topic introduction: Offline training workloads rely on RDMA communication to speed up model training. vivo builds its offline training clusters on InfiniBand and TCP networks. In the InfiniBand‑based cluster, pods run in host‑network mode, which limits each node to a single pod; many other workloads still communicate over TCP and train less efficiently as a result. To create a unified network architecture, vivo adopts a Kubernetes‑based solution that combines Calico and RoCEv2: Calico provides the container network used for remote data access, while RoCEv2 carries RDMA traffic between pods.

The talk will cover:

How to migrate training tasks from single‑NIC to multi‑NIC on an offline platform.

How to integrate lossless RoCEv2 networks into Kubernetes networking.

How to use RoCEv2 inside Kubernetes pods.

How to plan network topology and IP allocation.

Additional technologies such as Volcano, SpiderPool, Macvlan, and Multus CNI will be introduced to help participants understand multi‑NIC container techniques.
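
As a rough illustration of how these pieces can fit together, the sketch below builds a Multus NetworkAttachmentDefinition that wraps a Macvlan CNI configuration and attaches two such networks to a pod through the Multus annotation. It is not taken from the talk: the names roce-net1 and roce-net2, the host interfaces eth1 and eth2, the image, and the choice of "spiderpool" as the IPAM plugin type are all assumptions made for the example. Details such as RDMA device resources and Volcano scheduling are left out.

```python
import json


def macvlan_attachment(name, master, ipam_type="spiderpool"):
    """Build a NetworkAttachmentDefinition wrapping a Macvlan CNI config.

    `master` is the host NIC the Macvlan sub-interface is created on.
    `ipam_type` is an assumption: it presumes an IPAM plugin registered
    under that name is installed on the nodes.
    """
    cni_config = {
        "cniVersion": "0.3.1",
        "type": "macvlan",
        "master": master,
        "mode": "bridge",
        "ipam": {"type": ipam_type},
    }
    return {
        "apiVersion": "k8s.cni.cncf.io/v1",
        "kind": "NetworkAttachmentDefinition",
        "metadata": {"name": name},
        # Multus expects the CNI config as a JSON string in spec.config.
        "spec": {"config": json.dumps(cni_config)},
    }


def training_pod(name, image, attachments):
    """Build a pod manifest that requests extra interfaces via Multus.

    The default interface still comes from the cluster's primary CNI
    (Calico in the setup described above); each listed attachment adds
    one more interface to the pod.
    """
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {
            "name": name,
            "annotations": {
                "k8s.v1.cni.cncf.io/networks": ",".join(attachments),
            },
        },
        "spec": {"containers": [{"name": "trainer", "image": image}]},
    }


# Hypothetical names: one attachment per RoCE-capable host NIC.
nads = [macvlan_attachment(f"roce-net{i}", f"eth{i}") for i in (1, 2)]
pod = training_pod(
    "train-worker-0",
    "example.com/trainer:latest",
    [nad["metadata"]["name"] for nad in nads],
)

if __name__ == "__main__":
    for manifest in nads + [pod]:
        print(json.dumps(manifest, indent=2))
```

The printed manifests can be saved to files and applied with kubectl, which accepts JSON as well as YAML. Keeping one attachment per physical NIC makes the mapping between pod interfaces and host NICs explicit, which helps when planning IP ranges per network.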

Audience benefits: Attendees will learn best practices for multi‑NIC container and distributed‑system networking, covering container network topology, routing, load balancing, security, monitoring, and more.

Forum: Cloud Native Main Hall

Presentation time: December 16, 11:25‑11:55

Tags: cloud native, Kubernetes, RDMA, offline training, Container Networking, Multi-NIC