
How We Built a Scalable Container VPC Network with ipvlan and Custom IPAM

This article details the design, implementation, and production rollout of a container VPC solution that leverages ipvlan for CNI, a custom IPAM integration with OpenStack/Ultron, and extensive operational fixes to achieve reliable, high‑performance networking for Kubernetes workloads across multiple clusters.

360 Zhihui Cloud Developer

01 Introduction

As containerization becomes ubiquitous, the company’s cluster count and size have grown dramatically. To make pod IPs reachable across the entire network, pod routes were initially announced to the access switches via BGP, but this required precise network planning and extra switch configuration, and carried high development, management, and operational costs.

The virtualization team’s mature cloud network solution was adopted to let container networks use the virtualized cloud network, reducing costs while providing flexibility, stability, and high performance.

02 Solution

The solution addresses two dimensions: (1) the forwarding plane on Kubernetes nodes (CNI) and (2) IP address management (IPAM) for container IPs.

1. CNI

Most open‑source CNI implementations rely on veth pairs, often combined with a bridge (e.g., Flannel, Calico, Cilium, Weave). veth works on older kernels (3.10.x), but the company’s newer nodes (4.19.x, 5.10.x) can use the kernel’s ipvlan driver. After comparing ipvlan and macvlan, ipvlan was chosen for its simplicity and compatibility with the existing cloud network’s OVS layer.

ipvlan offers the l2 and l3 modes (newer kernels add l3s); macvlan provides four modes (bridge, private, vepa, passthru).

ipvlan l2 was selected based on performance, ease of setup, and alignment with the virtualized cloud network.

2. IPAM

Container IPs are allocated from the cloud VPC via the Ultron and OpenStack APIs. The CNI must integrate with these APIs to obtain floating IPs that are reachable across the company network. Early attempts using Alibaba Cloud’s open‑source Terway provided basic connectivity but lacked support for specifying the VPC, subnet, security groups, and QoS. Consequently, a custom CNI called hulk‑vpc‑cni was developed to meet all of these requirements.

03 Practical Issues

1. Service Connectivity

After connecting to the VPC, pods need the correct default gateway. Kubernetes Service traffic must be routed through the node’s kube‑proxy rules rather than the subnet gateway, otherwise “no route to host” errors occur. For VPC bare‑metal nodes, OVS tap interfaces were used, and flow rules were added to direct Service traffic to the node itself, ensuring kube‑proxy processing.

Similar flow‑rule adjustments were applied to non‑VPC bare‑metal and virtual machine nodes to achieve Service connectivity.

iptables / nftables

Enabling masquerade‑all in kube‑proxy was necessary; otherwise Service traffic was rewritten to the backend pod IP and bypassed flow rules. On certain OSes, nftables rules took precedence over iptables, requiring adjustments.
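For reference, the masquerade‑all behaviour can be enabled through kube‑proxy’s configuration file (it is also available as the legacy `--masquerade-all` flag). A minimal fragment, assuming the iptables proxy mode:

```yaml
apiVersion: kubeproxy.config.k8s.io/v1alpha1
kind: KubeProxyConfiguration
mode: iptables
iptables:
  masqueradeAll: true   # SNAT all Service traffic so replies return via the node
```

With this set, Service traffic leaving the node carries the node’s address instead of the pod IP, so the return path traverses the node’s rules rather than bypassing them.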

2. Slow Pod Startup

Initial implementations required per‑pod VPC subnet allocation, taking about 10 seconds per pod, which is unacceptable for serverless workloads. A resource pool with configurable min/max thresholds was introduced to pre‑allocate resources, dramatically reducing pod startup time.
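The pool idea above can be sketched in a few lines: pre‑allocate `min` resources so the pod‑startup fast path never waits on the VPC API, cap the pool at `max`, and fall back to slow allocation only when the pool is drained. The `allocate` stub and IP values are placeholders for the real Ultron/OpenStack calls; the sketch is also not goroutine‑safe, which the production pool must be.

```go
package main

import "fmt"

// Pool pre-allocates VPC resources so a pod ADD can usually be served
// instantly instead of paying the ~10s allocation cost on the hot path.
type Pool struct {
	free chan string // warm, ready-to-use IPs (capacity = max)
	next int
}

func NewPool(min, max int) *Pool {
	p := &Pool{free: make(chan string, max)}
	for i := 0; i < min; i++ { // warm the pool up to the min threshold
		p.free <- p.allocate()
	}
	return p
}

// allocate stands in for the slow VPC API call.
func (p *Pool) allocate() string {
	p.next++
	return fmt.Sprintf("10.0.0.%d", p.next)
}

// Get serves from the warm pool when possible, else takes the slow path.
func (p *Pool) Get() string {
	select {
	case ip := <-p.free:
		return ip
	default:
		return p.allocate()
	}
}

// Put returns a released IP to the pool, or drops it once the pool is
// at max (in the real system: release it back to the VPC).
func (p *Pool) Put(ip string) {
	select {
	case p.free <- ip:
	default:
	}
}

func main() {
	p := NewPool(2, 4)
	fmt.Println(p.Get()) // served immediately from the warm pool
}
```

A background refill loop (not shown) would top the pool back up to `min` whenever Get drains it.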

3. Gateway Sync Latency

When a new pod is created, flow‑rule synchronization to the node can take ~5 seconds, during which the pod cannot reach the network because ARP resolution fails. Adding static ARP entries for the gateway MAC or using the pool logic mitigated this issue.

4. Resource Cleanup

IPAM interacts with Ultron/OpenStack to allocate IPs. If a node or an entire cluster is deleted without invoking the CNI delete path, those IPs become stale and can eventually exhaust the VPC pool. Cleanup logic was therefore added at two levels: the base cluster detects cluster‑deletion callbacks, and the CNI controller periodically reconciles allocated IPs against the pods that actually exist.
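The core of that periodic reconciliation is a set difference: any IP recorded as allocated in IPAM that no longer belongs to a live pod is a leak and should be released back to the VPC. A minimal sketch, with illustrative IPs:

```go
package main

import (
	"fmt"
	"sort"
)

// staleIPs returns the allocated IPs that no live pod is using -- the
// candidates the reconciler releases back to the VPC.
func staleIPs(allocated []string, podIPs map[string]bool) []string {
	var stale []string
	for _, ip := range allocated {
		if !podIPs[ip] {
			stale = append(stale, ip)
		}
	}
	sort.Strings(stale) // deterministic order for logging/release
	return stale
}

func main() {
	allocated := []string{"10.0.0.5", "10.0.0.6", "10.0.0.7"}
	live := map[string]bool{"10.0.0.5": true} // IPs of currently running pods
	fmt.Println(staleIPs(allocated, live))
}
```

In practice the reconciler should also tolerate the reverse skew (a pod whose IP the IPAM store has forgotten) and apply a grace period so IPs of pods that are mid‑creation are not released prematurely.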

5. Node Reboot

After a node reboot, VPC network configuration is lost, causing pod connectivity failures. Adding reload logic to re‑apply CNI configuration on node startup resolved the issue.

6. Concurrency

Concurrent pod creation and deletion across many nodes can lead to race conditions in VPC resource operations. The pool implementation isolates resource lifecycle from the pool itself, preventing deadlocks and ensuring correct behavior under high load.

04 Production Rollout

Since early 2022, the solution has been incrementally deployed to multiple business lines. Two major workloads—database clusters and serverless services—have been running on the container VPC for several quarters, demonstrating stability and scalability.

Database Mixed Deployment

Initially deployed on a 20‑node cluster, the solution now runs on dozens of nodes across several data centers, providing a stable foundation for database services.

Serverless

Serverless clusters in multiple regions have provided feedback that helped refine the VPC implementation, driving further improvements.

High‑Performance GPU (hbox)

GPU workloads initially used both VPC and overlay networking; over time, VPC proved more stable and performant, leading to its exclusive adoption.

Operations

Extensive operational documentation, monitoring metrics, and rescue interfaces were built into the CNI to enable rapid incident response. After months of production use, the container VPC meets the standards for hand‑off to dedicated operations teams.

05 Scaling Up

Production validation across database, serverless, hbox, and dedicated clusters confirms that the container VPC is ready for large‑scale rollout.

06 Conclusion

With the company’s “All‑in‑AI” strategy, a high‑performance, reliable, and easy‑to‑use container network provides the foundation for AI workloads, enabling multi‑cluster compute sharing and positioning the infrastructure as a critical enabler for future growth.

Tags: cloud native, Kubernetes, CNI, IPAM, container networking, ipvlan
Written by

360 Zhihui Cloud Developer

360 Zhihui Cloud is an enterprise open service platform that aims to "aggregate data value and empower an intelligent future," leveraging 360's extensive product and technology resources to deliver platform services to customers.
