
How Alibaba Cloud Designs High‑Performance Cloud‑Native Container Networks with Terway

This article reviews the SIG Cloud‑Provider‑Alibaba webinar, explains Kubernetes container networking fundamentals, details Alibaba Cloud's high‑performance CNI implementation using Terway with exclusive and shared ENI modes, and shows how eBPF, AutoPath DNS and resource‑pooling improve scalability and latency.

Alibaba Cloud Native

Overview of the SIG Cloud‑Provider‑Alibaba Webinar

On April 16, the second SIG Cloud‑Provider‑Alibaba live session was held, focusing on how Alibaba Cloud designs and builds high‑performance cloud‑native container networks. The article provides a video replay, downloadable materials, and a curated Q&A.

Kubernetes Container Network Basics

Kubernetes uses the Container Network Interface (CNI) to provide each Pod with an independent network namespace and IP address, enabling direct Pod‑to‑Pod, Pod‑to‑Node, and Pod‑to‑external communication. Service objects allocate stable IPs and perform load‑balancing, while CoreDNS translates Service names to IPs.

Pod network connectivity (CNI)

Kubernetes Service (load balancing)

CoreDNS (service discovery)

What Is CNI?

CNI is the specification that network plugins implement to configure container networking. The container runtime calls the plugin whenever a Pod sandbox is created or destroyed. Common plugins include Terway, Flannel, and Calico.

Pod Creation Process

Kubelet receives a Pod creation event from the API server and creates a sandbox.

The CNI plugin is invoked to set up the container’s network.

CNI configures the Pod’s network namespace and routes.
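The steps above can be sketched from the runtime's side. Under the CNI specification, a plugin is an executable that receives its parameters through `CNI_*` environment variables and its network configuration as JSON on stdin. The snippet below only builds that request; the container ID, namespace path, network name, and the plugin type `terway` are illustrative assumptions, not values taken from the webinar.

```python
import json


def build_cni_add_request(container_id, netns_path, plugin_type,
                          ifname="eth0", cni_path="/opt/cni/bin"):
    """Return (env, stdin_json) that a runtime would pass to a CNI plugin for ADD."""
    env = {
        "CNI_COMMAND": "ADD",            # set up networking for a new sandbox
        "CNI_CONTAINERID": container_id,
        "CNI_NETNS": netns_path,         # path to the Pod's network namespace
        "CNI_IFNAME": ifname,            # interface name to create inside the Pod
        "CNI_PATH": cni_path,            # where plugin binaries are searched for
    }
    config = {
        "cniVersion": "0.4.0",
        "name": "example-net",           # hypothetical network name
        "type": plugin_type,             # name of the plugin binary to execute
    }
    return env, json.dumps(config)


# Hypothetical sandbox ID and namespace path, for illustration only.
env, stdin_json = build_cni_add_request(
    "sandbox-123", "/var/run/netns/pod-a", "terway")
print(env["CNI_COMMAND"], stdin_json)
```

A real runtime would then execute the plugin binary with this environment and read the resulting interface and IP information from its stdout.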

Designing High‑Performance Cloud‑Native CNI Networks

Traditional overlay networks add encapsulation and extra routing overhead to every packet. Alibaba Cloud’s approach treats the container network as a first‑class citizen in the VPC, achieving near‑bare‑metal performance.

Pods share the same network plane as VMs.

Pod networks integrate seamlessly with cloud resources.

No encapsulation or routing overhead; performance matches that of virtual machines.

Resource Allocation via Cloud APIs

When a Pod is created, Terway calls cloud OpenAPI to allocate resources such as Elastic Network Interfaces (ENIs) and secondary IPs, then configures them inside the Pod’s sandbox.

Resource Pooling for Fast Scaling

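Terway maintains a pool of resources that are already allocated but idle. When a Pod terminates, its resources return to the pool for quick reuse. The pool has a low‑water mark and a high‑water mark: when the idle count drops below the low‑water mark, Terway pre‑warms more resources through API calls, and when it rises above the high‑water mark, Terway releases the surplus ENIs.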
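A minimal sketch of such a pool follows, assuming the water marks count idle resources. The class name, the `allocate` and `release` callbacks (stand-ins for cloud API calls), and the thresholds are all hypothetical; this is not Terway's actual implementation.

```python
import itertools
from collections import deque


class ResourcePool:
    """Keeps the number of idle resources between a low and a high water mark."""

    def __init__(self, allocate, release, low_water=2, high_water=5):
        self.allocate = allocate      # stand-in for a cloud API "allocate" call
        self.release = release        # stand-in for a cloud API "release" call
        self.low_water = low_water
        self.high_water = high_water
        self.idle = deque()

    def rebalance(self):
        # Pre-warm when the pool runs low, release when it holds too much.
        while len(self.idle) < self.low_water:
            self.idle.append(self.allocate())
        while len(self.idle) > self.high_water:
            self.release(self.idle.pop())

    def acquire(self):
        # Serve from the pool when possible; otherwise fall back to the API.
        res = self.idle.popleft() if self.idle else self.allocate()
        self.rebalance()
        return res

    def give_back(self, res):
        # A terminated Pod's resource stays in the pool for reuse.
        self.idle.append(res)
        self.rebalance()


counter = itertools.count(1)
released = []
pool = ResourcePool(allocate=lambda: f"ip-{next(counter)}",
                    release=released.append, low_water=2, high_water=3)
pool.rebalance()              # pre-warm up to the low-water mark
first = pool.acquire()        # served from the pool, no API round trip
pool.give_back(first)         # returned to the pool when the Pod exits
print(first, list(pool.idle), released)
```

The point of the design is that Pod creation rarely waits on a cloud API call: the slow allocation happens ahead of time, in the background.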

Exclusive ENI Mode

Terway binds an ENI to the node where the Pod runs.

The ENI is moved into the Pod’s network namespace.

IP and routing are configured directly for the Pod.

Advantages: zero host‑stack traversal, performance equal to ECS instances, and support for DPDK acceleration.
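The exclusive ENI steps map onto a handful of standard iproute2 operations. The sketch below only builds the equivalent `ip` command lines and does not run them. The device name, namespace name, and addresses are made up for illustration, and the real plugin does not necessarily shell out to `ip` at all.

```python
def exclusive_eni_commands(eni_dev, netns, pod_ip_cidr, gateway,
                           pod_ifname="eth0"):
    """Return iproute2 commands that hand a whole ENI to one Pod namespace."""
    return [
        # Move the ENI device from the host into the Pod's namespace.
        ["ip", "link", "set", "dev", eni_dev, "netns", netns],
        # Rename it to the interface name the Pod expects.
        ["ip", "-n", netns, "link", "set", "dev", eni_dev, "name", pod_ifname],
        # Assign the Pod IP and bring the interface up.
        ["ip", "-n", netns, "addr", "add", pod_ip_cidr, "dev", pod_ifname],
        ["ip", "-n", netns, "link", "set", "dev", pod_ifname, "up"],
        # Send all Pod traffic out through the ENI.
        ["ip", "-n", netns, "route", "add", "default",
         "via", gateway, "dev", pod_ifname],
    ]


# Hypothetical device, namespace, and addresses.
cmds = exclusive_eni_commands("eth1", "pod-a", "192.168.1.10/24", "192.168.1.253")
for cmd in cmds:
    print(" ".join(cmd))
```

Because the device itself lives inside the Pod's namespace, packets never cross the host network stack, which is where the performance advantage comes from.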

Shared ENI Mode (IPVLAN)

Terway decides whether to allocate a new ENI or use secondary IPs based on request size.

An IPVLAN sub‑interface is created on the ENI.

The IPVLAN interface is moved into the Pod’s namespace.

IP and routing are set inside the Pod.

Advantages: minimal packet processing, each ENI supports 10‑20 secondary IPs, and lower latency.
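The first step of the shared mode, choosing between a secondary IP on an existing ENI and a new ENI, can be sketched as below. The article does not spell out the exact policy, so this is one plausible reading: fill existing ENIs first and attach a new one only when they are full. The `Eni` type, the `place_pod` function, and the limits are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Eni:
    name: str
    capacity: int   # assumed number of Pod IPs this ENI can hold
    used: int = 0


def place_pod(enis, max_enis, ips_per_eni):
    """Pick an IP slot for a new Pod, attaching a new ENI only when needed."""
    for eni in enis:
        if eni.used < eni.capacity:
            eni.used += 1
            return ("secondary-ip", eni.name)
    if len(enis) < max_enis:
        eni = Eni(name=f"eni-{len(enis)}", capacity=ips_per_eni, used=1)
        enis.append(eni)
        return ("new-eni", eni.name)
    return ("no-capacity", None)


enis = []
placements = [place_pod(enis, max_enis=2, ips_per_eni=2) for _ in range(5)]
print(placements)
```

Under this reading, the node's Pod capacity is bounded by the number of attachable ENIs multiplied by the IPs each ENI can hold.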

Performance Comparison

Benchmarks show that both exclusive ENI and shared ENI modes outperform Flannel VXLAN in TCP_RR, UDP, PPS, bandwidth, and latency. Exclusive ENI can saturate the node’s network capacity, making it suitable for high‑performance computing and gaming workloads.

Improving Service and NetworkPolicy Scalability with eBPF

The iptables and IPVS modes of kube‑proxy suffer from linear rule matching and slow updates as the number of Services grows. By attaching eBPF programs via tc to each node’s NIC, Service load‑balancing and NetworkPolicy enforcement run as eBPF code directly on the packet path, reducing latency and improving scalability.

Terway integrates with Cilium (eBPF agent) to implement these optimizations (https://github.com/cilium/cilium/pull/10251).
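The scalability argument comes down to lookup cost. The toy model below contrasts a sequential rule walk, as in an iptables chain, with a keyed lookup, as in an eBPF hash map. It is an illustration in plain Python with made-up Service addresses, not kernel code and not how Cilium stores its Services.

```python
def linear_match(rules, vip, port):
    """iptables-style: walk rules in order. Returns (backends, rules_examined)."""
    for examined, (rule_vip, rule_port, backends) in enumerate(rules, start=1):
        if (rule_vip, rule_port) == (vip, port):
            return backends, examined
    return None, len(rules)


def hash_match(table, vip, port):
    """Hash-map-style: one keyed lookup regardless of table size."""
    return table.get((vip, port))


# 5000 hypothetical Services, each with one backend.
rules = [(f"10.0.{i // 256}.{i % 256}", 80, [f"pod-{i}"]) for i in range(5000)]
table = {(vip, port): backends for vip, port, backends in rules}

backends, examined = linear_match(rules, "10.0.19.135", 80)
print(backends, examined)                      # the last rule: 5000 comparisons
print(hash_match(table, "10.0.19.135", 80))    # same answer from a single lookup
```

The cost of the linear walk grows with the number of Services, while the keyed lookup stays roughly constant, which is the behaviour the eBPF data path aims for.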

Optimizing DNS with AutoPath and Node‑Local DNS

Because of the search domains in a Pod’s resolver configuration, a single name lookup can reach CoreDNS as several queries, one per search domain, causing high latency and failure rates. AutoPath lets CoreDNS resolve the search path on the server side by watching Pod and Service objects and caching results, cutting DNS requests by 75%.
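The query fan-out can be reproduced with a small model of the resolver's search-list expansion. The search list below is the usual one for a Pod in the `default` namespace, and `ndots=5` is the Kubernetes default; both are assumptions here, since the article does not say how the 75% figure was measured.

```python
def candidate_queries(name, search, ndots=5):
    """Names a stub resolver tries, in order, for a single lookup."""
    if name.endswith("."):
        return [name]                      # already fully qualified
    absolute = name + "."
    searched = [f"{name}.{domain}." for domain in search]
    if name.count(".") >= ndots:
        return [absolute] + searched       # enough dots: try the name as-is first
    return searched + [absolute]


search = ["default.svc.cluster.local", "svc.cluster.local", "cluster.local"]
queries = candidate_queries("www.example.com", search)
for q in queries:
    print(q)

# With server-side search path completion, the client needs a single query.
reduction = 1 - 1 / len(queries)
print(f"{reduction:.0%} fewer queries")
```

With this search list, an external name costs four queries without AutoPath and one with it, which happens to match the 75% reduction quoted above.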

Node‑local DNS caches responses on each node and forwards external queries to the cloud’s PrivateZone, providing TCP‑based reliability and reducing load on the central CoreDNS.

Q&A Highlights

Pod network namespace creation: veth pairs are used on older kernels; IPVLAN is used on newer kernels (4.19+).

Security auditing of dynamic Pod IPs: IPs are stable during a Pod’s lifecycle and can be correlated via Kubernetes events or Terway’s label‑based updates.

Kernel requirements: IPVLAN and eBPF need kernel 4.19+; fallback to veth + policy routing is available.

eBPF deployment overhead: adds only a few hundred milliseconds.

IPv6 support: currently only IPv4; IPv6 LoadBalancer is supported via 6to4 translation, with native dual‑stack planned.

Service mesh roadmap: Alibaba Cloud offers ASM and plans to improve usability, performance, and global edge connectivity.
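The kernel requirement from the Q&A can be expressed as a simple version check. The function names and the returned labels are hypothetical, and the sketch only mirrors the stated rule (4.19 or newer for IPVLAN and eBPF, otherwise veth with policy routing), not Terway's actual detection logic.

```python
import re


def parse_kernel(release):
    """Extract (major, minor) from a kernel release string such as `uname -r` prints."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if not match:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    return int(match.group(1)), int(match.group(2))


def choose_datapath(release, minimum=(4, 19)):
    """Pick the Pod data path based on the node's kernel version."""
    if parse_kernel(release) >= minimum:
        return "ipvlan+ebpf"
    return "veth+policy-routing"


for release in ("3.10.0-1160.el7.x86_64", "4.19.91-21.al7.x86_64", "5.10.134"):
    print(release, "->", choose_datapath(release))
```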

References

Terway source code:

https://github.com/AliyunContainerService/terway
