
Why Overlay Networks Matter: Deep Dive into Flannel’s UDP, VXLAN, and Host‑gw Modes

This article explains why Kubernetes requires an overlay network, describes the limitations of Docker's virtual bridge, introduces Flannel's architecture and its three data-forwarding modes (UDP, VXLAN, and host-gw), and details how each mode works, its performance trade-offs, and how the StarRing TCOS platform selects among them for different cluster sizes and network topologies.


Kubernetes requires that any pod can communicate with any other pod without NAT, so the entire cluster must behave as one flat, fully connected network. Docker's default networking cannot provide this on its own: network-namespace isolation blocks direct communication between containers on different hosts, while the workaround of sharing the host's network stack leads to port exhaustion, security risks, and scalability limits.

Why an Overlay Network Is Needed

To keep containers isolated while still letting them talk to each other, Docker creates a virtual bridge (docker0) on each host and attaches every container's virtual Ethernet (veth) device to it. Containers on the same host resolve one another via ARP over this bridge and exchange frames directly. Each node's docker0, however, allocates addresses independently, so container IPs can collide across nodes and are not routable between hosts. An overlay network supplies the missing cluster-wide view of container addressing and makes cross-node container communication possible.
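To make the conflict concrete, here is a toy Go sketch, not Docker's actual IPAM code, of two nodes that each allocate container addresses from their own independent docker0 pool. Both hand out the same address, which is exactly what breaks cross-node routing.

```go
package main

import "fmt"

// Toy model (not Docker's real IPAM): each node allocates container IPs from
// its own independent copy of the default docker0 pool (172.17.0.0/16),
// with no knowledge of what other nodes have already handed out.
type node struct {
	name string
	next int // last host octet handed out; .1 belongs to docker0 itself
}

func (n *node) allocate() string {
	n.next++
	return fmt.Sprintf("172.17.0.%d", n.next+1)
}

func main() {
	a := &node{name: "node-a"}
	b := &node{name: "node-b"}

	// Both nodes assign the same "first" container address because each
	// docker0 bridge allocates independently of every other node.
	fmt.Println(a.name, "container:", a.allocate()) // 172.17.0.2
	fmt.Println(b.name, "container:", b.allocate()) // 172.17.0.2 -> conflict across nodes
}
```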

Flannel: Early Overlay Network Solution

Flannel, introduced by CoreOS, leases each node a distinct container subnet, so container IPs are unique across the cluster, and forwards traffic between nodes. Its components are a per-node agent, flanneld; a key-value store (etcd or the Kubernetes API) that records which subnet each host owns; and a set of pluggable backend implementations. The three most common backends are UDP, VXLAN, and host-gw.
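As a rough illustration of the host-to-subnet mapping, the following Go sketch models the kind of lookup flanneld performs to find which host owns a destination container IP. The subnet and host addresses are made up, and real flanneld stores its leases in etcd or the Kubernetes API rather than in a slice.

```go
package main

import (
	"fmt"
	"net"
)

// Minimal sketch of the mapping flanneld keeps: each node leases one
// container subnet, and every node can resolve which host owns a given
// container IP. All values here are illustrative.
type lease struct {
	subnet *net.IPNet // container subnet leased to the node
	hostIP string     // node's underlay (physical) IP
}

func hostFor(containerIP net.IP, leases []lease) (string, bool) {
	for _, l := range leases {
		if l.subnet.Contains(containerIP) {
			return l.hostIP, true
		}
	}
	return "", false
}

func main() {
	_, subA, _ := net.ParseCIDR("10.244.1.0/24")
	_, subB, _ := net.ParseCIDR("10.244.2.0/24")
	leases := []lease{
		{subnet: subA, hostIP: "192.168.1.2"},
		{subnet: subB, hostIP: "192.168.1.3"},
	}

	// A packet for 10.244.2.17 must be delivered to the node that leased
	// 10.244.2.0/24, i.e. 192.168.1.3.
	dst := net.ParseIP("10.244.2.17")
	if host, ok := hostFor(dst, leases); ok {
		fmt.Printf("container %s lives behind host %s\n", dst, host)
	}
}
```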

Flannel UDP Mode

UDP mode adds a TUN device (flannel0) on each host. A container packet travels to docker0, then to flannel0, where the flanneld process reads it in user space, looks up the destination host, wraps it in a UDP datagram, and sends it on. The destination host's flanneld decapsulates the packet and injects it back through flannel0 to the target container. Because every packet crosses the user-kernel boundary several times (container → kernel → flanneld in user space → kernel → network), UDP mode pays a heavy context-switch and copy cost and is unsuitable for high-throughput workloads.
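The sketch below is a conceptual Go rendering of this user-space forwarding loop, not flanneld's actual implementation: the TUN device is stood in for by an in-memory reader, the destination host is fixed, and port 8285 (Flannel's default for the UDP backend) is assumed. It makes the extra kernel-to-user-space-and-back copies visible.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"net"
)

// Conceptual sketch of what flanneld's (now deprecated) UDP backend does in
// user space. Every packet is copied kernel -> user space -> kernel again,
// which is the source of UDP mode's overhead.
func forward(tun io.Reader, conn *net.UDPConn, remote *net.UDPAddr) error {
	buf := make([]byte, 65535)
	for {
		// 1. Copy the raw IP packet out of the kernel via the TUN device.
		n, err := tun.Read(buf)
		if err == io.EOF {
			return nil
		}
		if err != nil {
			return err
		}
		// 2. Real flanneld would look up the destination host from the
		//    packet's destination IP; here the remote address is fixed.
		// 3. Copy the packet back into the kernel as a UDP payload.
		if _, err := conn.WriteToUDP(buf[:n], remote); err != nil {
			return err
		}
		fmt.Printf("re-encapsulated %d-byte packet to %s\n", n, remote)
	}
}

func main() {
	conn, err := net.ListenUDP("udp", &net.UDPAddr{IP: net.IPv4zero, Port: 0})
	if err != nil {
		panic(err)
	}
	defer conn.Close()

	remote := &net.UDPAddr{IP: net.ParseIP("127.0.0.1"), Port: 8285} // Flannel's UDP backend port
	fakePacket := bytes.NewReader(make([]byte, 1400))                // stands in for a packet read from flannel0
	if err := forward(fakePacket, conn, remote); err != nil {
		panic(err)
	}
}
```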

Flannel VXLAN Mode

VXLAN builds a virtual Layer-2 network on top of the existing Layer-3 infrastructure. Each node runs a VTEP device (flannel.1) that wraps container frames in VXLAN headers. flanneld programs three kernel tables: routes that map destination container subnets to the remote VTEP's IP, neighbor (ARP) entries that map each remote VTEP IP to its MAC, and FDB entries that map each remote VTEP MAC to the underlay IP of its host. Encapsulation and decapsulation happen entirely in the kernel, so VXLAN delivers far lower latency and higher throughput than UDP mode.
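The lookup chain can be summarized in a small Go model. The addresses and MACs below are illustrative, and in reality flanneld programs these entries into the kernel's route, neighbor, and FDB tables rather than Go maps.

```go
package main

import "fmt"

// Illustrative model of the three kernel tables flanneld programs for VXLAN.
var (
	// Route table: destination container subnet -> remote VTEP (flannel.1) IP.
	routes = map[string]string{"10.244.2.0/24": "10.244.2.0"}
	// Neighbor (ARP) table: remote VTEP IP -> remote VTEP MAC.
	neighbors = map[string]string{"10.244.2.0": "aa:bb:cc:dd:ee:02"}
	// FDB: remote VTEP MAC -> underlay host IP used as the VXLAN tunnel endpoint.
	fdb = map[string]string{"aa:bb:cc:dd:ee:02": "192.168.1.3"}
)

func resolve(dstSubnet string) {
	vtepIP := routes[dstSubnet]
	vtepMAC := neighbors[vtepIP]
	hostIP := fdb[vtepMAC]
	fmt.Printf("packet for %s -> inner frame to VTEP %s (%s), outer UDP/VXLAN to host %s\n",
		dstSubnet, vtepIP, vtepMAC, hostIP)
}

func main() {
	resolve("10.244.2.0/24")
}
```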

Flannel Host‑gw Mode

Host-gw is a pure Layer-3 solution with no encapsulation at all. flanneld installs one route per remote node: the next hop for a remote container subnet is that node's underlay IP. Outgoing frames are therefore addressed directly to the destination host's MAC, and that host forwards the unmodified IP packet to the target container. Performance is close to native host-to-host communication, but every node must sit on the same physical Layer-2 network so that the next hop is directly reachable, and each host must carry a route for every other node, which becomes a maintenance burden in large clusters.
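Here is a rough Go sketch of the per-node routes flanneld installs in host-gw mode, with made-up addresses; the real entries go into the kernel routing table. The shape of the data shows both the absence of any tunnel device and the N-1 routes each host must carry.

```go
package main

import "fmt"

// Illustrative view of host-gw routes: one entry per remote node, pointing
// straight at that node's underlay IP. No tunnel device appears here -- the
// original IP packet is forwarded unmodified.
type route struct {
	dstSubnet string // remote node's container subnet
	nextHop   string // remote node's underlay IP; must be reachable at Layer 2
	dev       string
}

func main() {
	routes := []route{
		{dstSubnet: "10.244.2.0/24", nextHop: "192.168.1.3", dev: "eth0"},
		{dstSubnet: "10.244.3.0/24", nextHop: "192.168.1.4", dev: "eth0"},
	}
	// With N nodes, every host carries N-1 such entries -- the maintenance
	// cost mentioned above.
	for _, r := range routes {
		fmt.Printf("%s via %s dev %s\n", r.dstSubnet, r.nextHop, r.dev)
	}
}
```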

Performance Test on StarRing TCOS

In tests with iperf and netperf, UDP mode proved markedly slower and is now largely deprecated. In a two-node test on a flat Layer-2 network, host-gw delivered roughly 20% higher bandwidth and about 5% higher throughput than VXLAN, though the two converged at larger TCP window sizes. For small clusters on a single Layer-2 network, host-gw is therefore preferred; VXLAN is better suited to large, multi-subnet, or cross-datacenter deployments.

StarRing TCOS defaults to VXLAN for complex, multi‑subnet environments but retains host‑gw for small‑scale or flat‑network scenarios. TCOS also adds a Network Policy layer on top of Flannel, enabling fine‑grained firewall rules for Pods, Services, and Namespaces, and supports multi‑tenant isolation.

Evolution of Container Network Orchestration

Flannel’s simplicity made it a common baseline for many cloud providers, but its lack of Network Policy and advanced features spurred the development of richer CNI plugins such as Calico and Weave. These newer solutions build on Flannel’s concepts while adding policy enforcement, load balancing, and higher performance, marking the diversification of container networking in the Cloud Native ecosystem.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.

Performance, Kubernetes, Overlay Network, UDP, Container Networking, flannel, VXLAN, host-gw
Written by

StarRing Big Data Open Lab

Focused on big data technology research, exploring the Big Data era | [email protected]
