How Kernel Innovations Enable Scalable Multi‑Tenant Cloud Gateways
UCloud's redesigned external gateway replaces OVS/GRE tunnels with three upstream Linux kernel features: lightweight tunneling, VRF, and flow offload. The design addresses IP tunnel complexity, tenant isolation overhead, and netns performance penalties, and the fixes it required were contributed back to the open-source community and are available in Linux kernel 5.0.
Shortcomings of the existing 3.x kernels
The commonly used 3.x kernels (e.g., CentOS 7's 3.10) suffer from complex IP tunnel management, performance loss caused by tenant isolation, and netns overhead, all of which make large-scale deployments difficult.
1. IP tunnel management complexity
Creating point‑to‑point IP tunnels requires specifying destination and key for each tunnel; with many hosts this leads to thousands of tunnel devices, which become hard to manage.
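The growth in device count can be sketched as follows. This is a minimal illustration, not the original deployment: the peer addresses, key, and `greN` device names are hypothetical, and the script only prints the commands unless `RUN` is set to an empty string.

```shell
#!/bin/sh
# Dry run by default: commands are printed, not executed.
# Run as root with RUN="" to apply them for real.
RUN=${RUN-echo}

# Legacy point-to-point model: one gretap device per remote peer.
# args: tunnel key, then any number of peer addresses (all hypothetical)
legacy_tunnels() {
    key=$1; shift
    i=0
    for peer in "$@"; do
        $RUN ip link add "gre$i" type gretap remote "$peer" key "$key"
        i=$((i + 1))
    done
}

# Three peers already need three devices; thousands of hosts need thousands.
legacy_tunnels 1000 172.16.0.1 172.16.0.2 172.16.0.3
```

Each additional host or tenant key adds another device, which is the management burden described above.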
2. Performance degradation from multi‑tenant isolation
Overlapping VPC address spaces require extensive policy routing and iptables/NAT chains, causing performance to drop sharply as chain length grows.
3. Netns overhead
Using network namespaces for tenant isolation adds virtual NICs and protocol stack re‑entry costs, reducing overall performance by about 20%.
Three new kernel technologies
To overcome these issues, three upstream Linux kernel features were evaluated: lightweight tunneling (lwtunnel), Virtual Routing and Forwarding (VRF), and nftables flow offload.
1. Lightweight tunneling
Introduced in kernel 4.3, lwtunnel allows tunnel attributes to be set via routing, eliminating the need for numerous tunnel devices.
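A minimal sketch of the idea, assuming a single gretap device in external (metadata) mode and hypothetical tenant prefixes and outer addresses. By default the script only prints the commands; set `RUN=""` as root to execute them.

```shell
#!/bin/sh
# Dry run by default: commands are printed, not executed.
RUN=${RUN-echo}

# One metadata-mode ("external") gretap device serves every remote peer.
# args: device name
lwt_device() {
    $RUN ip link add "$1" type gretap external
    $RUN ip link set "$1" up
}

# The encapsulation parameters live in the route instead of in a device.
# args: device, inner prefix, tunnel id, outer destination (all hypothetical)
lwt_route() {
    $RUN ip route add "$2" encap ip id "$3" dst "$4" dev "$1"
}

lwt_device tun
lwt_route tun 10.0.1.0/24 1000 172.16.0.1
lwt_route tun 10.0.2.0/24 1000 172.16.0.2
```

Adding a peer becomes a route change rather than a new network device, so the device count stays constant as the number of hosts grows.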
2. Virtual Routing and Forwarding (VRF)
Supported since kernel 4.3 (complete in 4.8), VRF enables a single Linux box to act as multiple virtual routers, providing clean tenant routing isolation without policy routing.
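As a rough illustration, two tenants with overlapping address space could each get their own VRF. The VRF names, table ids, and interface names below are assumptions for the example, and the script prints the commands unless `RUN` is set to an empty string.

```shell
#!/bin/sh
# Dry run by default: commands are printed, not executed.
RUN=${RUN-echo}

# Create a VRF bound to its own routing table and enslave interfaces to it.
# args: vrf name, routing table id, member interfaces...
vrf_tenant() {
    name=$1; table=$2; shift 2
    $RUN ip link add "$name" type vrf table "$table"
    $RUN ip link set "$name" up
    for dev in "$@"; do
        $RUN ip link set "$dev" master "$name"
    done
}

vrf_tenant vrf-a 10 eth1
vrf_tenant vrf-b 20 eth2

# The same prefix can exist in both tenants because each VRF has its own table.
$RUN ip route add 10.0.0.0/24 dev eth1 vrf vrf-a
$RUN ip route add 10.0.0.0/24 dev eth2 vrf vrf-b
```

Route lookups are scoped to the VRF of the ingress interface, so no per-tenant `ip rule` entries are needed for isolation.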
3. Flow offload
nftables replaces legacy iptables. Its flow offload feature (kernel 4.16) moves established connections onto a software fast path at the ingress hook of the network device, so subsequent packets bypass most of the IP stack. Future hardware offload will further boost performance.
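A flowtable setup might look like the sketch below, which follows the pattern in the kernel's flowtable documentation. The table name `gw`, the flowtable name `ft`, and the device names are hypothetical. The function only prints a ruleset; piping it into `nft -f -` as root would load it.

```shell
#!/bin/sh
# Print an nftables ruleset that offloads forwarded TCP flows.
# args: two device names to attach the flowtable to (hypothetical)
flow_ruleset() {
    cat <<EOF
table inet gw {
    flowtable ft {
        hook ingress priority 0; devices = { $1, $2 };
    }
    chain forward {
        type filter hook forward priority 0; policy accept;
        ip protocol tcp flow offload @ft
    }
}
EOF
}

# Load with: flow_ruleset eth0 eth1 | nft -f -
flow_ruleset eth0 eth1
```

Only the first packets of a connection traverse the forward chain; once the flow is in the flowtable, later packets are handled at the ingress hook.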
Design and optimization practice
By combining lwtunnel, VRF, and flow offload, a routing‑based multi‑tenant overlay gateway was designed. Several bugs were encountered and fixed via patches contributed to the Linux kernel.
1. lwtunnel packet key loss
Problem: Packets sent through an external gretap tunnel lacked the tunnel_key field.
Root cause: The TUNNEL_KEY flag is not exposed to userspace, so iproute2 cannot set it.
Patches:
iptunnel: make TUNNEL_FLAGS available in uapi
iproute: set ip/ip6 lwtunnel flags
After applying the patches, the route can be set as follows:
ip r add 2.2.2.11 via 1.1.1.11 dev tun encap ip id 1000 dst 172.168.0.1 key
2. lwtunnel key‑based IP tunnel ineffective
Problem: A tunnel_key‑based gretap device could receive but not send packets.
Root cause: In non‑external mode the kernel ignored the lightweight tunnel route.
Patch: ip_tunnel: Make none‑tunnel‑dst tunnel port work with lwtunnel
3. External IP tunnel ARP missing tunnel_key
Problem: ARP replies did not carry the tunnel_key.
Root cause: Tunnel metadata was copied without tun_flags.
Patch: iptunnel: Set tun_flags in the iptunnel_metadata_reply from src
4. Flow offload incompatibility with DNAT
Problem: Connections subject to DNAT (e.g., 2.2.2.11 translated to 10.0.0.7) could not be offloaded.
Root cause: Reverse‑direction lookup used the original destination address.
Patch: netfilter: nft_flow_offload: Fix reverse route lookup
5. Flow offload incompatibility with VRF
Problem: Adding interfaces to a VRF disabled flow offload.
Root cause: Offload rules were attached to the physical NICs, while packets were processed on the VRF device.
Patch: netfilter: nft_flow_offload: fix interaction with vrf slave device
6. VRF PREROUTING hook re‑entry issue
Problem: Packets entering a VRF were processed twice in PREROUTING, causing rule conflicts.
Root cause: After the first PREROUTING, the VRF device re‑enters the hook, applying egress rules incorrectly.
Patch: netfilter: nft_meta: Add NFT_META_I/OIFKIND meta type (plus corresponding userspace changes)
Usage example:
nft add rule firewall rules-all meta iifkind "vrf" counter accept
Prototype verification
A proof‑of‑concept environment was built with netns namespaces representing external clients and tenant networks, using the new lwtunnel, VRF, and flow offload features. The setup demonstrated correct inbound/outbound traffic isolation per tenant and successful offload operation.
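The source does not give the exact topology, so the following is only a sketch of how such a test bed could be assembled. It assumes veth pairs connect each namespace to the gateway, and the namespace and interface names are hypothetical. Commands are printed unless `RUN` is set to an empty string.

```shell
#!/bin/sh
# Dry run by default: commands are printed, not executed.
RUN=${RUN-echo}

# Create a namespace and connect it to the gateway with a veth pair.
# args: namespace name, gateway-side interface name (both hypothetical)
attach_ns() {
    $RUN ip netns add "$1"
    $RUN ip link add "$2" type veth peer name eth0 netns "$1"
    $RUN ip link set "$2" up
    $RUN ip netns exec "$1" ip link set eth0 up
}

attach_ns client gw-ext     # stands in for an external client
attach_ns tenant-a gw-a     # stands in for one tenant network
attach_ns tenant-b gw-b     # stands in for a second tenant network
```

The gateway-side interfaces could then be enslaved to per-tenant VRFs and listed in a flowtable to exercise isolation and offload together.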
Conclusion
The combination of lightweight tunneling, VRF, and flow offload enables a high‑performance, multi‑tenant external gateway suitable for cloud environments. The contributed patches are available in Linux kernel 5.0, and further work will focus on hardware offload support and broader NIC vendor adoption.
UCloud Tech
UCloud is a leading neutral cloud provider in China, developing its own IaaS, PaaS, AI service platform, and big data exchange platform, and delivering comprehensive industry solutions for public, private, hybrid, and dedicated clouds.
