How Multi‑Tunnel Architecture Resolved Physical Cloud Traffic Overload

This article details how UCloud tackled severe traffic overload in its physical cloud gateway caused by hash polarization, introducing a multi‑tunnel solution, capacity management, isolation‑zone migration, and automated operations to achieve high availability and support hundreds of gigabits of traffic.

UCloud Tech

Background

UCloud's physical cloud hosts are dedicated servers offering high compute performance for core applications. Physical cloud gateways enable internal communication between physical and public cloud products, handling massive cross‑region, cross‑cluster traffic.

Problem: Hash Polarization and Overload

Monitoring revealed that gateway device e in cluster 2 was overloaded while the other devices were underutilized, and that most of the traffic originated from cluster 1. The root cause was hash polarization: traffic between the clusters was encapsulated in a single tunnel, so every packet carried the same tunnel source and destination addresses. The hash algorithm therefore produced identical results for all flows and concentrated the load on a few devices.
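The effect can be reproduced with a minimal sketch. It assumes the device is chosen by hashing only the outer tunnel addresses. The article does not specify the actual hash inputs or algorithm, so `pick_device`, the SHA-256 hash, and the addresses are illustrative.

```python
import hashlib
from typing import List


def pick_device(outer_sip: str, outer_dip: str, devices: List[str]) -> str:
    """Pick a gateway device by hashing the outer tunnel addresses (assumed inputs)."""
    key = f"{outer_sip}|{outer_dip}".encode()
    digest = hashlib.sha256(key).digest()
    return devices[int.from_bytes(digest[:8], "big") % len(devices)]


def devices_used(packets: int, outer_sip: str, outer_dip: str, devices: List[str]) -> set:
    """Return the set of devices hit when every packet uses the same tunnel."""
    return {pick_device(outer_sip, outer_dip, devices) for _ in range(packets)}
```

With a single tunnel the hash input never changes, so `devices_used` contains exactly one device no matter how many inner flows the tunnel carries.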

Cross‑cluster tunnel encapsulation diagram

Solution 1: Multi‑Tunnel Approach

To remove the single‑tunnel limitation, each gateway now binds a range of tunnel IPs. The gateway hashes each packet on its inner packet information and uses the result to select a tunnel source IP (SIP) and destination IP (DIP) from the pre‑allocated ranges, so flows are spread across multiple tunnels instead of one.
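A minimal sketch of this selection step follows. It assumes the inner 5-tuple is the hash input and that the tunnel ranges are contiguous. The names `InnerFlow`, `build_tunnel_pool`, and `select_tunnel`, the SHA-256 hash, and the example addresses are hypothetical and not taken from UCloud's implementation.

```python
import hashlib
import ipaddress
from typing import List, NamedTuple, Tuple


class InnerFlow(NamedTuple):
    """Inner packet fields used as hash input (assumed to be the 5-tuple)."""
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    protocol: int


def build_tunnel_pool(first_ip: str, count: int) -> List[str]:
    """Build a contiguous range of tunnel IPs starting at first_ip."""
    base = ipaddress.ip_address(first_ip)
    return [str(base + i) for i in range(count)]


def flow_hash(flow: InnerFlow) -> int:
    """Hash the inner flow so that one flow always maps to the same value."""
    key = "|".join(str(field) for field in flow).encode()
    return int.from_bytes(hashlib.sha256(key).digest()[:8], "big")


def select_tunnel(flow: InnerFlow, local_pool: List[str],
                  remote_pool: List[str]) -> Tuple[str, str]:
    """Select the tunnel SIP and DIP for a flow from the pre-allocated ranges."""
    h = flow_hash(flow)
    sip = local_pool[h % len(local_pool)]
    # Use the quotient for the DIP so SIP and DIP vary independently.
    dip = remote_pool[(h // len(local_pool)) % len(remote_pool)]
    return sip, dip
```

Because the choice depends only on the inner flow, packets of one flow stay on one tunnel, while different flows land on different SIP/DIP pairs and give the downstream hash distinct inputs.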

Multi‑tunnel solution diagram

Preventing "Elephant Flow"

When a single user generates massive traffic, even multiple tunnels may be insufficient. UCloud mitigates this by increasing gateway capacity and employing isolation‑zone lossless migration, which automatically redirects excess traffic to isolated zones and validates migration results with strong checks.

Isolation zone lossless migration diagram

Capacity Management and Isolation Zone

Gateways are provisioned with more bandwidth than the physical cloud hosts they serve (e.g., increasing per‑node capacity from 10 G to 25 G) so that they can absorb sudden spikes. The isolation zone normally carries no traffic and takes over the overflow when monitoring detects a risk of overload.
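One way to decide what moves to the isolation zone is sketched below. It assumes per-user traffic rates are available from monitoring and that the largest users are moved first. The 0.8 utilization threshold and the function name `plan_isolation_migration` are hypothetical, as the article does not describe the actual policy.

```python
from typing import Dict, List


def plan_isolation_migration(user_gbps: Dict[str, float],
                             capacity_gbps: float,
                             threshold: float = 0.8) -> List[str]:
    """Return the users to move to the isolation zone, largest first.

    Users are selected until the remaining load is at or below
    capacity_gbps * threshold (an assumed safety margin).
    """
    limit = capacity_gbps * threshold
    remaining = sum(user_gbps.values())
    to_migrate: List[str] = []
    for user, rate in sorted(user_gbps.items(), key=lambda kv: kv[1], reverse=True):
        if remaining <= limit:
            break
        to_migrate.append(user)
        remaining -= rate
    return to_migrate
```

For a 25 G gateway carrying users at 12, 6, and 5 Gbit/s, the total of 23 exceeds the assumed limit of 20, so only the 12 Gbit/s user is selected for migration.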

High Availability Upgrade

During upgrades, a gray‑deployment strategy is used: a new cluster is deployed, traffic is gradually migrated, and if issues arise, services can be rolled back to the old cluster. This reduces impact scope and ensures continuity.
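The gradual migration with rollback can be sketched as a simple loop. The callbacks `shift_traffic` and `is_healthy` and the percentage steps are hypothetical placeholders for whatever traffic-steering and monitoring mechanisms are actually in use.

```python
from typing import Callable, Sequence


def gray_migrate(steps: Sequence[int],
                 shift_traffic: Callable[[int], None],
                 is_healthy: Callable[[], bool]) -> bool:
    """Move traffic to the new cluster step by step.

    steps lists the percentages of traffic sent to the new cluster.
    If a health check fails after any step, all traffic is returned
    to the old cluster and False is returned.
    """
    for percent in steps:
        shift_traffic(percent)
        if not is_healthy():
            shift_traffic(0)  # roll back: the old cluster serves everything again
            return False
    return True
```

Keeping the old cluster running until the final step succeeds is what makes the rollback a single call rather than a redeployment.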

New manager taking over gray cluster

Risk Analysis and Automation

The risk analysis identified three main sources of faults:

Human‑driven deployments increase the probability of faults.

Programs handle exceptions insufficiently.

Clusters are inadequately isolated from one another.

To address these, UCloud introduced automated operations separating configuration storage and deployment, enhanced validation and alerting (e.g., whitelist filtering before loading configurations), and isolation mechanisms to limit the impact of a faulty manager.
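A minimal sketch of whitelist filtering before a configuration load follows. The article does not say what the whitelist contains, so this assumes it lists the clusters a given manager is allowed to configure. The entry layout and the names `split_by_whitelist` and `load_if_valid` are hypothetical.

```python
from typing import Dict, List, Set, Tuple

Entry = Dict[str, str]


def split_by_whitelist(entries: List[Entry],
                       allowed_clusters: Set[str]) -> Tuple[List[Entry], List[Entry]]:
    """Separate entries targeting whitelisted clusters from all others."""
    accepted: List[Entry] = []
    rejected: List[Entry] = []
    for entry in entries:
        if entry.get("cluster") in allowed_clusters:
            accepted.append(entry)
        else:
            rejected.append(entry)
    return accepted, rejected


def load_if_valid(entries: List[Entry], allowed_clusters: Set[str]) -> List[Entry]:
    """Return the entries to load, or raise if any entry fails the whitelist."""
    accepted, rejected = split_by_whitelist(entries, allowed_clusters)
    if rejected:
        # Refuse the whole batch so a faulty manager cannot partially apply it.
        raise ValueError(f"{len(rejected)} entries target non-whitelisted clusters")
    return accepted
```

Rejecting the whole batch, rather than silently dropping bad entries, turns a misconfiguration into an alert instead of a partial deployment. That is one possible design choice, not necessarily the one UCloud made.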

Whitelist validation program

Conclusion

The experience shows that tackling traffic overload requires both architectural changes, such as multi‑tunnel designs, and operational improvements such as capacity planning, lossless migration, and automation. Ultimately, all technical solutions serve the business goal of reliable, high‑performance cloud services.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.
