
How HyperRouter Enables Deterministic Operations for L4 Load Balancing

This article explains how Huawei Cloud's HyperRouter puts deterministic operations into practice by combining L4/L7 load‑balancing co‑design, high‑performance data‑plane choices, self‑healing mechanisms, a peer‑to‑peer architecture, Cell + Shuffle‑Sharding isolation, and user‑centric observability. Together these form a reproducible blueprint for reliable cloud services.

Huawei Cloud Developer Alliance

Deterministic Operations and HyperRouter Positioning

In Huawei Cloud's SRE practice, deterministic operations turn abstract reliability goals into measurable indicators, ensuring predictable fault rates, recovery times, and impact scopes. HyperRouter, a core component of the global load‑balancing service, embodies this approach.

Six Core Practices of HyperRouter Design and Development

1. L4 and L7 Load‑Balancing Co‑Design

Pairing L4 and L7 load balancers mitigates the stateful limitations of L7 (e.g., established connections broken by ECMP rehashing): L4 handles ingress traffic while L7 handles egress, which improves reliability and reduces resource overhead. HyperRouter adopts Google Maglev consistent hashing and Direct Server Return (DSR) to achieve stateless, high‑throughput forwarding.
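To make the consistent-hashing idea concrete, here is a minimal sketch of a Maglev-style lookup table in Python. It follows the table-population scheme from the published Maglev design (each backend walks its own permutation of slots in round-robin turns), but the hash function, the table size of 251, and the names `build_maglev_table` and `pick_backend` are assumptions made for illustration, not HyperRouter's implementation.

```python
import hashlib


def _h(value: str, seed: str) -> int:
    """Stable 64-bit hash; SHA-256 is an illustrative choice, not a requirement."""
    digest = hashlib.sha256(f"{seed}:{value}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


def build_maglev_table(backends: list[str], table_size: int = 251) -> list[str]:
    """Populate a lookup table; table_size must be prime and > len(backends)."""
    if not backends:
        raise ValueError("at least one backend is required")
    # Each backend derives its own permutation of slots from (offset, skip).
    offsets = [_h(b, "offset") % table_size for b in backends]
    skips = [_h(b, "skip") % (table_size - 1) + 1 for b in backends]
    next_idx = [0] * len(backends)
    table = [None] * table_size
    filled = 0
    while True:
        # Backends take turns, so slot counts differ by at most one.
        for i, backend in enumerate(backends):
            while True:
                slot = (offsets[i] + next_idx[i] * skips[i]) % table_size
                next_idx[i] += 1
                if table[slot] is None:
                    break
            table[slot] = backend
            filled += 1
            if filled == table_size:
                return table


def pick_backend(table: list[str], flow_key: str) -> str:
    """Map a flow (e.g. its 5-tuple rendered as a string) to a backend."""
    return table[_h(flow_key, "flow") % len(table)]
```

Because every forwarding node can build the same table from the same backend list, any node can pick the same backend for a given flow without sharing connection state, which is the property the stateless design relies on.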

2. Design Goals

High Performance: Target line‑rate processing with millions of packets per second per CPU core and nanosecond‑level latency.

High Scalability: Support rapid growth from tens of millions to billions of concurrent TCP connections.

High Reliability: Ensure near‑zero service interruption during upgrades, configuration changes, hardware or network failures.

Evolvability: Allow seamless adaptation to new scenarios, hardware, and protocols.

3. Architecture of the Data Plane

HyperRouter consists of a data plane and a control plane. The data plane focuses on packet reception, processing, and forwarding with strict performance and reliability requirements, while the control plane handles cluster scheduling, configuration distribution, BGP routing, observability, health checks, and self‑healing with lower performance demands.

4. User‑Space Kernel Bypass Choices

Linux kernel networking introduces latency at high PPS scales. XDP (eXpress Data Path) offers low‑overhead processing but lacks flexibility and dual‑NIC support. DPDK provides near‑line‑rate performance and deterministic processing at the cost of dedicated CPU/NIC resources. HyperRouter chose DPDK as a “one‑way door” decision to meet performance, scalability, and evolvability goals.
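A quick calculation shows why per-packet overhead matters at these rates. The sketch below computes the theoretical packet rate of an Ethernet link and the resulting time budget per packet. The 10 Gbit/s link speed and 64-byte frame size are illustrative assumptions, not figures from HyperRouter; the 20 bytes of per-frame overhead (preamble, start-of-frame delimiter, inter-frame gap) are standard Ethernet.

```python
# Per-frame wire overhead on Ethernet: 7 B preamble + 1 B SFD + 12 B inter-frame gap.
ETHERNET_OVERHEAD_BYTES = 20


def max_pps(link_bps: float, frame_bytes: int = 64) -> float:
    """Theoretical maximum packets per second for a given link and frame size."""
    bits_on_wire = (frame_bytes + ETHERNET_OVERHEAD_BYTES) * 8
    return link_bps / bits_on_wire


def per_packet_budget_ns(link_bps: float, frame_bytes: int = 64, cores: int = 1) -> float:
    """Nanoseconds each core may spend per packet to keep up with line rate."""
    return cores * 1e9 / max_pps(link_bps, frame_bytes)
```

Under these assumptions, `max_pps(10e9)` is about 14.88 million packets per second, which leaves a single core roughly 67 ns per packet. A budget that tight leaves little room for interrupt handling or per-packet system calls, which is the kind of cost kernel-bypass frameworks are designed to avoid.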

5. Self‑Healing and Isolation as First‑Class Citizens

HyperRouter continuously monitors node health (data‑plane metrics, BGP status, health checks) and automatically triggers recovery or isolation actions (process restart, configuration adjustment, node isolation). Instead of withdrawing BGP routes, it adjusts BGP AS paths to steer traffic away from a suspect node. The route stays available as a fallback, so a false‑positive health signal does not remove a working path and disruption stays minimal.
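The effect of lengthening an AS path instead of withdrawing a route can be sketched with a toy route selector. This is a deliberately simplified model: real BGP best-path selection evaluates other attributes (such as local preference) before AS-path length, and the `Route`, `advertise`, and `best_routes` names, the prepend count, and the ASN are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    next_hop: str
    as_path: tuple[int, ...]


def advertise(node: str, local_asn: int, healthy: bool, prepend: int = 3) -> Route:
    """An unhealthy node keeps advertising, but with its ASN prepended."""
    extra = 0 if healthy else prepend
    return Route(next_hop=node, as_path=(local_asn,) * (1 + extra))


def best_routes(routes: list[Route]) -> list[Route]:
    """Toy selection: prefer the shortest AS path; ties share traffic (ECMP)."""
    shortest = min(len(r.as_path) for r in routes)
    return [r for r in routes if len(r.as_path) == shortest]


routes = [
    advertise("lb-1", 64512, healthy=True),
    advertise("lb-2", 64512, healthy=False),  # de-preferred, not withdrawn
]
```

With both routes present, `best_routes(routes)` returns only `lb-1`. If `lb-1` later disappears, `lb-2` is still in the table and takes over, which a withdrawn route could not do.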

6. Peer‑to‑Peer Decentralized Architecture

To favor availability and partition tolerance under the CAP theorem, HyperRouter adopts a peer‑to‑peer design with storage, KV, communication, and application layers. Nodes can discover each other and sync state independently, which keeps core functionality running during network partitions. Formal verification with the P language and TLA+ uncovered hidden infinite‑retry bugs, which were fixed before deployment.
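The article does not describe the sync protocol itself, so the following is only one assumed way peers could reconcile state after a partition heals: a last-writer-wins merge over versioned KV entries. The entry layout `(version, node_id, payload)` and the `merge` function are hypothetical.

```python
Entry = tuple[int, str, str]  # (version, node_id, payload); assumed layout


def merge(local: dict[str, Entry], remote: dict[str, Entry]) -> dict[str, Entry]:
    """Keep, per key, the entry with the highest (version, node_id) pair."""
    merged = dict(local)
    for key, entry in remote.items():
        # node_id breaks ties so every peer picks the same winner.
        if key not in merged or entry[:2] > merged[key][:2]:
            merged[key] = entry
    return merged


# Two peers that diverged during a partition.
node_a = {"vip-1": (1, "a", "up")}
node_b = {"vip-1": (2, "b", "draining"), "vip-2": (1, "b", "up")}
```

Because this merge is commutative and idempotent, `merge(node_a, node_b)` and `merge(node_b, node_a)` yield the same state regardless of which peer initiates the exchange, so no central coordinator is needed for convergence.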

Cell + Shuffle‑Sharding for a Deterministic Blast Radius

HyperRouter partitions tenants into Cells and virtual shards. Shuffle sharding reduces the probability of cross‑tenant interference, shrinking the fault blast radius from 1/5 (static partitioning) to 1/45 (the number of possible shard combinations), which puts a deterministic bound on impact in multi‑tenant environments.
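The 1/5 and 1/45 figures can be reproduced with a short combinatorial check. The article does not state the underlying configuration; one setup consistent with both numbers, assumed here, is 10 nodes with each tenant assigned 2 of them. The constant and function names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations
from math import comb

NODES = 10       # assumed pool size
SHARD_SIZE = 2   # assumed nodes per tenant

# Static partitioning: 10 nodes cut into 5 fixed shards of 2.
static_blast = Fraction(1, NODES // SHARD_SIZE)

# Shuffle sharding: each tenant gets one of C(10, 2) possible node pairs.
shuffle_blast = Fraction(1, comb(NODES, SHARD_SIZE))


def overlap_distribution(nodes: int, size: int) -> dict[int, int]:
    """Count shards by how many nodes they share with one reference shard."""
    shards = list(combinations(range(nodes), size))
    reference = set(shards[0])
    counts: dict[int, int] = {}
    for shard in shards:
        shared = len(reference & set(shard))
        counts[shared] = counts.get(shared, 0) + 1
    return counts
```

Under these assumptions `static_blast` is 1/5 and `shuffle_blast` is 1/45. `overlap_distribution(10, 2)` also shows that of the 45 possible shards, 28 share no node with a given tenant, 16 share exactly one, and only 1 shares both, so a fault confined to one tenant's nodes fully overlaps a single combination.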

Critical User Journeys (CUJ) for User‑Centric Observability

Traditional metrics (CPU, packet loss) miss user experience. HyperRouter defines CUJs to bind observability to user actions. One example is the end‑to‑end flow of creating a load balancer: API call → control‑plane processing → BGP route publication → data‑plane integration. Measuring along the journey exposes latency sources, such as DPDK internal delays, that are invisible to system‑level monitoring.
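A journey-level measurement can be sketched as a trace that times each stage of the CUJ. The stage names below come from the load-balancer creation flow described above; the `JourneyTrace` class and its methods are hypothetical and stand in for whatever tracing system is actually used.

```python
import time
from contextlib import contextmanager


class JourneyTrace:
    """Records wall-clock duration per stage of one user journey."""

    def __init__(self, name: str):
        self.name = name
        self.stages: list[tuple[str, float]] = []  # (stage label, seconds)

    @contextmanager
    def stage(self, label: str):
        start = time.perf_counter()
        try:
            yield
        finally:
            # Record the stage even if it raised, so failures stay visible.
            self.stages.append((label, time.perf_counter() - start))

    def total(self) -> float:
        return sum(seconds for _, seconds in self.stages)

    def slowest(self) -> tuple[str, float]:
        return max(self.stages, key=lambda item: item[1])
```

A caller would wrap each step, for example `with trace.stage("bgp_route_publication"): ...`, and then alert on `trace.total()` against the journey's target rather than on per-component metrics. `trace.slowest()` points to the stage that dominates what the user actually waits for.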

Core Experience Summary

HyperRouter demonstrates that reliable cloud services stem from clear architecture, rational technology trade‑offs, and closed‑loop feedback mechanisms rather than mere redundancy. Future work will continue to optimize performance, adopt new protocols, and extend deterministic operations across the cloud lifecycle.
