Design and Implementation of a Multi‑Layer Load Balancing Platform (VGW)
This article describes the design of VGW, a multi-layer load-balancing platform that combines 7-layer Nginx, 4-layer LVS in FULLNAT mode, and 3-layer network devices. The design targets business reliability, fault isolation through BGP-announced VIPs, and high-throughput forwarding with DPDK, with redundancy at the server, link, and cluster levels.
In large-scale business scenarios, a single server can no longer carry the full load, so the service must run on multiple servers with reliable load balancing in front of them. The article starts from these basic requirements and gradually explains how to build a load-balancing platform that ensures business reliability and isolates control from the business itself.
Reliability challenge: With multiple servers handling the same service, two core problems arise: (1) deciding which server should receive each client request, and (2) isolating failed servers so they stop receiving traffic. A controller that schedules requests and manages servers is itself a potential bottleneck, so its own redundancy and fault isolation must also be considered.
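The two problems above, choosing a server and isolating failed ones, can be sketched as a small round-robin scheduler. This is a minimal illustration, not the scheduler the article describes; the class name `RoundRobinScheduler` and its methods are hypothetical.

```python
class RoundRobinScheduler:
    """Pick backends in rotation, skipping any marked as failed.

    Hypothetical helper for illustration; not part of VGW, LVS, or Nginx.
    """

    def __init__(self, servers):
        self.servers = list(servers)
        self.failed = set()
        self._next = 0  # rotation cursor into self.servers

    def mark_failed(self, server):
        # Isolate a server: it stays in the pool but is never picked.
        self.failed.add(server)

    def mark_healthy(self, server):
        # Return a recovered server to rotation.
        self.failed.discard(server)

    def pick(self):
        # Try each server at most once, starting from the cursor.
        for _ in range(len(self.servers)):
            server = self.servers[self._next]
            self._next = (self._next + 1) % len(self.servers)
            if server not in self.failed:
                return server
        raise RuntimeError("no healthy backend available")
```

Note that the scheduler object itself is a single point of failure here, which is exactly the concern the paragraph raises about the controller.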
Business-control separation: The solution is to delegate load balancing to a dedicated platform. Clients send requests to a virtual IP address (VIP) rather than to a specific backend server, and the platform forwards each request to an appropriate backend.
The article compares three typical load‑balancing methods:
DNS-based balancing – limited effectiveness, because cached DNS records delay the removal of failed servers.
Nginx (7‑layer) – suitable for HTTP/HTTPS.
LVS (4‑layer) – works at TCP/UDP level.
To improve reliability, a two-tier scheme is proposed: Nginx shields the business servers from failures, and LVS does the same for Nginx. This gives the logical chain Business ← 7-layer (Nginx) ← 4-layer (LVS). The LVS cluster itself also needs to be reliable, which leads to a third tier at the network layer (3-layer): switches and routers, which can spread traffic across multiple next hops on their own.
The final architecture is a three-tier chain: Business ← 7-layer (Nginx) ← 4-layer (LVS) ← 3-layer (network devices). The network layer is the entry point for the whole stack and distributes incoming traffic across the LVS nodes.
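One common way for network devices to spread traffic across several load balancers is equal-cost multi-path (ECMP) routing, where a hash of the flow's 5-tuple selects the next hop. The article does not name the mechanism, so treat ECMP as an assumption here. The sketch below uses SHA-256 only as a stand-in for the hardware hash a real switch would use, and `pick_next_hop` is a hypothetical name.

```python
import hashlib


def pick_next_hop(flow, next_hops):
    """Map a flow 5-tuple to one of several equal-cost next hops.

    flow: (src_ip, src_port, dst_ip, dst_port, protocol)
    next_hops: list of LVS node identifiers (illustrative values).
    """
    if not next_hops:
        raise ValueError("no next hops available")
    # Hash the whole 5-tuple so every packet of a flow takes the same path.
    key = "|".join(str(field) for field in flow).encode()
    digest = hashlib.sha256(key).digest()
    index = int.from_bytes(digest[:8], "big") % len(next_hops)
    return next_hops[index]
```

Because the choice depends only on the flow and the list of next hops, packets of one connection keep reaching the same LVS node as long as the set of next hops does not change.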
Implementation of 4-layer load balancing: The article discusses two forwarding modes, Direct Route (DR) and Tunnel. DR rewrites only the destination MAC address, so the load balancer and the backends must sit on the same Layer 2 network. Tunnel mode removes that restriction by encapsulating packets in a tunnel, but adds management overhead. Because both have drawbacks, FULLNAT mode is adopted instead. FULLNAT translates both addresses: a packet from the client to the VIP is rewritten as a packet from the load balancer's LocalIP to the backend, so the backend sees the LocalIP rather than the real client IP. FULLNAT offers the best trade-off for the described scenario.
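The FULLNAT address rewriting can be sketched as follows. This is a simplified model under stated assumptions: one VIP, one backend, and a plain counter for local ports. The names `Packet` and `FullNat`, and all addresses, are hypothetical and do not reflect the LVS or DPVS implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Packet:
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int


class FullNat:
    """Toy FULLNAT table: rewrites source and destination in both directions."""

    def __init__(self, vip, vport, local_ip, backend):
        self.vip, self.vport = vip, vport
        self.local_ip = local_ip
        self.backend = backend      # (ip, port) of the single backend
        self._next_port = 20000     # assumed start of the local port range
        self._out = {}              # (client_ip, client_port) -> local_port
        self._back = {}             # local_port -> (client_ip, client_port)

    def inbound(self, pkt):
        """client -> VIP becomes LocalIP -> backend (SNAT plus DNAT)."""
        if (pkt.dst_ip, pkt.dst_port) != (self.vip, self.vport):
            raise ValueError("packet is not addressed to the VIP")
        client = (pkt.src_ip, pkt.src_port)
        if client not in self._out:
            self._out[client] = self._next_port
            self._back[self._next_port] = client
            self._next_port += 1
        return Packet(self.local_ip, self._out[client], *self.backend)

    def outbound(self, pkt):
        """backend -> LocalIP becomes VIP -> client (reverse translation)."""
        client_ip, client_port = self._back[pkt.dst_port]
        return Packet(self.vip, self.vport, client_ip, client_port)
```

The backend only ever sees `local_ip` as the peer address, which is why replies naturally return through the load balancer without any routing changes on the backend.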
Health checking: Simple TCP/UDP port probing is used to detect unhealthy backends. For TCP, a successful connection indicates a healthy backend. For UDP, the checker sends a packet and treats an ICMP "port unreachable" reply as a failure; the absence of such a reply is taken to mean the port is open.
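Both probes can be sketched with Python's standard `socket` module. This is a minimal sketch, assuming IPv4 and a connected UDP socket, on which the operating system typically reports an ICMP "port unreachable" as a connection error. The function names `tcp_healthy` and `udp_healthy` are hypothetical.

```python
import socket


def tcp_healthy(host, port, timeout=1.0):
    """A backend is healthy if a TCP connection can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, timed out, or unreachable: treat all as unhealthy.
        return False


def udp_healthy(host, port, timeout=1.0, payload=b"ping"):
    """A backend is unhealthy if the probe triggers an ICMP error."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.connect((host, port))
            sock.send(payload)
            sock.recv(1024)
        except ConnectionRefusedError:
            return False  # ICMP "port unreachable" was received
        except socket.timeout:
            return True   # silence: no ICMP error, assume the port is open
        except OSError:
            return False  # any other socket error counts as a failure
        return True       # the service actually replied
```

The UDP probe is weaker than the TCP one: a firewall that drops ICMP, or a host that is down entirely, also produces silence, so a timeout only suggests that the port is open.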
Fault isolation: VIPs are announced to the upstream switches via BGP. To isolate a faulty load balancer, its VIP announcement is withdrawn, so the upstream switches stop sending traffic to it.
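The effect of announcing and withdrawing a VIP can be modeled with a toy routing table. This is not a BGP implementation and does not use any real BGP library; `UpstreamSwitch` and `reconcile` are hypothetical names, and the sketch only shows how withdrawal removes a load balancer from the set of next hops.

```python
class UpstreamSwitch:
    """Toy routing table: VIP -> set of load balancers announcing it."""

    def __init__(self):
        self.routes = {}

    def announce(self, vip, lb):
        # A load balancer advertises that it can serve this VIP.
        self.routes.setdefault(vip, set()).add(lb)

    def withdraw(self, vip, lb):
        # The advertisement is removed; withdrawing twice is harmless.
        self.routes.get(vip, set()).discard(lb)

    def next_hops(self, vip):
        # Load balancers that still receive traffic for this VIP.
        return sorted(self.routes.get(vip, set()))


def reconcile(switch, vip, lb, healthy):
    """Keep the announcement in line with the load balancer's health."""
    if healthy:
        switch.announce(vip, lb)
    else:
        switch.withdraw(vip, lb)
```

For example, after `reconcile(switch, vip, "lb-1", healthy=False)`, only the remaining load balancers appear in `switch.next_hops(vip)`, so new traffic for the VIP no longer reaches `lb-1`.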
VGW (vivo Gateway) solution: Building on the FULLNAT 4-layer design, VGW provides load balancing for both internal and external networks. It consists of three core modules: load-balancing forwarding, health checking, and routing control. The logical architecture separates external traffic (from clients) from internal traffic (to real servers). In physical deployment, the external VGW uses a dual-NIC "dual-arm" mode and the internal VGW uses a single-NIC "single-arm" mode.
Redundancy is achieved at server, link, and cluster levels, with automatic isolation for device failures and manual intervention for non‑fatal anomalies.
Performance optimization relies on DPDK (via DPVS) to bypass the Linux kernel for packet processing. This achieves forwarding rates of millions of packets per second and high connections-per-second rates, which is essential for handling millions of QPS.
In summary, the article walks through the reasoning from business reliability requirements to a concrete, multi‑layer load‑balancing platform, discusses trade‑offs of various designs, and presents the VGW implementation that meets vivo’s large‑scale traffic needs.
vivo Internet Technology