Design and Implementation of a Multi‑Layer Load‑Balancing Platform (VGW)
This article explains why large-scale services need reliable load balancing, analyzes the problems of request distribution and fault isolation, and details the design of a layer-3 and layer-4 load-balancing architecture (covering DNS, Nginx, LVS, FULLNAT, and VGW), along with health-check, redundancy, and performance-optimization techniques.
In large-scale business scenarios a single machine can no longer carry the service on its own, which creates a demand for reliable load balancing. Starting from these basic requirements, the article explains step by step how to build a load-balancing platform.
Reliability Problem: When ten servers provide the same service, two core issues arise: how to assign each client request to an appropriate server, and how to isolate faulty servers. Handled poorly, some servers starve while others are overloaded, and requests keep reaching failed machines, which undermines availability (the "A" in the CAP theorem).
To solve these issues, a controller that schedules requests and manages servers is required, but such a controller often becomes a bottleneck. The control plane should therefore be separated from the data plane.
Business and Control Isolation: The solution is to delegate the problem to a dedicated load-balancing platform. Client requests are directed to a Virtual IP (VIP); the load balancer then decides which backend server receives the traffic. DNS can provide coarse load distribution, while Nginx (layer 7) and LVS (layer 4) offer finer control.
Each of these can balance load, but each must itself be deployed as a cluster to be reliable. DNS balances only coarsely, Nginx works at layer 7 and routes by domain name, and LVS works at layer 4 and routes by IP address and port.
Layer-3 Load Balancing: Network devices (switches and routers) naturally provide layer-3 load balancing. Adding them completes the chain: Business ← layer 7 (Nginx) ← layer 4 (LVS) ← layer 3 (network devices).
Layer-4 Load Balancing: The rest of the design focuses on layer 4, which must answer four questions: how to attract traffic to the balancer, how to select a backend, how to forward data, and how backend responses are returned. The answers are a Virtual IP, simple round-robin or weighted round-robin scheduling, and two forwarding modes: Direct Route (DR) and Tunnel.
DR requires the balancer and the backends to be in the same subnet, while Tunnel mode sets up a tunnel between them. In both modes the backends reply to clients directly, so response traffic does not pass through the balancer.
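The weighted round-robin scheduling mentioned above can be sketched as follows. The article does not say which variant VGW uses; this sketch assumes the "smooth" weighted round-robin variant, and the class name, backend names, and weights are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Backend:
    name: str
    weight: int       # static weight configured by the operator
    current: int = 0  # running score used by the smooth variant


class SmoothWeightedRoundRobin:
    """Illustrative scheduler (an assumption, not VGW's actual code)."""

    def __init__(self, backends: List[Tuple[str, int]]):
        if not backends:
            raise ValueError("at least one backend is required")
        if any(weight <= 0 for _, weight in backends):
            raise ValueError("weights must be positive")
        self.backends = [Backend(name, weight) for name, weight in backends]
        self.total = sum(b.weight for b in self.backends)

    def pick(self) -> str:
        # Every backend earns its weight on each round.
        for b in self.backends:
            b.current += b.weight
        # The backend with the highest score wins (first one on a tie)...
        chosen = max(self.backends, key=lambda b: b.current)
        # ...and pays back the total weight, so heavy backends are
        # interleaved with light ones instead of being picked in a burst.
        chosen.current -= self.total
        return chosen.name
```

With weights 5, 1, and 1 for `rs1`, `rs2`, and `rs3`, seven consecutive calls to `pick()` return `rs1, rs1, rs2, rs1, rs3, rs1, rs1`: each backend is chosen in proportion to its weight, and the light backends are spread through the cycle.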
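The preferred approach is FULLNAT, which translates both the source and the destination address in each direction. On the request path, a packet from client to VIP is rewritten as LocalIP to backend; on the response path, a packet from backend to LocalIP is rewritten as VIP to client. The backend sees the balancer as an ordinary client, so it needs no special configuration; the trade-off is that the real client IP is hidden from the backend.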
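The two translations can be illustrated with a small session table. This is a minimal sketch under stated assumptions: a single backend, one LocalIP, and a naive port allocator starting at 10000. The `Packet` and `FullNat` names and all addresses are hypothetical and are not taken from VGW or LVS.

```python
from dataclasses import dataclass
from typing import Dict, Tuple

Endpoint = Tuple[str, int]  # (ip, port)


@dataclass(frozen=True)
class Packet:
    src: Endpoint
    dst: Endpoint


class FullNat:
    """Toy FULLNAT translator (illustrative only)."""

    def __init__(self, vip: Endpoint, local_ip: str, backend: Endpoint):
        self.vip = vip
        self.local_ip = local_ip
        self.backend = backend
        self.next_port = 10000                           # assumed local port range
        self.client_to_local: Dict[Endpoint, Endpoint] = {}
        self.local_to_client: Dict[Endpoint, Endpoint] = {}

    def inbound(self, pkt: Packet) -> Packet:
        """client -> VIP becomes LocalIP -> backend."""
        if pkt.dst != self.vip:
            raise ValueError("packet is not addressed to the VIP")
        local = self.client_to_local.get(pkt.src)
        if local is None:
            # First packet of a session: allocate a local endpoint.
            local = (self.local_ip, self.next_port)
            self.next_port += 1
            self.client_to_local[pkt.src] = local
            self.local_to_client[local] = pkt.src
        return Packet(src=local, dst=self.backend)

    def outbound(self, pkt: Packet) -> Packet:
        """backend -> LocalIP becomes VIP -> client."""
        client = self.local_to_client[pkt.dst]  # KeyError if no session exists
        return Packet(src=self.vip, dst=client)
```

Because the backend only ever sees the LocalIP endpoint as the packet source, it has no way to learn the client address from the packet headers alone, which is the drawback noted above.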
Health checks probe the TCP/UDP ports of each backend: a successful connection marks the backend as healthy, while a failed probe removes it from the pool.
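A TCP probe of this kind can be sketched with the standard `socket` module. The sketch covers TCP only and removes a backend after a single failed probe; the function names and the one-second timeout are assumptions, and real checkers often require several consecutive failures before removing a backend.

```python
import socket
from typing import Callable, List, Tuple

Endpoint = Tuple[str, int]  # (host, port)


def tcp_probe(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to (host, port) succeeds in time."""
    try:
        # create_connection performs the TCP handshake; the socket is
        # closed again as soon as the with-block exits.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


def refresh_pool(
    backends: List[Endpoint],
    probe: Callable[[str, int], bool] = tcp_probe,
) -> List[Endpoint]:
    """Keep only the backends whose probe currently succeeds."""
    return [(host, port) for host, port in backends if probe(host, port)]
```

Passing `probe` as a parameter keeps the pool logic independent of the probe type, so a UDP or application-level check could be swapped in without changing `refresh_pool`.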
Fault isolation for the balancers themselves relies on BGP: every balancer advertises the same VIP, and when a balancer fails, its VIP route is withdrawn, which removes it from the routing path.
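The effect of advertising and withdrawing a VIP can be simulated in a few lines. This is only a model of the upstream router's view, assuming it spreads flows over all advertised next hops by hashing the flow tuple (equal-cost multi-path style); it does not speak BGP, and the `EcmpRouter` class and the `vgw-N` names are hypothetical.

```python
import hashlib
from typing import Dict, List, Optional, Tuple

Flow = Tuple[str, int, str, int]  # (src_ip, src_port, dst_ip, dst_port)


class EcmpRouter:
    """Toy model of a router that balances a VIP across its advertisers."""

    def __init__(self) -> None:
        self.routes: Dict[str, List[str]] = {}  # vip -> sorted next hops

    def advertise(self, vip: str, next_hop: str) -> None:
        hops = self.routes.setdefault(vip, [])
        if next_hop not in hops:
            hops.append(next_hop)
            hops.sort()  # stable order so hashing is deterministic

    def withdraw(self, vip: str, next_hop: str) -> None:
        hops = self.routes.get(vip, [])
        if next_hop in hops:
            hops.remove(next_hop)

    def next_hop(self, vip: str, flow: Flow) -> Optional[str]:
        hops = self.routes.get(vip)
        if not hops:
            return None  # no balancer advertises the VIP any more
        # Hash the flow tuple so all packets of one flow take the same path.
        digest = hashlib.sha256(repr(flow).encode()).digest()
        return hops[int.from_bytes(digest[:8], "big") % len(hops)]
```

After `withdraw(vip, "vgw-2")`, no flow is routed to `vgw-2` any longer. Note that the simple modulo used here also remaps some flows between the surviving balancers when the set changes; real routers use their own hashing schemes.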
The final solution, called VGW (vivo Gateway), provides layer-4 load balancing for both internal and external traffic. Its components include a load-balancing forwarding module, a health-check module, and a routing-control module.
The logical architecture forwards external requests to internal real servers; the physical architecture uses a dual‑NIC “dual‑arm” design for external VGW and a single‑NIC “single‑arm” design for internal VGW.
Redundancy is achieved at server, link, and cluster levels, with automatic VIP withdrawal for failed devices and manual intervention for complex anomalies.
Performance is boosted by using DPDK (via DPVS) to bypass kernel packet processing. With this design the platform forwards millions of packets per second and handles over a million new connections per second on 100 Gbps NICs.
In summary, the article derives a practical load‑balancing solution from reliability requirements, implements it as the VGW platform, and discusses future challenges such as new protocols and decentralized data‑center models.
Architecture Digest
Focusing on Java backend development, covering application architecture from top-tier internet companies (high availability, high performance, high stability), big data, machine learning, Java architecture, and other popular fields.