High‑Performance Load Balancing Design and Implementation Using LVS and Tengine
This article reviews Alibaba Cloud's high‑performance load‑balancing solution, explaining the evolution from basic load‑balancing concepts to the architecture of LVS and Tengine, detailing their modes, optimizations, high‑availability designs across groups, AZs and regions, and outlining current use cases and future directions.
Load Balancing
Load balancing is a fundamental cloud‑computing component that distributes incoming traffic across multiple backend servers according to a scheduling algorithm. It comes in global (DNS‑based) and intra‑cluster forms, and in hardware implementations such as F5 as well as software implementations such as LVS, Nginx, and HAProxy.
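As an illustration of one such scheduling algorithm, here is a minimal sketch of smooth weighted round‑robin. The article does not say which algorithms the Alibaba Cloud solution uses; the `Backend` class and the names and weights in the usage note are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Backend:
    name: str
    weight: int
    current: int = 0  # running score maintained by the scheduler


def pick(backends):
    """Return the next backend using smooth weighted round-robin."""
    total = sum(b.weight for b in backends)
    # Every backend earns its weight on each round.
    for b in backends:
        b.current += b.weight
    # The backend with the highest running score is selected...
    chosen = max(backends, key=lambda b: b.current)
    # ...and pays back the total weight so the others can catch up.
    chosen.current -= total
    return chosen


def schedule(backends, n):
    """Return the names of the next n selected backends."""
    return [pick(backends).name for _ in range(n)]
```

With weights 5, 1, and 1, any seven consecutive picks contain the heavy backend five times and each light backend once, and the heavy backend's turns are interleaved rather than bunched together.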
LVS
LVS originally supported three modes (DR, TUN, and NAT), each with its own IP‑address handling and deployment constraints. It is built on the Linux Netfilter framework, which originally lacked strong multi‑core support.
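The difference between the three modes is easiest to see in what the director changes on a forwarded packet. The following is a toy model, not kernel code: the `Packet` fields and function names are hypothetical, and only the fields relevant to forwarding are modeled.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class Packet:
    src_ip: str
    dst_ip: str
    dst_mac: str
    outer_dst_ip: Optional[str] = None  # set only when IP-in-IP encapsulated


def forward_nat(pkt, rs_ip, rs_mac):
    # NAT: the destination IP is rewritten to the real server's address,
    # so replies have to return through the director to be translated back.
    return replace(pkt, dst_ip=rs_ip, dst_mac=rs_mac)


def forward_dr(pkt, rs_mac):
    # DR: only the destination MAC changes. The IP header still carries the
    # VIP, so the real server must own the VIP and share an L2 segment
    # with the director.
    return replace(pkt, dst_mac=rs_mac)


def forward_tun(pkt, rs_ip, next_hop_mac):
    # TUN: the original packet is left intact and wrapped in an outer
    # IP header addressed to the real server.
    return replace(pkt, outer_dst_ip=rs_ip, dst_mac=next_hop_mac)
```

In this model the client's source IP is preserved in all three modes, which is why each mode places requirements on where the real servers sit and how return traffic is routed.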
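Improvements include FullNAT mode, which adds source NAT (SNAT) on top of destination NAT; parallel processing that uses RSS to bind each flow to a specific CPU; fast‑path optimizations; instruction‑level enhancements; and NUMA‑aware memory locality. Together these achieve up to 40 Mpps (million packets per second) and 600 Kcps (thousand new connections per second) per node, with throughput scaling linearly across many cores.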
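The idea behind binding flows to CPUs can be sketched as follows. If every packet of a flow lands on the same core, each core can keep its own session table and no cross‑core locking is needed. This is a simplified model under stated assumptions: real RSS hashing is done by the NIC (typically with a Toeplitz hash), and `zlib.crc32` is only a stand‑in here. `NUM_CPUS`, `flow_cpu`, and `PerCpuSessions` are hypothetical names.

```python
import zlib

NUM_CPUS = 8  # hypothetical core count


def flow_cpu(src_ip, src_port, dst_ip, dst_port, proto, num_cpus=NUM_CPUS):
    """Map a 5-tuple to a CPU index; crc32 stands in for the NIC's RSS hash."""
    key = f"{src_ip}:{src_port}-{dst_ip}:{dst_port}-{proto}".encode()
    return zlib.crc32(key) % num_cpus


class PerCpuSessions:
    """One session table per CPU, so a flow is only ever touched by one core."""

    def __init__(self, num_cpus=NUM_CPUS):
        self.tables = [dict() for _ in range(num_cpus)]

    def lookup_or_create(self, flow, backend):
        """Return (cpu, backend), creating the session on first sight."""
        cpu = flow_cpu(*flow, num_cpus=len(self.tables))
        # An existing session keeps its original backend.
        chosen = self.tables[cpu].setdefault(flow, backend)
        return cpu, chosen
```

Because the mapping is a pure function of the 5‑tuple, repeated packets of one flow always reach the same table, which is what allows the per‑core design to scale without shared state.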
Tengine
Tengine handles layer‑7 traffic and faces performance challenges as CPU count grows; optimizations involve kernel‑level TCP stack tuning, the proprietary Alisocket (DPDK‑based) stack, hardware SSL offload, and web‑layer enhancements.
Elastic scaling is achieved by deploying Tengine instances in VMs, with health checks driving failover. Tengine also supports advanced features such as cookie‑based session persistence, URL routing, HTTP/2, and WebSocket, and a single VIP can sustain 100K HTTPS QPS.
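To make URL routing and cookie‑based session persistence concrete, here is a minimal sketch of how a layer‑7 balancer might combine them with health state. It is not Tengine's implementation: the cookie name `SERVERID`, the pool layout, and the "first healthy backend" fallback are all assumptions chosen for brevity.

```python
COOKIE = "SERVERID"  # hypothetical cookie name
POOLS = {            # hypothetical URL prefix -> backend pool
    "/api/": ["api-1", "api-2"],
    "/": ["web-1", "web-2"],
}


def match_pool(path, pools=POOLS):
    """Pick the pool whose URL prefix is the longest match for the path."""
    prefixes = [p for p in pools if path.startswith(p)]
    if not prefixes:
        raise LookupError(f"no route for {path!r}")
    return pools[max(prefixes, key=len)]


def route(path, cookies, healthy, pools=POOLS):
    """Return (backend, set_cookie); set_cookie is None if nothing changes."""
    pool = [b for b in match_pool(path, pools) if b in healthy]
    if not pool:
        raise LookupError("no healthy backend")
    sticky = cookies.get(COOKIE)
    if sticky in pool:
        # The client is already pinned to a healthy backend: keep it there.
        return sticky, None
    # First request, or the pinned backend failed its health check.
    chosen = pool[0]
    return chosen, f"{COOKIE}={chosen}; Path=/"
```

A client whose cookie names a backend that has since failed is transparently re‑pinned to a healthy one, which is how persistence and health‑check failover coexist in this sketch.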
High Availability
Group architecture provides full‑mesh redundancy with dual‑homed servers, multi‑region clusters, and automatic failover for servers, NICs, switches, and routing, delivering up to 640 Gbps aggregate throughput and seamless, user‑transparent upgrades.
AZ design duplicates routers across availability zones, enabling sub‑second failover without session sync, while Region design uses DNS‑based multi‑region VIPs and health‑checked LVS/Tengine instances to maintain service continuity across data centers.
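The region‑level behavior can be sketched as a DNS answer filtered by health checks. This is an illustrative model only: the function name, the region labels, the documentation‑range addresses in the usage note, and the fail‑open fallback are assumptions, not details from the article.

```python
def resolve(regions, is_healthy):
    """Return the VIPs to publish in DNS, in priority order.

    regions: ordered list of (region_name, vip) pairs.
    is_healthy: callable taking a VIP and returning True or False.
    """
    healthy = [vip for _, vip in regions if is_healthy(vip)]
    # Fail open (an assumption): if every probe fails, publish all VIPs
    # rather than return an empty answer, since the prober may be at fault.
    return healthy or [vip for _, vip in regions]
```

For example, with regions `[("region-a", "203.0.113.10"), ("region-b", "198.51.100.10")]`, a failed health check on the first VIP leaves only the second in the answer, so new clients are steered to the surviving region.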
Summary
The high‑performance load‑balancing solution powers public‑cloud front‑ends for e‑commerce, finance, and government, supports internal Alibaba Cloud services (RDS, OSS, DDoS protection), and serves as the traffic entry for platforms like Taobao and Alipay. Future work focuses on greater elasticity, higher single‑node capacity, proactive VIP probing, and end‑to‑end network monitoring.
Architecture Digest
Focusing on Java backend development, covering application architecture from top-tier internet companies (high availability, high performance, high stability), big data, machine learning, Java architecture, and other popular fields.