How Alibaba Cloud’s High‑Performance Load Balancer Achieves Scalability and High Availability
This article explains the evolution of Alibaba Cloud's high‑performance load balancing, covering early LVS modes, Tengine integration, architectural improvements, and multi‑region high‑availability designs that together deliver massive throughput, fault tolerance, and flexible deployment for cloud services.
Load Balancing
Load balancing is a fundamental cloud component that serves as the entry point for network traffic, distributing requests across multiple backend servers according to various algorithms.
What Is Load Balancing?
Incoming traffic is split among backend servers, each of which responds independently, so that no single server carries the whole load. Load balancers can be classified by scope (global load balancing, typically DNS-based, versus intra-cluster load balancing) and by implementation (hardware versus software).
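As an illustration of how a distribution algorithm might spread requests, here is a minimal sketch of smooth weighted round-robin, a scheme commonly associated with Nginx. The source does not say which algorithms Alibaba Cloud uses, so the backend names and weights below are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Backend:
    name: str          # hypothetical backend label
    weight: int        # relative share of traffic
    current: int = 0   # running score used by the smooth algorithm


def pick(backends):
    """Return the next backend using smooth weighted round-robin."""
    total = sum(b.weight for b in backends)
    # Every backend earns its weight on each round...
    for b in backends:
        b.current += b.weight
    # ...the highest score wins and pays back the total weight.
    chosen = max(backends, key=lambda b: b.current)
    chosen.current -= total
    return chosen


backends = [Backend("rs-a", 5), Backend("rs-b", 1), Backend("rs-c", 1)]
sequence = [pick(backends).name for _ in range(7)]
print(sequence)
```

Over one full cycle of seven requests, "rs-a" receives five and the other two backends receive one each, with the heavy backend's turns interleaved rather than bunched together.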
Global vs. Intra‑Cluster Load Balancing
Global load balancing uses DNS to resolve a domain to different VIPs for region-level routing. Within a cluster, hardware appliances (e.g., F5, A10) offer strong performance but limited flexibility and higher cost, while software solutions such as LVS, Nginx, and HAProxy provide customizable, cost-effective alternatives.
LVS Modes
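LVS originally supported three forwarding modes: DR, TUN, and NAT.
DR (direct routing) mode rewrites only the destination MAC address, so LVS and the backend servers must share a Layer 2 network, which limits flexibility in large distributed clusters.
TUN mode encapsulates the original IP packet inside a new one, preserving the client's source IP but requiring a decapsulation (tunnel) module on each backend server.
NAT mode performs DNAT on the destination IP, so backend servers must send their replies back through LVS, which requires routing adjustments on each backend.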
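The difference between the three modes is easiest to see in terms of which packet fields change. The sketch below models a packet as a small record and applies each mode's rewrite; the `Packet` type, the function names, and all addresses are hypothetical and chosen only for illustration, not taken from LVS itself.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class Packet:
    src_ip: str
    dst_ip: str
    dst_mac: str
    inner: Optional["Packet"] = None  # set when this packet wraps another one


def forward_dr(pkt, backend_mac):
    # DR: only the destination MAC changes; both IP addresses stay as sent.
    return replace(pkt, dst_mac=backend_mac)


def forward_tun(pkt, lvs_ip, backend_ip, next_hop_mac):
    # TUN: the original packet is carried unchanged inside a new outer packet.
    return Packet(src_ip=lvs_ip, dst_ip=backend_ip,
                  dst_mac=next_hop_mac, inner=pkt)


def forward_nat(pkt, backend_ip, next_hop_mac):
    # NAT: the destination IP is rewritten (DNAT); the source IP is kept.
    return replace(pkt, dst_ip=backend_ip, dst_mac=next_hop_mac)


client_pkt = Packet(src_ip="203.0.113.7", dst_ip="198.51.100.1",
                    dst_mac="aa:aa:aa:aa:aa:01")
dr = forward_dr(client_pkt, "bb:bb:bb:bb:bb:02")
tun = forward_tun(client_pkt, "10.0.0.1", "10.0.1.10", "cc:cc:cc:cc:cc:03")
nat = forward_nat(client_pkt, "10.0.1.10", "cc:cc:cc:cc:cc:03")
```

In all three cases the backend still sees the client's source IP, either directly (DR, NAT) or in the inner packet (TUN); what differs is how the packet reaches the backend and how the reply finds its way back.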
Netfilter Framework
LVS is built on Linux Netfilter, a flexible platform for developing network functions. Early implementations struggled with multi‑core scalability due to global locks and cache contention.
Improvements to LVS
Introduced FullNAT mode, which adds SNAT on top of NAT's DNAT so that replies return through LVS without routing changes on the backend servers.
Parallelized processing to leverage multi‑core CPUs.
Implemented fast‑path forwarding for subsequent packets after the first packet’s slow‑path handling.
Utilized Intel‑specific instructions and NUMA‑aware memory access.
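Two of these improvements, FullNAT and the fast path, can be sketched together. In the toy model below, the first packet of a flow takes the slow path (a backend is chosen and a local source port is allocated), and later packets of the same flow only need a session-table lookup. The class name, the round-robin backend choice, and the port-allocation scheme are assumptions made for the example and do not describe the real LVS implementation.

```python
class FullNatForwarder:
    """Toy FullNAT forwarder with a session table (hypothetical names)."""

    def __init__(self, local_ip, backends):
        self.local_ip = local_ip        # address used as the new source (SNAT)
        self.backends = backends        # list of (ip, port) backend servers
        self.sessions = {}              # flow key -> rewritten 4-tuple
        self.next_local_port = 10000
        self.slow_path_count = 0

    def forward(self, client_ip, client_port, vip, vport):
        key = (client_ip, client_port, vip, vport)
        rewritten = self.sessions.get(key)
        if rewritten is None:
            # Slow path: first packet of the flow. Choose a backend,
            # allocate a local port, and remember the decision.
            self.slow_path_count += 1
            index = len(self.sessions) % len(self.backends)
            backend_ip, backend_port = self.backends[index]
            rewritten = (self.local_ip, self.next_local_port,
                         backend_ip, backend_port)
            self.next_local_port += 1
            self.sessions[key] = rewritten
        # Fast path: later packets are served by the lookup alone.
        return rewritten


lvs = FullNatForwarder("10.0.0.1", [("10.0.1.10", 80), ("10.0.1.11", 80)])
first = lvs.forward("203.0.113.7", 40000, "198.51.100.1", 80)
second = lvs.forward("203.0.113.7", 40000, "198.51.100.1", 80)
```

Because both the source and the destination are rewritten, the backend replies to the LVS local address, which is why FullNAT needs no special routing on the backend side.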
Tengine
Tengine is Alibaba's Nginx-based Layer 7 reverse proxy. In the load balancer, LVS handles Layer 4 (TCP/UDP) traffic and Tengine handles HTTP(S) traffic. Performance bottlenecks were addressed by optimizing the kernel protocol stack, adopting Alisocket, a high-performance DPDK-based TCP stack, and using hardware SSL acceleration.
Kernel‑level protocol stack optimizations.
Alisocket for localized packet processing across cores.
Hardware SSL offload and session reuse.
Web‑layer transmission optimizations.
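One way to picture "localized packet processing across cores" is to pin every flow to a single core, so that each core keeps its own connection state and never contends for a shared lock. The sketch below uses a CRC32 of the flow's 4-tuple as a stand-in for the hash a NIC would compute; the core count and function names are assumptions, and nothing here reflects Alisocket's actual internals.

```python
import zlib

NUM_CORES = 4  # assumed core count for the example


def core_for_flow(src_ip, src_port, dst_ip, dst_port, num_cores=NUM_CORES):
    """Map a flow to a core; the same 4-tuple always lands on the same core."""
    key = f"{src_ip}:{src_port}->{dst_ip}:{dst_port}".encode()
    return zlib.crc32(key) % num_cores


# One private table per core: no table is ever touched by two cores.
per_core_tables = [dict() for _ in range(NUM_CORES)]


def handle_packet(flow):
    """Count the packet in the table owned by the flow's core."""
    core = core_for_flow(*flow)
    table = per_core_tables[core]
    table[flow] = table.get(flow, 0) + 1
    return core


flow = ("203.0.113.7", 40000, "198.51.100.1", 443)
cores_used = {handle_packet(flow) for _ in range(5)}
```

All five packets of the flow are handled by one core, and only that core's table holds state for it.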
High Availability Architecture
Group
Each region contains multiple data centers, each with several scheduling units and multiple LVS/Tengine devices, providing redundancy at the server, switch, and network levels.
AZ (Availability Zone)
Two routers per data center enable seamless failover between AZs; VIPs are assigned different priorities in each AZ, so traffic switches to the standby AZ automatically within seconds.
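The priority mechanism can be sketched as a simple selection rule: among the AZs that currently announce a VIP and are healthy, traffic goes to the one with the highest priority. The type name, the "higher number wins" convention, and the AZ labels are assumptions for illustration; the source does not describe the routing protocol details.

```python
from dataclasses import dataclass


@dataclass
class VipAnnouncement:
    az: str
    priority: int        # assumed convention: higher value is preferred
    healthy: bool = True


def active_az(announcements):
    """Return the AZ that should serve the VIP, or None if none is healthy."""
    candidates = [a for a in announcements if a.healthy]
    if not candidates:
        return None
    return max(candidates, key=lambda a: a.priority).az


vip_routes = [VipAnnouncement("az-a", priority=100),
              VipAnnouncement("az-b", priority=50)]
normal = active_az(vip_routes)

vip_routes[0].healthy = False   # simulate az-a going down
after_failure = active_az(vip_routes)
```

Under normal conditions "az-a" serves the VIP; once it is marked unhealthy, the same rule selects "az-b" with no configuration change.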
Region
DNS can resolve a domain to multiple region VIPs, allowing traffic to be redirected to a healthy region if a data center fails; health checks automatically remove unhealthy devices.
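A minimal sketch of this region-level behavior follows: a tracker marks a region unhealthy after a number of consecutive failed checks, and the resolver returns only the VIPs of healthy regions. The failure threshold of three, the domain, and the addresses are assumptions chosen for the example.

```python
class HealthTracker:
    """Marks a target unhealthy after N consecutive failed checks."""

    def __init__(self, fail_threshold=3):
        self.fail_threshold = fail_threshold
        self.failures = {}   # target -> consecutive failure count

    def report(self, target, ok):
        # A successful check resets the counter; a failure increments it.
        self.failures[target] = 0 if ok else self.failures.get(target, 0) + 1

    def is_healthy(self, target):
        return self.failures.get(target, 0) < self.fail_threshold


def resolve(domain, region_vips, tracker):
    """Return the VIPs for a domain, skipping unhealthy regions."""
    return [vip for region, vip in region_vips[domain]
            if tracker.is_healthy(region)]


region_vips = {
    "www.example.com": [("region-1", "198.51.100.10"),
                        ("region-2", "203.0.113.10")],
}
tracker = HealthTracker(fail_threshold=3)
before = resolve("www.example.com", region_vips, tracker)

for _ in range(3):
    tracker.report("region-1", ok=False)
after = resolve("www.example.com", region_vips, tracker)
```

Requiring several consecutive failures before removal is a common way to avoid flapping on a single lost probe; a successful check immediately restores the region.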
Summary
Alibaba Cloud's high-performance load balancer is a core component behind public-cloud websites, e-commerce platforms, financial services, and internal cloud products. It offers high throughput (roughly 40 million packets per second per LVS node), multi-region high availability, and flexible scaling. Future goals include better elastic expansion, higher single-node capacity, proactive VIP probing, and end-to-end network monitoring.
21CTO