
How Alibaba Cloud’s High‑Performance Load Balancer Achieves Scalability and High Availability

This article explains the evolution of Alibaba Cloud's high‑performance load balancing, covering early LVS modes, Tengine integration, architectural improvements, and multi‑region high‑availability designs that together deliver massive throughput, fault tolerance, and flexible deployment for cloud services.

21CTO

Load Balancing

Load balancing is a fundamental cloud component that serves as the entry point for network traffic, distributing requests across multiple backend servers according to various algorithms.

What Is Load Balancing?

Incoming traffic is distributed among backend servers according to a scheduling algorithm, so each server handles its share of requests independently. Load balancers can be classified by scope, as global load balancing (typically DNS‑based) or intra‑cluster load balancing, and by implementation, as hardware or software solutions.
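LVS and Nginx each offer several scheduling algorithms. As one illustration, the minimal sketch below implements smooth weighted round robin, the algorithm used by Nginx's upstream module. It is not claimed to be the scheduler Alibaba Cloud uses, and the three backends and their 5:1:1 weights are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Backend:
    name: str
    weight: int       # static weight set by the operator
    current: int = 0  # running score maintained by the algorithm


def pick(backends: List[Backend]) -> Backend:
    """Smooth weighted round robin: add each weight to its score, choose the
    highest score, then subtract the total weight from the winner."""
    total = sum(b.weight for b in backends)
    for b in backends:
        b.current += b.weight
    chosen = max(backends, key=lambda b: b.current)  # ties go to the first backend
    chosen.current -= total
    return chosen


if __name__ == "__main__":
    pool = [Backend("a", 5), Backend("b", 1), Backend("c", 1)]
    print([pick(pool).name for _ in range(7)])
```

With weights 5:1:1 the seven picks come out as a, a, b, a, c, a, a. Backend a still receives five of every seven requests, but they are interleaved with the others instead of arriving in a burst.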

Global vs. Intra‑Cluster, Hardware vs. Software

Global load balancing uses DNS to resolve a domain to different VIPs for region‑level routing. Intra‑cluster load balancing then distributes the traffic arriving at a VIP across the servers behind it. Hardware appliances (e.g., F5, A10) offer strong performance but limited flexibility and higher cost. Software solutions such as LVS, Nginx, and HAProxy provide customizable, cost‑effective alternatives.

LVS Modes

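LVS originally supported three forwarding modes: DR, TUN, and NAT.

DR mode rewrites only the destination MAC address, so the LVS node and the backend servers must share a Layer 2 network. This limits flexibility in large distributed clusters.

TUN mode encapsulates each packet in an outer IP header (IP‑in‑IP). This preserves the client's source IP but requires every backend server to decapsulate tunneled packets.

NAT mode performs DNAT, rewriting the destination IP to a backend server's address. Backend servers must route replies back through the LVS node, typically by using it as their default gateway, so that the translation can be reversed.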
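To make the differences concrete, here is a minimal Python model of how each mode rewrites a client packet addressed to the VIP. The Packet fields, addresses, and function names are hypothetical. Real LVS performs these rewrites inside the kernel, and only the header fields relevant to each mode are modeled.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class Packet:
    dst_mac: str
    src_ip: str
    dst_ip: str
    inner: Optional["Packet"] = None  # original packet when IP-in-IP encapsulated


# Hypothetical addresses, for illustration only.
LVS_IP = "10.0.0.1"
VIP = "203.0.113.10"


def dr(pkt: Packet, rs_mac: str) -> Packet:
    # DR: only the destination MAC changes; the IP header still reads client -> VIP,
    # so the real server must share the L2 segment and accept traffic for the VIP.
    return replace(pkt, dst_mac=rs_mac)


def tun(pkt: Packet, rs_ip: str, next_hop_mac: str) -> Packet:
    # TUN: wrap the untouched packet in an outer IP header addressed to the real
    # server, which must decapsulate it to recover client -> VIP.
    return Packet(dst_mac=next_hop_mac, src_ip=LVS_IP, dst_ip=rs_ip, inner=pkt)


def nat(pkt: Packet, rs_ip: str) -> Packet:
    # NAT: DNAT the VIP to the real server's IP (the L2 rewrite to the next hop is
    # omitted); replies must flow back through LVS so it can reverse the translation.
    return replace(pkt, dst_ip=rs_ip)
```

In all three modes the backend still sees the client's IP as the source address. The modes differ in what the backend must provide: a shared L2 segment for DR, a tunnel endpoint for TUN, and a return route through LVS for NAT.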

Netfilter Framework

LVS is built on Linux Netfilter, a flexible platform for developing network functions. Early implementations struggled with multi‑core scalability due to global locks and cache contention.

Improvements to LVS

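Introduced FullNAT, which performs SNAT in addition to DNAT so that replies return to LVS on their own. Backends need no special routing and can sit in different Layer 3 networks.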

Parallelized processing to leverage multi‑core CPUs.

Added a fast path: the first packet of a connection goes through the full slow path (scheduling and session setup), and subsequent packets are forwarded directly using the stored session.

Utilized Intel‑specific instructions and NUMA‑aware memory access.
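The FullNAT, parallelization, and fast‑path items fit together roughly as follows. This Python sketch makes three assumptions. First, each core keeps its own session table, so no global lock is needed. Second, flows are pinned to cores by a hash that stands in for NIC RSS. Third, each core owns a disjoint block of local ports for SNAT. All names, addresses, and port ranges are hypothetical, and this is not Alibaba's implementation.

```python
import itertools
import zlib
from dataclasses import dataclass
from typing import Dict, List, Tuple

# (src_ip, src_port, dst_ip, dst_port, proto)
FiveTuple = Tuple[str, int, str, int, str]


@dataclass
class Session:
    rs_ip: str       # real server picked on the slow path
    rs_port: int
    local_ip: str    # FullNAT source address owned by this LVS node (hypothetical)
    local_port: int


class Core:
    """One worker core with a private session table, so lookups take no global lock."""

    def __init__(self, core_id: int, backends: List[Tuple[str, int]], local_ip: str):
        self.sessions: Dict[FiveTuple, Session] = {}
        self.backends = itertools.cycle(backends)  # plain round robin, for illustration
        # Each core owns a disjoint block of 1000 local ports, so SNAT ports never
        # collide across cores. A real implementation would also recycle ports.
        start = 1024 + core_id * 1000
        self.ports = iter(range(start, start + 1000))
        self.local_ip = local_ip
        self.slow_path_hits = 0

    def forward(self, ft: FiveTuple) -> Tuple[str, int, str, int]:
        """Return the rewritten (src_ip, src_port, dst_ip, dst_port) for a packet."""
        sess = self.sessions.get(ft)  # fast path: a known flow needs only this lookup
        if sess is None:
            # Slow path: first packet of the flow, so schedule a backend and
            # allocate a local port, then remember the decision.
            self.slow_path_hits += 1
            rs_ip, rs_port = next(self.backends)
            sess = Session(rs_ip, rs_port, self.local_ip, next(self.ports))
            self.sessions[ft] = sess
        # FullNAT: rewrite both the source (SNAT) and the destination (DNAT).
        return (sess.local_ip, sess.local_port, sess.rs_ip, sess.rs_port)


def core_for(ft: FiveTuple, n_cores: int) -> int:
    """Stand-in for NIC RSS hashing: the same flow always lands on the same core."""
    return zlib.crc32(repr(ft).encode()) % n_cores
```

Because every packet of a flow hashes to the same core, and that core owns the flow's session, the per-packet work touches only core-local data.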

Tengine

Tengine, Alibaba's Nginx‑based web server, provides the Layer 7 reverse proxy that complements LVS at Layer 4, so together they handle both TCP/UDP and HTTP(S) traffic. Performance bottlenecks were addressed by optimizing the kernel protocol stack, adopting Alisocket (a high‑performance, DPDK‑based user‑space TCP stack), and offloading SSL to hardware.

Kernel‑level protocol stack optimizations.

Alisocket for localized packet processing across cores.

Hardware SSL offload and session reuse.

Web‑layer transmission optimizations.
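The session‑reuse item can be illustrated with a server‑side session cache. If a client offers a session ID the server still remembers, the handshake can be resumed without repeating the expensive public‑key exchange. This is a simplified, hypothetical model: real TLS stacks manage this internally, and TLS 1.3 resumes sessions with PSKs rather than session IDs. The capacity, TTL, and function names are illustrative assumptions.

```python
import os
import time
from collections import OrderedDict
from typing import Optional, Tuple


class SessionCache:
    """LRU cache with a TTL, mapping TLS session IDs to saved session secrets."""

    def __init__(self, capacity: int = 10000, ttl: float = 300.0):
        self.capacity = capacity
        self.ttl = ttl
        self._entries: "OrderedDict[bytes, Tuple[bytes, float]]" = OrderedDict()

    def store(self, session_id: bytes, secret: bytes) -> None:
        self._entries[session_id] = (secret, time.monotonic() + self.ttl)
        self._entries.move_to_end(session_id)
        if len(self._entries) > self.capacity:
            self._entries.popitem(last=False)  # evict the least recently used entry

    def lookup(self, session_id: bytes) -> Optional[bytes]:
        entry = self._entries.get(session_id)
        if entry is None or entry[1] < time.monotonic():
            self._entries.pop(session_id, None)  # unknown or expired
            return None
        self._entries.move_to_end(session_id)    # mark as recently used
        return entry[0]


def handshake(cache: SessionCache, offered_id: Optional[bytes]) -> Tuple[str, bytes]:
    """Return ("abbreviated" or "full", session_id) for a client offering offered_id."""
    if offered_id is not None and cache.lookup(offered_id) is not None:
        return "abbreviated", offered_id  # resume: skip the public-key exchange
    new_id = os.urandom(32)
    cache.store(new_id, os.urandom(48))  # random stand-in for the negotiated secret
    return "full", new_id
```

On a load balancer that terminates many short HTTPS connections, each cache hit saves one full handshake. This complements hardware offload, which makes the full handshakes that remain cheaper.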

High Availability Architecture

Group

Each region contains multiple data centers, each with several scheduling units and multiple LVS/Tengine devices, providing redundancy at the server, switch, and network levels.

AZ (Availability Zone)

Two routers per data center enable seamless failover between AZs. VIPs are assigned different priorities so that traffic switches over automatically within seconds.
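The priority scheme can be modeled as follows. Assume each VIP is advertised from two AZs, with a higher priority in its primary AZ and a lower one in a standby AZ, and that the network prefers the live advertisement with the highest priority. When the primary withdraws its route (up=False here), traffic shifts to the standby without manual action. The data model and priority values are hypothetical. In a real network, the withdrawal and reselection are presumably handled by the routers' routing protocol.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Advertisement:
    vip: str
    az: str
    priority: int        # higher is preferred (hypothetical scale)
    up: bool = True      # False once the AZ withdraws the route


def active_az(vip: str, table: List[Advertisement]) -> Optional[str]:
    """Pick the AZ that should receive traffic for vip: highest-priority live route."""
    live = [a for a in table if a.vip == vip and a.up]
    if not live:
        return None
    return max(live, key=lambda a: a.priority).az
```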

Region

DNS can resolve a domain to multiple region VIPs, allowing traffic to be redirected to a healthy region if a data center fails; health checks automatically remove unhealthy devices.
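A minimal sketch of the region‑level logic follows. It assumes a DNS responder that answers with the client's own region when that region is healthy and otherwise falls back to every healthy region. Region names, VIPs, and the probe callback are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Region:
    name: str
    vips: List[str]
    healthy: bool = True


def health_check(regions: List[Region], probe: Callable[[Region], bool]) -> None:
    """Mark each region healthy or not according to an external probe."""
    for r in regions:
        r.healthy = probe(r)


def resolve(client_region: str, regions: List[Region]) -> List[str]:
    """Answer with the client's own region VIPs if healthy, else all healthy VIPs."""
    healthy = [r for r in regions if r.healthy]
    local = [r for r in healthy if r.name == client_region]
    return [vip for r in (local or healthy) for vip in r.vips]
```

After the probe marks region-a as down, a client in region-a is answered with region-b's VIP instead. Failover therefore comes from health checks, not from any change on the client side.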

Summary

Alibaba Cloud's high‑performance load balancer serves as a core component for public‑cloud websites, e‑commerce platforms, financial services, and internal cloud products. It offers massive throughput (roughly 40 Mpps, i.e. 40 million packets per second, per LVS node), multi‑region high availability, and flexible scaling. Future goals include better elastic expansion, higher single‑node capacity, proactive VIP probing, and end‑to‑end network monitoring.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.

Written by 21CTO

21CTO (21CTO.com) offers developers community, training, and services, making it your go‑to learning and service platform.
