How to Scale Web Services with Load Balancing and High‑Availability Clusters

This article explains why vertical scaling reaches limits, compares the pros and cons of vertical versus horizontal scaling, and provides detailed guidance on load‑balancing, high‑availability, and high‑performance clusters, including implementation methods, key terminology, and model-specific rules.

MaGe Linux Operations

Dec 4, 2015

Background

With the rapid growth of Internet traffic, a single server can no longer meet demand, so service capacity must be increased through vertical scaling (a more powerful machine) or horizontal scaling (more machines).

Disadvantages of vertical scaling:

High cost.

A single machine eventually hits a hardware ceiling, after which further spending yields little or no performance gain.

Advantages of horizontal scaling:

Low cost.

Provides high concurrency and high availability.

Good scalability.

Classification

Load‑balancing cluster (Load Balancing, LB)

High‑availability cluster (High Availability, HA)

High‑performance cluster (High Performance Computing, HPC)

Load‑balancing cluster: DNS round robin alone is not a reliable way to balance load, because carrier (ISP) resolvers cache records and clients keep hitting the same address. Instead, a front‑end scheduler distributes requests across back‑end servers to raise concurrent capacity. As traffic grows, the scheduler and shared storage become the bottlenecks, so the site is partitioned by function and each part is clustered separately (for example, one cluster per portal channel). Static content is kept in sync across servers with rsync+inotify. The scheduler also performs health checks, removing failed hosts from rotation and adding them back once they recover.

Advantages: Improves concurrent processing capability.
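The scheduler behavior described above (distribute requests, drop failed hosts, re‑add recovered ones) can be sketched in a few lines. This is only an illustration, not how any particular load balancer is implemented: the `Scheduler` class, the `probe` callback, and the 10.0.0.x addresses are all hypothetical, and round robin is just one possible distribution policy.

```python
from itertools import count


class Scheduler:
    """Round-robin scheduler that skips back ends marked unhealthy."""

    def __init__(self, backends):
        self.backends = list(backends)      # full configured pool
        self.healthy = set(self.backends)   # hosts currently in rotation
        self._counter = count()

    def health_check(self, probe):
        """Run probe(host) -> bool on every host and update the rotation."""
        for host in self.backends:
            if probe(host):
                self.healthy.add(host)      # recovered host rejoins
            else:
                self.healthy.discard(host)  # failed host is removed

    def pick(self):
        """Return the next healthy back end in round-robin order."""
        if not self.healthy:
            raise RuntimeError("no healthy back ends")
        while True:
            host = self.backends[next(self._counter) % len(self.backends)]
            if host in self.healthy:
                return host


if __name__ == "__main__":
    sched = Scheduler(["10.0.0.11", "10.0.0.12", "10.0.0.13"])
    # Hypothetical probe: pretend 10.0.0.12 is down.
    sched.health_check(lambda host: host != "10.0.0.12")
    print([sched.pick() for _ in range(4)])
```

In a real deployment the probe would be a TCP connect or an HTTP request with a timeout, run periodically rather than on demand.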

High‑availability cluster: Improves service uptime. For example, two hosts A (active) and B (standby) exchange heartbeats over multicast; if B stops receiving A's heartbeat, it powers A off (so that A cannot keep using shared resources) and takes over A's IP, keeping the service available. With more than two hosts, standby nodes can be given priorities that decide which one takes over.

High‑availability clusters also exchange cluster transaction information (such as node priorities). The node that coordinates these transactions is called the DC (Designated Coordinator); if the DC fails, the remaining hosts elect a new one.
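The heartbeat and priority logic above can be illustrated with a small sketch. It assumes a simple timeout rule (a node is considered dead if no heartbeat arrived within `timeout` seconds) and "highest priority wins" for takeover; the `HeartbeatMonitor` class and its parameters are hypothetical, and the sketch deliberately leaves out fencing and IP takeover.

```python
import time


class HeartbeatMonitor:
    """Tracks heartbeats and picks the node that should hold the service."""

    def __init__(self, priorities, timeout=3.0, clock=time.monotonic):
        self.priorities = dict(priorities)  # node name -> priority (higher wins)
        self.timeout = timeout              # seconds of silence before "dead"
        self.clock = clock                  # injectable so the logic is testable
        now = clock()
        self.last_seen = {node: now for node in self.priorities}

    def beat(self, node):
        """Record that a heartbeat from `node` was just received."""
        self.last_seen[node] = self.clock()

    def alive(self):
        """Nodes whose last heartbeat is within the timeout window."""
        now = self.clock()
        return [n for n, t in self.last_seen.items() if now - t <= self.timeout]

    def active_node(self):
        """Highest-priority live node, or None if every node is silent."""
        live = self.alive()
        if not live:
            return None
        return max(live, key=lambda n: self.priorities[n])


if __name__ == "__main__":
    fake_time = [0.0]
    mon = HeartbeatMonitor({"A": 100, "B": 50}, clock=lambda: fake_time[0])
    fake_time[0] = 2.0
    mon.beat("B")            # B keeps sending heartbeats, A has gone silent
    fake_time[0] = 4.0
    print(mon.active_node())
```

Injecting the clock keeps the timeout logic deterministic under test; a production cluster manager would add fencing before letting the standby take over.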

Main differences between load‑balancing and high‑availability clusters:

Load‑balancing clusters provide concurrency and health checks.

High‑availability clusters provide continuous online service and heartbeats.

High‑performance clusters use distributed file systems to parallelize complex tasks.

High‑availability clusters may waste resources because standby nodes are idle; separating services (e.g., web and mail) on different hosts can improve efficiency.

Split‑brain: Occurs when a cluster node temporarily stops responding (or the heartbeat link fails), so the other nodes assume it is dead and seize the shared disk while the original node may still be writing to it, which can corrupt the shared file system.

STONITH: “Shoot The Other Node In The Head”. A node that has lost heartbeat is powered off through a remotely controlled power switch.

Isolation (fencing): Denying a node access to a resource. It covers node‑level isolation (STONITH) and resource‑level isolation.

To prevent split‑brain, a cluster should have an odd number of nodes (3 or more), so that after a network partition at most one side holds a majority and only that side keeps running resources.
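The reasoning behind the odd node count is plain majority arithmetic, sketched below. The `has_quorum` function is a hypothetical helper and assumes one vote per node; real cluster managers may use weighted votes or tie‑breaker devices.

```python
def has_quorum(total_nodes, reachable_nodes):
    """A partition may run resources only if it holds a strict majority."""
    return reachable_nodes > total_nodes // 2


if __name__ == "__main__":
    # Two nodes split 1/1: neither side has a majority, so nobody can decide.
    print(has_quorum(2, 1), has_quorum(2, 1))
    # Three nodes split 2/1: exactly one side keeps quorum.
    print(has_quorum(3, 2), has_quorum(3, 1))
    # Four nodes split 2/2: an even count can still tie.
    print(has_quorum(4, 2), has_quorum(4, 2))
```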

Other Knowledge

DAS: Direct‑Attached Storage. The kernel accesses the block device directly, so performance is high, but concurrent writes from different hosts can corrupt data because nothing coordinates them.

NAS: Network‑Attached Storage. Operates at the file level; the first host to open a file for writing locks it, which prevents other hosts from writing at the same time. Performance is lower than DAS.

Load‑Balancing Cluster Implementation Methods

1. Hardware method

F5, Citrix NetScaler, A10 (prices are falling, but a backup device is needed to avoid a single point of failure, which increases cost).

2. Software method

Layer 4: LVS – distributes based on IP and port, high performance but limited advanced feature support.

Layer 7 (reverse proxy): Nginx (http, smtp, pop3, imap) and HAProxy (http, tcp such as MySQL, smtp) – can decode protocols precisely, modify requests, and forward them; slightly slower than LVS but more suitable for production.
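The practical difference between the two layers is what information the balancer can see when it makes its choice. The sketch below is only illustrative: `pick_layer4` sees nothing but addresses and ports (hashing them is one possible policy, not LVS's default), while `pick_layer7` parses the HTTP request line and routes on the URL path. Both function names, the route table, and the pool names are hypothetical.

```python
import zlib


def pick_layer4(client_ip, client_port, backends):
    """Layer 4: the decision can only use addresses and ports."""
    key = f"{client_ip}:{client_port}".encode()
    return backends[zlib.crc32(key) % len(backends)]


def pick_layer7(request_line, routes, default_pool):
    """Layer 7: the request is decoded, so routing can depend on its content."""
    _method, path, _version = request_line.split(" ", 2)
    for prefix, pool in routes:
        if path.startswith(prefix):
            return pool
    return default_pool


if __name__ == "__main__":
    backends = ["10.0.0.11", "10.0.0.12"]
    print(pick_layer4("203.0.113.7", 51514, backends))
    routes = [("/static/", "static-pool"), ("/api/", "api-pool")]
    print(pick_layer7("GET /static/logo.png HTTP/1.1", routes, "app-pool"))
```

Parsing the request is also what makes layer 7 slower: the proxy must terminate the connection and read application data before it can forward anything.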

LVS: Linux Virtual Server

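LVS hooks into the INPUT chain of the kernel's TCP/IP (netfilter) stack, so iptables rules on the scheduler can interfere with it and the two should not be combined on the same traffic. Once a cluster service is defined on the scheduler, each incoming packet is examined on INPUT; if it matches a cluster service, it is not delivered to a local process but is redirected toward POSTROUTING and sent on to a real server.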

LVS consists of two parts: ipvsadm (user space) and ipvs (kernel).

Before kernel 2.4.23, the ipvs code was not part of the mainline kernel and had to be applied as a patch.

Related Terms

LVS types:

1. NAT model

2. DR model

3. TUN model

NAT model: works like DNAT.

When a client request reaches the scheduler, the IP header is CIP|VIP. The packet passes PREROUTING and is matched against the cluster services on INPUT; on a match, the scheduler rewrites the destination so the header becomes CIP|RIP1 and sends the packet out through POSTROUTING. The real server sees its own address as the destination, processes the request, and replies with the header RIP1|CIP. Because the real server's gateway is the scheduler, the reply passes back through it, where the source address is rewritten so the header becomes VIP|CIP before the packet is returned to the client.
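The header changes in the NAT model can be traced with a toy packet type. This is a sketch of the address rewriting only (no ports, checksums, or connection tracking); the `Packet` class, both functions, and the example addresses are hypothetical.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Packet:
    src: str
    dst: str


def nat_inbound(pkt, vip, rip):
    """Scheduler, request path: CIP|VIP becomes CIP|RIP (destination NAT)."""
    if pkt.dst != vip:
        raise ValueError("packet is not addressed to the cluster service")
    return replace(pkt, dst=rip)


def nat_outbound(pkt, vip, rip):
    """Scheduler, reply path: RIP|CIP becomes VIP|CIP (source NAT)."""
    if pkt.src != rip:
        raise ValueError("reply did not come from the chosen real server")
    return replace(pkt, src=vip)


if __name__ == "__main__":
    CIP, VIP, RIP1 = "203.0.113.7", "198.51.100.10", "192.168.10.11"
    request = Packet(src=CIP, dst=VIP)
    to_real_server = nat_inbound(request, VIP, RIP1)
    reply = Packet(src=to_real_server.dst, dst=to_real_server.src)
    to_client = nat_outbound(reply, VIP, RIP1)
    print(request, to_real_server, reply, to_client, sep="\n")
```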

Rules for NAT model:

Cluster nodes and scheduler must be on the same network.

RIP addresses are private, used only for inter‑node communication.

The scheduler sits between the clients and the real servers and handles all traffic in both directions.

Real server gateway must point to DIP.

Scheduler supports port mapping.

Real servers can run any OS.

In large‑scale scenarios, the scheduler can become a bottleneck; typically it handles up to 10 back‑end hosts.

DR model (commonly used):

The scheduler and the real servers connect to the same switch, and all of them are configured with the VIP; on the real servers the VIP is hidden, meaning they do not answer or announce ARP for it. The scheduler's VIP is on its NIC and its DIP on an alias; each real server's RIP is on its NIC and the VIP on an alias. Because all hosts are in the same network, MAC addresses are resolved via ARP. When a client packet (CIP|VIP) arrives, the scheduler forwards it to a real server without altering the IP header, rewriting only the MAC addresses so that the frame is delivered to the chosen real server. The real server replies with VIP as source and CIP as destination, and the reply goes directly to the client without passing back through the scheduler.
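What makes the DR model cheap is that the IP header is never touched. The sketch below models an Ethernet frame with only four fields to show that the scheduler rewrites MAC addresses and nothing else; the `Frame` class, both functions, and all addresses are hypothetical.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Frame:
    src_mac: str
    dst_mac: str
    src_ip: str
    dst_ip: str


def dr_forward(frame, scheduler_mac, real_server_mac):
    """Scheduler: re-address the frame; the IP header (CIP|VIP) is unchanged."""
    return replace(frame, src_mac=scheduler_mac, dst_mac=real_server_mac)


def dr_reply(request, real_server_mac, gateway_mac):
    """Real server: answer from the VIP, sent via its own gateway."""
    return Frame(src_mac=real_server_mac, dst_mac=gateway_mac,
                 src_ip=request.dst_ip, dst_ip=request.src_ip)


if __name__ == "__main__":
    incoming = Frame("aa:aa:aa:aa:aa:01", "aa:aa:aa:aa:aa:02",
                     "203.0.113.7", "198.51.100.10")
    forwarded = dr_forward(incoming, "aa:aa:aa:aa:aa:02", "aa:aa:aa:aa:aa:03")
    reply = dr_reply(forwarded, "aa:aa:aa:aa:aa:03", "aa:aa:aa:aa:aa:01")
    print(forwarded)
    print(reply)
```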

Rules for DR model:

Cluster nodes must share the same physical network with the scheduler.

RIP no longer needs to be private; remote management is easier.

Scheduler only handles inbound requests; responses are sent directly from real servers.

Cluster nodes cannot set their gateway to DIP.

Scheduler does not support port mapping.

Most operating systems can be used as real servers, provided they can hide the VIP (that is, suppress ARP for it).

TUN model (for remote disaster recovery, rarely used):

Works similarly to the DR model. Each real server has two addresses: a RIP (public) and a hidden VIP. The scheduler has the VIP and a DIP (alias). When a client packet (CIP|VIP) reaches the scheduler, the scheduler wraps it in an outer IP header DIP|RIP before forwarding. The real server strips the outer header, finds the original CIP|VIP packet, processes the request, and sends the response directly to the client, bypassing the scheduler. This requires IP tunnel support on both the scheduler and the real servers.
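The encapsulation step can be modeled by nesting one packet inside another. This is a conceptual sketch of IP‑in‑IP only, with no real header encoding; the `IPPacket` class, both functions, and the addresses are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IPPacket:
    src: str
    dst: str
    payload: object  # application data, or another IPPacket when tunneled


def tun_encapsulate(inner, dip, rip):
    """Scheduler: wrap the untouched CIP|VIP packet in an outer DIP|RIP header."""
    return IPPacket(src=dip, dst=rip, payload=inner)


def tun_decapsulate(outer):
    """Real server: strip the outer header to recover the original packet."""
    if not isinstance(outer.payload, IPPacket):
        raise ValueError("not a tunneled packet")
    return outer.payload


if __name__ == "__main__":
    CIP, VIP = "203.0.113.7", "198.51.100.10"
    DIP, RIP = "198.51.100.2", "192.0.2.21"
    request = IPPacket(src=CIP, dst=VIP, payload="GET / HTTP/1.1")
    tunneled = tun_encapsulate(request, DIP, RIP)
    original = tun_decapsulate(tunneled)
    # The reply uses the VIP as source and goes straight to the client.
    reply = IPPacket(src=original.dst, dst=original.src, payload="200 OK")
    print(tunneled)
    print(reply)
```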

Rules for TUN model:

Cluster nodes can span the Internet.

RIP must be a public address.

Scheduler only handles inbound traffic; responses are sent directly by real servers.

Real server gateway cannot point to the scheduler.

Only OSes that support tunneling can be used as real servers.

Port mapping is not supported.
