Mastering Load Balancing: Which Strategy Suits Your System?
This article explains the role of load balancing in distributed systems and details five common strategies—Round Robin, Weighted Round Robin, IP Hash, Least Connections, and Least Response Time—highlighting their mechanisms, advantages, drawbacks, and suitable scenarios for optimal system performance.
In internet-scale systems, load balancing is a crucial component of distributed architecture: it distributes workload or request traffic across multiple servers or components to improve performance, availability, and horizontal scalability.
As traffic grows for sites like Taobao and JD.com, a single server or cluster can no longer meet demand, necessitating horizontal scaling and a load‑balancing component to manage traffic distribution.
2.1 Round Robin
Round Robin (RR) assigns incoming requests to servers in order, cycling through them. It works well when servers have similar performance, but a weak or failing server can affect overall stability.
Five requests arrive.
Requests are assigned sequentially: web‑server1 gets 1 and 4, web‑server2 gets 2 and 5, web‑server3 gets 3.
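The cycling behaviour above can be sketched in a few lines of Python (server names mirror the example; a real balancer would also handle health checks and server removal):

```python
from itertools import cycle

class RoundRobinBalancer:
    """Hand out servers in a fixed order, wrapping around at the end."""
    def __init__(self, servers):
        self._cycle = cycle(servers)

    def next_server(self):
        return next(self._cycle)

servers = ["web-server1", "web-server2", "web-server3"]
lb = RoundRobinBalancer(servers)
assignments = [lb.next_server() for _ in range(5)]
# requests 1 and 4 -> web-server1, 2 and 5 -> web-server2, 3 -> web-server3
```

Note that plain Round Robin keeps sending traffic to a server even after it slows down or fails, which is exactly the stability risk described above.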
2.2 Weighted Round Robin
Weighted Round Robin gives each server a weight and distributes requests proportionally, allowing more capable servers to handle more traffic.
Five requests arrive.
web‑server1 (weight 60%) receives requests 1, 2, 3.
web‑server2 (weight 20%) receives request 4.
web‑server3 (weight 20%) receives request 5.
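One simple way to realize a 3:1:1 (60/20/20) split is to expand each server by its integer weight and cycle through the expanded list. This is a sketch of the idea, not the smooth weighted scheme that production balancers such as Nginx use:

```python
from itertools import cycle

def build_weighted_cycle(weights):
    """Expand each server by its integer weight and cycle through the result.
    weights maps server name -> integer weight (3:1:1 here mirrors 60/20/20)."""
    expanded = [server for server, w in weights.items() for _ in range(w)]
    return cycle(expanded)

weights = {"web-server1": 3, "web-server2": 1, "web-server3": 1}
lb = build_weighted_cycle(weights)
assignments = [next(lb) for _ in range(5)]
# matches the example: requests 1-3 -> web-server1, 4 -> web-server2, 5 -> web-server3
```

A drawback of naive expansion is that the heaviest server receives its requests in a burst; smooth weighted round robin interleaves servers to spread the load more evenly within each cycle.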
2.3 IP Hash
IP Hash computes a hash from the client’s IP address and routes the request to a specific server, ensuring that the same IP consistently reaches the same server—useful for session persistence.
IP 192.168.0.99 hashes to web‑server1, so requests 1 and 4 go to server 1.
IPs 192.168.0.96 and 192.168.0.98 hash to web‑server3, so requests 2 and 3 go to server 3.
While IP Hash guarantees session affinity, it can cause imbalance if a single IP generates heavy traffic, potentially overloading its assigned server.
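A minimal sketch of the technique: hash the client IP and use the result to index into the server list. The md5 choice here is an assumption made only to get a digest that is stable across processes (Python's built-in hash() is salted per run), so which server a given IP lands on is illustrative:

```python
import hashlib

def pick_server_by_ip(client_ip, servers):
    """Hash the client IP and index into the server list.
    md5 is used for a stable, well-distributed digest, not for security."""
    digest = hashlib.md5(client_ip.encode("utf-8")).hexdigest()
    return servers[int(digest, 16) % len(servers)]

servers = ["web-server1", "web-server2", "web-server3"]
# the same IP always lands on the same server -> session affinity
assert pick_server_by_ip("192.168.0.99", servers) == pick_server_by_ip("192.168.0.99", servers)
```

A further caveat: because the server index depends on `len(servers)`, adding or removing a server remaps most clients; consistent hashing is the usual mitigation when server membership changes often.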
2.4 Least Connections
Least Connections directs each request to the server with the fewest active connections, making it suitable for long‑lived connections such as WebSocket or FTP.
Current connections: web‑server1=11, web‑server2=15, web‑server3=2.
Incoming requests are routed to web‑server3, the least‑loaded server.
The algorithm adapts well to heterogeneous server capacities but adds overhead for real‑time connection monitoring.
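The bookkeeping this requires can be sketched as follows: the balancer increments a counter when it hands out a server and decrements it when the connection closes, which is the real-time monitoring overhead just mentioned (class and method names are illustrative):

```python
class LeastConnectionsBalancer:
    """Track active connections per server and route to the least loaded."""
    def __init__(self, servers):
        self.active = {server: 0 for server in servers}

    def acquire(self):
        # pick the server with the fewest active connections
        server = min(self.active, key=self.active.get)
        self.active[server] += 1
        return server

    def release(self, server):
        # call when the connection closes
        self.active[server] -= 1

lb = LeastConnectionsBalancer(["web-server1", "web-server2", "web-server3"])
lb.active.update({"web-server1": 11, "web-server2": 15, "web-server3": 2})
assert lb.acquire() == "web-server3"  # matches the example above
```

For long-lived connections (WebSocket, FTP) this pays off because connection counts, unlike a round-robin position, actually reflect how busy each server is.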
2.5 Least Response Time
Least Response Time selects the server with the shortest current response time, ideal for latency‑sensitive applications.
Advantages:
- Improved user experience: faster responses reduce wait times.
- Dynamic load balancing: distribution adjusts based on real-time performance.
- Handles traffic spikes: bursty traffic is absorbed quickly.

Disadvantages:
- Computational overhead: continuous response-time monitoring adds load.
- Susceptible to transient spikes: temporary latency spikes may misdirect routing.
- Ignores other metrics: CPU, memory, and other performance factors may be overlooked.
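One common way to damp the transient-spike problem noted above is to track a smoothed (exponentially weighted) average rather than the latest raw measurement. A sketch under that assumption, with illustrative names and an arbitrary smoothing factor:

```python
class LeastResponseTimeBalancer:
    """Route to the server with the lowest smoothed response time.
    The exponential moving average (alpha) damps transient latency spikes."""
    def __init__(self, servers, alpha=0.3):
        self.alpha = alpha
        self.avg_ms = {server: 0.0 for server in servers}

    def record(self, server, latency_ms):
        # blend the new observation into the running average
        prev = self.avg_ms[server]
        self.avg_ms[server] = (1 - self.alpha) * prev + self.alpha * latency_ms

    def pick(self):
        return min(self.avg_ms, key=self.avg_ms.get)

lb = LeastResponseTimeBalancer(["web-server1", "web-server2"])
lb.record("web-server1", 120.0)  # slow server
lb.record("web-server2", 15.0)   # fast server
assert lb.pick() == "web-server2"
```

Averages start at zero here, so untried servers look fastest until measured; a production balancer would also need a warm-up policy for newly added servers.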
Beyond these five, other strategies include DNS‑based load balancing for global traffic distribution and data‑layer balancing using sharding hashes.
Choosing the right load‑balancing algorithm requires evaluating application requirements, server capabilities, and network conditions to achieve optimal distribution.
Architecture & Thinking
🍭 Frontline tech director and chief architect at top-tier companies 🥝 Years of deep experience in internet, e‑commerce, social, and finance sectors 🌾 Committed to publishing high‑quality articles covering core technologies of leading internet firms, application architecture, and AI breakthroughs.