
Mastering Load Balancing: Algorithms, Code Samples, and Real‑World Insights

This article explains what load balancing is in distributed systems and how it improves throughput and reliability. It maps balancing mechanisms onto the layers of a typical architecture, lists the criteria for choosing an algorithm, and provides Python implementations of round‑robin, weighted round‑robin, random, weighted random, and hash‑based selection. It also covers consistent hashing, least‑connection, stateful services, where to place the balancer, and push versus pull models.

ITFLY8 Architecture Home

Understanding Load Balancing

Load balancing distributes requests across a group of servers or processes that provide the same function, so that a single Internet service can be served by multiple backend nodes (a server farm or server pool). It increases throughput, reduces response time, and improves reliability by keeping any one node from being overloaded and by removing single points of failure.

Typical Distributed Architecture

A typical web architecture consists of a client, a reverse proxy (e.g., Nginx), a site (web application) layer, a service layer, and a data layer. Each layer may run multiple instances, and the goal is for every caller in one layer to spread its requests evenly across the instances of the layer below it. Common mechanisms at each hop:

Client → Reverse‑proxy: DNS round‑robin

Reverse‑proxy → Site: Nginx

Site → Service: connection pool

Data layer: range‑based or hash‑based partitioning
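The two data‑layer schemes differ in how a key picks its shard. The following is a minimal sketch, not taken from the original: the shard names `db-0` to `db-3`, the boundaries in `RANGE_UPPER_BOUNDS`, and both helper functions are hypothetical, and MD5 is used only as a stable, process‑independent hash.

```python
import bisect
import hashlib

# Hypothetical layout: exclusive upper bound of each key range.
# user_id < 1000 -> db-0, < 2000 -> db-1, < 3000 -> db-2, everything else -> db-3
RANGE_UPPER_BOUNDS = [1000, 2000, 3000]
SHARDS = ['db-0', 'db-1', 'db-2', 'db-3']


def range_shard(user_id: int) -> str:
    """Range-based partitioning: contiguous key ranges live on the same shard."""
    return SHARDS[bisect.bisect_right(RANGE_UPPER_BOUNDS, user_id)]


def hash_shard(user_id: int) -> str:
    """Hash-based partitioning: a stable digest of the key, modulo the shard count."""
    digest = hashlib.md5(str(user_id).encode('utf-8')).digest()
    return SHARDS[int.from_bytes(digest, 'big') % len(SHARDS)]


print(range_shard(1500))  # db-1
print(hash_shard(1500))
```

Range partitioning keeps neighboring keys together, which helps range scans but can concentrate hot keys on one shard; hash partitioning spreads keys evenly but scatters any key range across all shards.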

Algorithm Evaluation Criteria

When choosing a load‑balancing algorithm, consider:

Differences in node capacities (CPU, memory, network, location)

Dynamic changes in node performance

Stateful services that require the same client to hit the same node

Who acts as the balancer and whether it can become a bottleneck

Load‑Balancing Algorithms

Round‑Robin

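SERVER_LIST = ['10.246.10.1', '10.246.10.2', '10.246.10.3']

def round_robin(server_lst, cur=[0]):
    # The mutable default `cur` is created once and persists between calls,
    # so it serves as the rotation counter (shared by all callers, not thread-safe).
    length = len(server_lst)
    ret = server_lst[cur[0] % length]
    cur[0] = (cur[0] + 1) % length
    return ret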

This simple method gives each node an equal share of requests in turn, regardless of differences in capacity.
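The `round_robin` function keeps its position in a mutable default argument, so every caller shares one counter and concurrent callers can race on the read‑modify‑write. A minimal thread‑safe sketch, assuming a fixed server list; the class `RoundRobinBalancer` is hypothetical and not from the original:

```python
import itertools
import threading

SERVER_LIST = ['10.246.10.1', '10.246.10.2', '10.246.10.3']


class RoundRobinBalancer:
    """Hands out servers in a fixed cyclic order; safe to share between threads."""

    def __init__(self, servers):
        # Copy the list so later changes by the caller do not affect the rotation.
        self._cycle = itertools.cycle(list(servers))
        self._lock = threading.Lock()

    def next_server(self):
        # Serialize access so two threads never advance the rotation at once.
        with self._lock:
            return next(self._cycle)


rr = RoundRobinBalancer(SERVER_LIST)
print([rr.next_server() for _ in range(4)])
# ['10.246.10.1', '10.246.10.2', '10.246.10.3', '10.246.10.1']
```

Each instance owns its own position, so separate server pools no longer share a counter.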

Weighted Round‑Robin

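WEIGHT_SERVER_LIST = {'10.246.10.1': 1, '10.246.10.2': 3, '10.246.10.3': 2}

def weight_round_robin(servers, cur=[0]):
    # Expand each server by its weight: ['10.246.10.1', '10.246.10.2', '10.246.10.2', ...]
    # (rebuilt on every call; cache it if the weights are static)
    weighted_list = []
    for k, v in servers.items():
        weighted_list.extend([k] * v)
    length = len(weighted_list)
    # The mutable default `cur` persists between calls and walks the expanded list.
    ret = weighted_list[cur[0] % length]
    cur[0] = (cur[0] + 1) % length
    return ret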

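Assigns more requests to higher‑capacity nodes: in every six consecutive calls, 10.246.10.2 is returned three times, 10.246.10.3 twice, and 10.246.10.1 once. Because the expanded list is walked in order, each server's turns arrive back to back.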
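Nginx's upstream module avoids these bursts with a "smooth" weighted round‑robin that interleaves servers while keeping the same ratio. A sketch of that technique follows; the class and method names are illustrative and this is not Nginx's code:

```python
class SmoothWeightedRoundRobin:
    """Smooth weighted round-robin: same ratio as the weights, but interleaved."""

    def __init__(self, weights):
        self.weights = dict(weights)
        self.total = sum(self.weights.values())
        # Running score per server; starts at zero.
        self.current = {server: 0 for server in self.weights}

    def next_server(self):
        # 1. Every server earns its weight.
        for server, weight in self.weights.items():
            self.current[server] += weight
        # 2. The highest score wins (ties go to the earliest server in dict order).
        best = max(self.current, key=self.current.get)
        # 3. The winner pays back the total, lowering its chance in the next round.
        self.current[best] -= self.total
        return best


WEIGHT_SERVER_LIST = {'10.246.10.1': 1, '10.246.10.2': 3, '10.246.10.3': 2}
swrr = SmoothWeightedRoundRobin(WEIGHT_SERVER_LIST)
print([swrr.next_server() for _ in range(6)])
# ['10.246.10.2', '10.246.10.3', '10.246.10.1', '10.246.10.2', '10.246.10.3', '10.246.10.2']
```

The 1:3:2 ratio is preserved, but no server is picked twice in a row here. In this example the scores return to zero after six selections, so the pattern repeats.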

Random Selection

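import random

def random_choose(server_lst):
    # The module-level generator is seeded automatically when `random` is imported,
    # so there is no need to reseed it on every call.
    return random.choice(server_lst)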

Weighted Random

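import random

def weight_random_choose(servers):
    # Expand each server by its weight, then pick uniformly from the expanded list,
    # so a server with weight 3 is three times as likely as one with weight 1.
    weighted_list = []
    for k, v in servers.items():
        weighted_list.extend([k] * v)
    return random.choice(weighted_list)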
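Expanding the list costs memory proportional to the total weight. The standard library's `random.choices` accepts weights directly; a minimal sketch (the function name and the optional `rng` parameter are illustrative):

```python
import random


def weighted_random_choose(servers, rng=random):
    """Pick one server with probability proportional to its weight.

    `rng` can be the `random` module or a `random.Random` instance
    (useful for reproducible tests).
    """
    names = list(servers)
    weights = list(servers.values())
    return rng.choices(names, weights=weights, k=1)[0]


WEIGHT_SERVER_LIST = {'10.246.10.1': 1, '10.246.10.2': 3, '10.246.10.3': 2}
print(weighted_random_choose(WEIGHT_SERVER_LIST, rng=random.Random(42)))
```

In CPython, `random.choices` accumulates the weights and binary‑searches them, so memory stays proportional to the number of servers rather than the sum of their weights.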

Hash‑Based Selection

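import hashlib

def hash_choose(request_info, server_lst):
    # Built-in hash() of a str is randomized per process (PYTHONHASHSEED); use a stable digest.
    hashed = int(hashlib.md5(request_info.encode('utf-8')).hexdigest(), 16)
    return server_lst[hashed % len(server_lst)]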

Maps a request (e.g., client IP) to a specific node, useful for stateful services.
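A drawback of hash‑plus‑modulo selection is that changing the number of nodes remaps most keys. The sketch below counts how many keys change nodes when a three‑node pool grows to four; `stable_hash`, `moved_fraction`, and the `client-N` keys are illustrative, and MD5 stands in for any stable hash (Python's built‑in `hash()` of a `str` is randomized per process, so it is unsuitable here).

```python
import hashlib


def stable_hash(key: str) -> int:
    """Process-independent hash (unlike built-in hash() on str)."""
    return int(hashlib.md5(key.encode('utf-8')).hexdigest(), 16)


def moved_fraction(keys, old_count: int, new_count: int) -> float:
    """Fraction of keys whose node index differs between the two pool sizes."""
    moved = sum(
        1 for k in keys
        if stable_hash(k) % old_count != stable_hash(k) % new_count
    )
    return moved / len(keys)


keys = [f'client-{i}' for i in range(10_000)]
print(moved_fraction(keys, 3, 4))  # roughly 0.75
```

A key stays put only when `h % 3 == h % 4`, which holds for 3 of the 12 possible values of `h % 12`, so about three quarters of the keys move.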

Consistent Hashing

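Improves on hash‑based selection by placing nodes and keys on the same hash ring: each key is served by the first node clockwise from its position, so adding or removing a node only remaps the keys in that node's arcs instead of most keys. Mapping each physical node to multiple virtual nodes spreads its arcs around the ring and evens out the load.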
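A minimal hash‑ring sketch with virtual nodes. The class name, the `vnodes=100` default, the `node#i` labels for virtual nodes, and the use of MD5 are assumptions for illustration, not the behavior of a specific library:

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    return int(hashlib.md5(key.encode('utf-8')).hexdigest(), 16)


class ConsistentHashRing:
    """Hash ring where each physical node owns `vnodes` points."""

    def __init__(self, nodes=(), vnodes=100):
        self.vnodes = vnodes
        self._ring = []   # sorted hash positions of all virtual nodes
        self._owner = {}  # hash position -> physical node
        for node in nodes:
            self.add_node(node)

    def add_node(self, node):
        for i in range(self.vnodes):
            point = _hash(f'{node}#{i}')
            bisect.insort(self._ring, point)
            self._owner[point] = node

    def remove_node(self, node):
        for i in range(self.vnodes):
            point = _hash(f'{node}#{i}')
            self._ring.remove(point)  # O(n); acceptable for a sketch
            del self._owner[point]

    def get_node(self, key):
        if not self._ring:
            raise LookupError('hash ring is empty')
        # First virtual node clockwise from the key; wrap around past the end.
        idx = bisect.bisect_right(self._ring, _hash(key)) % len(self._ring)
        return self._owner[self._ring[idx]]


ring = ConsistentHashRing(['10.246.10.1', '10.246.10.2', '10.246.10.3'])
print(ring.get_node('192.168.1.20'))
```

Adding a fourth node only inserts new points, so the keys that move are exactly those now landing in the new node's arcs (roughly a quarter of them here), and every moved key goes to the new node.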

Least Connection

Chooses the node with the fewest active connections, dynamically adapting to real‑time load.
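A minimal sketch of least‑connection selection. It assumes the balancer sees every connection open and close, as a proxy does; the class and method names are hypothetical:

```python
import threading
from contextlib import contextmanager


class LeastConnectionBalancer:
    """Routes each new connection to the server with the fewest active ones."""

    def __init__(self, servers):
        self._active = {server: 0 for server in servers}
        self._lock = threading.Lock()

    def acquire(self):
        with self._lock:
            # Ties go to the server listed first.
            server = min(self._active, key=self._active.get)
            self._active[server] += 1
            return server

    def release(self, server):
        with self._lock:
            self._active[server] -= 1

    @contextmanager
    def connection(self):
        """Hold a server for the duration of a `with` block."""
        server = self.acquire()
        try:
            yield server
        finally:
            self.release(server)


lb = LeastConnectionBalancer(['10.246.10.1', '10.246.10.2', '10.246.10.3'])
with lb.connection() as server:
    print('handling request on', server)
```

Because the count drops as soon as a connection closes, a server stuck on slow requests keeps a high count and receives fewer new connections.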

Stateful Request Handling

For services that keep session state, ensure the same client is routed to the same backend (using consistent hashing or range partitioning) or share state via a common datastore (e.g., Redis, Memcached) or client‑side storage such as cookies.
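To illustrate the shared‑datastore option: in this sketch any backend can serve any request because session state lives outside the backends. A plain in‑memory class stands in for Redis or Memcached, and `SharedSessionStore`, `Backend`, and the `hits` field are hypothetical:

```python
import itertools


class SharedSessionStore:
    """In-memory stand-in for an external store such as Redis (assumption)."""

    def __init__(self):
        self._data = {}

    def get(self, session_id):
        return dict(self._data.get(session_id, {}))

    def put(self, session_id, state):
        self._data[session_id] = dict(state)


class Backend:
    """A stateless server: it reads and writes session state via the store."""

    def __init__(self, name, store):
        self.name = name
        self.store = store

    def handle(self, session_id):
        state = self.store.get(session_id)
        state['hits'] = state.get('hits', 0) + 1
        state['served_by'] = self.name
        self.store.put(session_id, state)
        return state


store = SharedSessionStore()
pool = itertools.cycle([Backend('10.246.10.1', store), Backend('10.246.10.2', store)])
for _ in range(3):
    print(next(pool).handle('session-42'))
```

Round‑robin alternates the backends, yet the hit count keeps growing because the state is shared rather than pinned to one node.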

Where to Place the Load Balancer

Two main approaches:

Client‑side balancing: Clients receive a server list and select a node locally, which suits simple algorithms.

Proxy‑side balancing: A dedicated load balancer (e.g., Nginx, F5, LVS) sits in front of the server pool, runs more complex algorithms, and provides a single entry point.
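A minimal client‑side sketch: the client holds the server list, picks a node locally, and fails over to another node if a call raises `ConnectionError`. How the list is obtained (configuration, a service registry) is out of scope, and `ClientSideBalancer` and the `send` callable are hypothetical:

```python
import random


class ClientSideBalancer:
    """Client-local node selection with simple failover."""

    def __init__(self, servers, rng=None):
        self.servers = list(servers)
        self.rng = rng or random.Random()

    def call(self, send, request):
        # Try the servers in a random order; move on when one is unreachable.
        candidates = self.servers[:]
        self.rng.shuffle(candidates)
        last_error = None
        for server in candidates:
            try:
                return send(server, request)
            except ConnectionError as exc:
                last_error = exc
        raise ConnectionError('all servers failed') from last_error


def fake_send(server, request):
    """Hypothetical transport: pretend 10.246.10.1 is down."""
    if server == '10.246.10.1':
        raise ConnectionError(f'{server} unreachable')
    return f'{server} handled {request}'


client = ClientSideBalancer(['10.246.10.1', '10.246.10.2', '10.246.10.3'])
print(client.call(fake_send, 'GET /'))
```

No extra network hop is needed, but every client must carry the selection and failover logic and keep its server list current.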

Example with gRPC (lookaside balancing): a separate balancer tracks server load, the client asks it which server to use, and the client then connects directly to the chosen server.

Proxy‑based solutions (e.g., Nginx at layer 7, LVS at layer 4) centralize control but can become bottlenecks; high‑availability designs use active‑passive pairs.

Push vs. Pull Models

Traditional load balancing is a push model (the balancer pushes requests to a node). A pull model uses a message queue where idle workers pull tasks, achieving natural load distribution but adding latency.
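A minimal pull‑model sketch that uses the standard library's `queue.Queue` as a stand‑in for a message queue: workers take the next task only when they are free, so faster workers naturally handle more. The function name, the squaring "work", and the `None` shutdown sentinel are illustrative:

```python
import queue
import threading


def run_pull_model(tasks, num_workers=3):
    """Workers pull tasks from a shared queue; returns (worker, result) pairs."""
    task_queue = queue.Queue()
    results = []
    results_lock = threading.Lock()

    def worker(name):
        while True:
            task = task_queue.get()  # blocks until a task (or sentinel) is available
            if task is None:         # sentinel: no more work for this worker
                task_queue.task_done()
                break
            with results_lock:
                results.append((name, task * task))
            task_queue.task_done()

    threads = [threading.Thread(target=worker, args=(f'worker-{i}',))
               for i in range(num_workers)]
    for t in threads:
        t.start()
    for task in tasks:
        task_queue.put(task)
    for _ in threads:                # one sentinel per worker, queued after all tasks
        task_queue.put(None)
    for t in threads:
        t.join()
    return results


print(len(run_pull_model(range(10))))
```

No component decides which worker gets a task; the distribution falls out of which worker calls `get()` first, at the cost of each task waiting in the queue.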


Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.

Written by ITFLY8 Architecture Home

ITFLY8 Architecture Home - focused on architecture knowledge sharing and exchange, covering project management and product design. Includes large-scale distributed website architecture (high performance, high availability, caching, message queues...), design patterns, architecture patterns, big data, project management (SCRUM, PMP, Prince2), product design, and more.
