Backend Development 12 min read

How to Build Horizontally Scalable Systems: Load Balancing, Caching, and More

The article distills Will Larson’s insights on designing horizontally scalable systems, covering linear capacity growth, redundancy, load‑balancing strategies, caching layers, CDN usage, offline processing with message queues, Map‑Reduce for big data, and the benefits of introducing a platform layer to improve robustness and reusability.

21CTO

Aug 8, 2015

How to Build Horizontally Scalable Systems: Load Balancing, Caching, and More

Recently I read Will Larson’s article "Introduction to Architecting Systems for Scale" and found it highly valuable. The author shares his experience designing scalable architectures at Yahoo! and Digg. Inspired by his insights, I have translated the main points and added my own understanding.

Horizontal Scalability

Larson argues that an ideal system’s capacity should increase linearly with the number of added servers. If a system has one server, adding an identical server should double its capacity, and so on. This linear growth is known as Horizontal Scalability .

Redundancy

A robust system must also handle failures. Larson defines an ideal system as one that does not crash when a server is lost, though capacity will decrease linearly. This property is called Redundancy .

Load Balancing

Both horizontal scalability and redundancy are achieved through load balancing, which acts as a coordinator that distributes incoming requests to web servers based on each server’s current load. The load balancer sits between clients and web servers.

Larson describes three approaches:

Smart Client : embed load‑balancing logic in the client (e.g., database or cache client). This makes the solution complex, less robust, and hard to reuse.

Hardware Load Balancer : use dedicated devices such as Citrix NetScaler, which are expensive and typically adopted by large enterprises.

Software Load Balancer (Hybrid): run a software balancer like HAProxy on a local server to coordinate traffic.

Caching

To reduce server load, caching is essential. Common cache categories include pre‑calculating results, pre‑generating expensive indexes, and storing frequently accessed data in a fast backend store (e.g., Memcached).

Application Cache

Application‑level caching requires explicit integration of caching code, similar to the Proxy pattern: check the cache first, fall back to the database if missing, then store the result. Example code using Memcached:

key = "user.%s" % user_id user_blob = memcache.get(key) if user_blob is None: user = mysql.query("SELECT * FROM users WHERE user_id=\"%s\"" % user_id) if user: memcache.set(key, json.dumps(user)) return user else: return json.loads(user_blob)

Database Cache

Database‑level caching leaves application code untouched; DBAs can improve performance through tuning, such as enabling row caching in Cassandra.

In‑Memory Cache

Typical in‑memory caches include Memcached and Redis. While they boost performance, they consume expensive RAM and lack persistence, so only selected data should be cached, often using an LRU (Least Recently Used) eviction policy.

Content Delivery Network (CDN)

Placing static assets in a CDN offloads web servers and improves response times via geographic distribution. When a request arrives, the CDN is queried first; if the asset is missing, the CDN fetches it from the origin server, caches it locally, and serves subsequent requests.

Cache Invalidation

Ensuring consistency between cached and real data is known as Cache Invalidation . Common strategies are write‑through (update cache on write) or delete‑then‑lazy‑load (remove cache entry and repopulate on next read).

Off‑Line Processing

Introducing a message queue enables asynchronous processing of time‑consuming tasks, reducing web‑server load. The web server publishes messages, and consumers process them later. The following diagram illustrates this pattern:

Explicitly marking tasks as On‑Line or Off‑Line in API definitions improves readability.

Map‑Reduce

For large‑scale data processing, a dedicated Map‑Reduce layer offers better scalability than a traditional SQL database. It can be combined with scheduled jobs, as shown below:

Platform Layer

Larson suggests adding a Platform Layer between web applications and databases. This separation allows independent scaling of APIs and web servers, lets each tier use appropriate hardware (e.g., SSDs for DB I/O, multi‑core CPUs for web servers), and promotes reuse of cross‑cutting concerns such as caching and database access.

For product‑line systems, the Platform Layer serves as shared infrastructure, enabling parallel development across teams and potentially a dedicated platform team to maintain and optimize it.

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.

distributed systems load balancing Caching

Written by

21CTO

21CTO (21CTO.com) offers developers community, training, and services, making it your go‑to learning and service platform.

0 followers

Reader feedback

How this landed with the community

Rate this article

Was this worth your time?

Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.