Fundamentals 11 min read

How to Build Horizontally Scalable Systems: Lessons from Will Larson

This article distills Will Larson’s insights on designing scalable systems, covering linear capacity growth, redundancy, load‑balancing strategies, multi‑layer caching, cache invalidation, offline processing with message queues, Map‑Reduce integration, and the benefits of a dedicated platform layer.

21CTO
21CTO
21CTO
How to Build Horizontally Scalable Systems: Lessons from Will Larson

Recently I read Will Larson’s “Introduction to Architecting Systems for Scale” and found it valuable. Larson shares his experience from Yahoo! and Digg on designing scalable architectures. In my past work on enterprise software, I rarely considered scalability because internal systems have modest load; typically a cluster and load balancer suffice. The article inspired me to extract its key points.

Larson argues that an ideal system’s capacity should increase linearly with the number of added servers—doubling servers should double capacity—known as horizontal scalability.

Robust design must also handle failures. When a server is lost, the system should continue operating, though capacity drops proportionally, a property called redundancy.

Load Balancing

Both horizontal scaling and redundancy rely on load balancing, which distributes incoming requests across servers based on current load. The balancer sits between clients and web servers.

Larson critiques the “Smart Client” approach (embedding load‑balancing logic in database or cache clients) as complex, brittle, and hard to reuse.

Is it attractive because it is the simplest solution? Usually, no. Is it seductive because it is the most robust? Sadly, no. Is it alluring because it’ll be easy to reuse? Tragically, no.

Hardware load balancers (e.g., Citrix NetScaler) are costly and suited for large enterprises.

A hybrid solution is a software load balancer such as HAProxy, running locally to balance services.

Caching

To reduce server load, various caching strategies are introduced:

Pre‑calculating results (e.g., yesterday’s traffic stats)

Pre‑generating expensive indexes (e.g., recommendation data)

Storing frequently accessed data in fast back‑ends (e.g., Memcached)

Application Cache

Application‑level caching integrates cache calls into code, similar to the Proxy pattern: check cache first, fall back to the database, then store the result.

key = "user.%s" % user_id
user_blob = memcache.get(key)
if user_blob is None:
    user = mysql.query("SELECT * FROM users WHERE user_id=\"%s\"", user_id)
    if user:
        memcache.set(key, json.dumps(user))
    return user
else:
    return json.loads(user_blob)

Database Cache

Database caching avoids polluting application code; DBAs can improve performance via tuning, e.g., enabling Cassandra row cache.

In‑Memory Cache

In‑memory caches like Memcached or Redis boost performance but must be used judiciously because RAM is expensive and non‑persistent; LRU eviction is common.

CDN

Offloading static assets to a Content Delivery Network reduces web‑server load and improves latency via geographic distribution.

Cache Invalidation

Maintaining consistency between cache and source data is critical. Strategies include write‑through caches, explicit deletion, or setting expiration times. Application logic should avoid direct DELETE statements without handling cache invalidation.

Off‑Line Processing

Introducing a message queue enables asynchronous handling of long‑running tasks, relieving web servers. Producers publish messages; consumers process them, with completion tracked via polling or callbacks.

Map‑Reduce

For big‑data workloads, a dedicated Map‑Reduce layer offers superior scalability compared to SQL‑centric designs and can be combined with scheduled jobs.

Platform Layer

Separating a platform layer from web applications allows independent scaling of APIs and services, optimizes hardware choices (e.g., SSDs for DB I/O, multi‑core CPUs for web), and promotes reuse of cross‑cutting concerns like caching and database access.

Well‑designed platform interfaces enable parallel development across teams and can be managed by dedicated platform teams.

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Sign in to view source
Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactadmin@besthub.devand we will review it promptly.

load balancingcachingMessage Queuehorizontal scalingmap-reduce
21CTO
Written by

21CTO

21CTO (21CTO.com) offers developers community, training, and services, making it your go‑to learning and service platform.

0 followers
Reader feedback

How this landed with the community

Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.