Backend Development 12 min read

Scalable System Design: Load Balancing, Caching, and Platform Strategies

Drawing on Will Larson’s insights, this article outlines essential principles for building horizontally scalable systems—including linear capacity growth, redundancy, load‑balancing techniques, caching strategies, CDN usage, offline processing with message queues, and the benefits of a dedicated platform layer for robust, maintainable backend architectures.

21CTO

Aug 25, 2015

Scalable System Design: Load Balancing, Caching, and Platform Strategies

Recently I read Will Larson’s “Introduction to Architecting Systems for Scale” and found it valuable. The article shares his experience designing scalable systems at Yahoo! and Digg, which inspired this translation and commentary.

Larson argues that an ideal system’s capacity should increase linearly with the number of added servers—doubling the hardware should double capacity, a concept known as horizontal scalability.

Reliability is also crucial: a system should continue operating when a server fails, though capacity will decrease proportionally, a property called redundancy.

Load Balancing

Both horizontal scalability and redundancy can be achieved through load balancing, which distributes incoming requests across servers based on their current load, sitting between clients and web servers.

Larson discusses several load‑balancing approaches. The “Smart Client” embeds balancing logic in database or cache clients, but this mixes coordination code with business logic, making it complex and less reusable.

Is it attractive because it is the simplest solution? Usually, no. Is it seductive because it is the most robust? Sadly, no. Is it alluring because it’ll be easy to reuse? Tragically, no.

Hardware load balancers (e.g., Citrix NetScaler) are costly and typically used by large enterprises.

A hybrid approach uses software load balancers such as HAProxy, which run locally and balance services without additional hardware.

Cache

To reduce server load, caching is essential. Caches can be classified as pre‑computed results, pre‑generated expensive indexes, or fast‑backend copies of frequently accessed data (e.g., Memcached).

Application Cache

Application‑level caching integrates cache handling directly into code, similar to the proxy pattern: check the cache first, fall back to the database if missing, then store the result.

key = "user.%s" % user_id
user_blob = memcache.get(key)
if user_blob is None:
    user = mysql.query("SELECT * FROM users WHERE user_id=\"%s\"", user_id)
    if user:
        memcache.set(key, json.dumps(user))
    return user
else:
    return json.loads(user_blob)

Database Cache

Database‑level caching improves performance without polluting application code; DBAs can tune caches (e.g., Cassandra row cache) without code changes.

In‑Memory Cache

In‑memory caches like Memcached or Redis store data for fast access but must be sized carefully because RAM is expensive and non‑persistent; LRU (least recently used) eviction is common.

CDN

Content Delivery Networks offload static media from web servers, reducing load and improving response times via geographic distribution. Small sites may defer CDN adoption and instead serve static assets via a dedicated subdomain.

Cache Invalidation

Maintaining consistency between cache and source data requires invalidation strategies such as write‑through caches or explicit deletion, possibly with TTLs or application‑level logic to avoid stale data.

Off‑Line Processing

Introducing message queues enables asynchronous (off‑line) processing, relieving web servers. Producers publish tasks; consumers handle them, with completion tracked via polling or callbacks.

Explicitly marking tasks as on‑line or off‑line in API definitions improves readability.

Map‑Reduce

For large‑scale data processing, a Map‑Reduce layer offers better scalability than relying solely on SQL databases, and can be combined with scheduled jobs.

Platform Layer

Larson suggests adding a platform layer between web applications and databases to improve scalability and reusability.

In this layered architecture, each tier can be scaled independently: API servers can be added without touching web servers, and hardware choices differ per tier (e.g., SSDs for DB servers, multi‑core CPUs for web servers).

Extracting cross‑cutting concerns such as caching and database access into the platform layer creates reusable infrastructure, especially beneficial for product‑line systems.

A well‑designed platform layer also enables parallel development across teams by exposing stable interfaces and allowing dedicated platform teams to focus on optimization.

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.

load balancing Caching

Written by

21CTO

21CTO (21CTO.com) offers developers community, training, and services, making it your go‑to learning and service platform.

0 followers

Reader feedback

How this landed with the community

Rate this article

Was this worth your time?

Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.