Backend Development 11 min read

Key Lessons on Designing Scalable System Architecture

Will Larson’s insights on scalable system architecture emphasize linear horizontal scalability, redundancy, load balancing strategies, caching layers, CDN usage, offline processing with message queues, and platform layering, providing practical guidance for building robust, high‑capacity backend systems.

Qunar Tech Salon

Mar 27, 2016

Key Lessons on Designing Scalable System Architecture

Recently I read Will Larson’s article “Introduction to Architecting Systems for Scale” and found it highly valuable; it shares his experience designing scalable architectures at Yahoo! and Digg. The article inspired me to extract its main points and combine them with my own understanding.

Larson argues that an ideal system’s capacity should increase linearly with the number of added servers—doubling the hardware should double capacity—known as horizontal scalability.

Robust system design must also consider failure: losing a server should not crash the system, though capacity will decrease proportionally, a principle called redundancy.

Load Balancing

Both horizontal scaling and redundancy are achieved through load balancing, which acts as a coordinator that distributes incoming web requests across servers based on their current load, typically placed between clients and web servers.

One approach is the “Smart Client,” embedding load‑balancing logic in database or cache clients; this method is complex, brittle, and hard to reuse.

Is it attractive because it is the simplest solution? Usually, no. Is it seductive because it is the most robust? Sadly, no. Is it alluring because it’ll be easy to reuse? Tragically, no.

Another option is hardware load balancers (e.g., Citrix NetScaler), which are expensive and usually adopted by large enterprises.

A hybrid solution is a software load balancer such as HAProxy, running locally to balance services without the cost of dedicated hardware.

Caching

To reduce server load, caching is essential. Common cache types include pre‑computed results, pre‑generated expensive indexes, and fast‑backend copies of frequently accessed data (e.g., Memcached).

Application Cache

Application‑level caching requires explicit integration in code, similar to the Proxy pattern: check the cache first, fall back to the database if missing, then store the result. Example using Memcached:

key = "user.%s" % user_id
user_blob = memcache.get(key)
if user_blob is None:
    user = mysql.query("SELECT * FROM users WHERE user_id=\"%s\"", user_id)
    if user:
        memcache.set(key, json.dumps(user))
    return user
else:
    return json.loads(user_blob)

Database Cache

Database caching avoids polluting application code; skilled DBAs can improve performance via configuration (e.g., Cassandra row cache) without code changes.

In‑Memory Cache

In‑memory caches like Memcached and Redis boost performance but must be used judiciously because RAM is costly and volatile; typical eviction policies such as LRU are employed.

CDN

Offloading static assets to a Content Delivery Network reduces web‑server load and improves response times through geographic distribution. When a request arrives, the CDN serves cached content or fetches it from the origin if missing.

Cache Invalidation

Maintaining consistency between cache and source data—cache invalidation—can be handled by write‑through updates or by deleting stale entries and repopulating on the next read.

Off‑Line Processing

Introducing a message queue enables off‑line processing: web servers publish tasks, consumers handle them asynchronously, and completion can be tracked via polling or callbacks. This greatly reduces web‑server pressure.

Scheduled jobs (e.g., Spring Batch) can also utilize idle server time; distributed execution can be managed with tools like Puppet.

Map‑Reduce

For large‑scale data processing, a Map‑Reduce layer offers better scalability than pure SQL databases and can be combined with scheduling mechanisms.

Platform Layer

Separating a platform layer from web applications allows independent scaling of APIs and services. Different layers have distinct hardware requirements (e.g., SSDs for DB servers, many‑core CPUs for web servers).

An additional platform layer improves reusability by centralizing cross‑cutting concerns such as caching and database access, benefiting product‑line systems.

It also facilitates parallel development across teams by exposing stable interfaces while hiding implementation details, potentially managed by a dedicated platform team.

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.

system architecture Scalability load balancing Caching

Written by

Qunar Tech Salon

Qunar Tech Salon is a learning and exchange platform for Qunar engineers and industry peers. We share cutting-edge technology trends and topics, providing a free platform for mid-to-senior technical professionals to exchange and learn.

0 followers

Reader feedback

How this landed with the community

Rate this article

Was this worth your time?

Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.