Scalable Backend Architecture: Load Balancing, Caching, and Platform Layers

Will Larson’s insights on building scalable systems are distilled here, covering horizontal scalability, redundancy, load balancing strategies, various caching techniques, CDN usage, offline processing with message queues, Map‑Reduce for big data, and the benefits of introducing a dedicated platform layer for robust backend architecture.

21CTO

Recently I read Will Larson’s article “Introduction to Architecting Systems for Scale” and found it highly valuable. He shares his experience designing scalable systems at Yahoo! and Digg. In many enterprise internal applications, scalability is often overlooked because load is modest; typically, clustering and load balancing are considered sufficient. This article extracts the key points and adds my own understanding.

Horizontal Scalability

An ideal system should increase capacity linearly with the number of added servers. Doubling the hardware should double capacity, a principle known as horizontal scalability.

Redundancy

A robust system must tolerate the loss of a server without crashing, though capacity will decrease proportionally. This is called redundancy.
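As a back-of-the-envelope illustration of the two properties above, the sketch below computes the capacity of an idealized cluster. The function name and the figure of 500 requests per second per server are assumptions chosen for the example, not measurements; real systems rarely scale perfectly linearly.

```python
def cluster_capacity(servers: int, per_server_rps: float, failed: int = 0) -> float:
    """Capacity (requests/second) of a cluster that scales perfectly linearly."""
    if failed > servers:
        raise ValueError("cannot lose more servers than the cluster has")
    return (servers - failed) * per_server_rps


baseline = cluster_capacity(4, 500.0)            # four healthy servers
doubled = cluster_capacity(8, 500.0)             # doubling hardware doubles capacity
degraded = cluster_capacity(4, 500.0, failed=1)  # losing one server costs 1/4 of capacity
```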

Load Balancing

Both horizontal scalability and redundancy are usually achieved through load balancing, which distributes incoming requests across servers according to a policy such as each server's current load. The load balancer sits between clients and web servers.
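One way to picture "distributing by current load" is a least-connections policy. The following is a minimal sketch under that assumption; the class name, method names, and backend names are hypothetical, and a production balancer would also need health checks and thread safety.

```python
class LeastLoadedBalancer:
    """Routes each request to the backend with the fewest in-flight requests."""

    def __init__(self, backends):
        if not backends:
            raise ValueError("at least one backend is required")
        self.in_flight = {name: 0 for name in backends}

    def acquire(self) -> str:
        # Pick the least busy backend; ties go to the earliest-registered one.
        name = min(self.in_flight, key=self.in_flight.get)
        self.in_flight[name] += 1
        return name

    def release(self, name: str) -> None:
        # Call when the request finishes so the backend's load drops again.
        self.in_flight[name] -= 1

    def remove(self, name: str) -> None:
        # Redundancy: a failed server is dropped and traffic goes to the rest.
        del self.in_flight[name]


balancer = LeastLoadedBalancer(["web1", "web2", "web3"])
first = balancer.acquire()
second = balancer.acquire()
balancer.release(first)
third = balancer.acquire()
```

Adding a server is just another key in `in_flight`, which is what makes the pool horizontally scalable; removing one leaves the remaining servers to absorb the traffic.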

Load‑balancing methods include:

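Smart Client: embed the balancing logic in the client itself (e.g., a database or cache client). The client has to detect failed and newly added hosts on its own, which makes this approach complex, less robust, and hard to reuse.

Hardware Load Balancer: a dedicated appliance such as Citrix NetScaler. Its high cost makes it suitable mainly for large enterprises.

Software Load Balancer (Hybrid): e.g., HAProxy, which can run locally on each machine and balance the traffic that machine sends to other services.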

Cache

Caching reduces server load. Common cache categories are pre‑computed results, pre‑built indexes that are expensive to generate, and in‑memory copies of frequently accessed data (e.g., Memcached).

Application Cache

Application‑level caching integrates cache handling code directly into the application, similar to the proxy pattern. Below is a Python example using Memcached:

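import json

def get_user(user_id):
    # memcache and mysql stand for already-configured client objects.
    key = "user.%s" % user_id
    user_blob = memcache.get(key)
    if user_blob is None:
        # Cache miss: read from the database, passing user_id as a bound parameter.
        user = mysql.query("SELECT * FROM users WHERE user_id=%s", user_id)
        if user:
            # Only cache rows that exist, stored as JSON.
            memcache.set(key, json.dumps(user))
        return user
    # Cache hit: decode the stored JSON.
    return json.loads(user_blob)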

Database Cache

Database caching requires no application code changes: the database's own cache is enabled and tuned, typically by a DBA, for example by turning on Cassandra's row cache.

In‑Memory Cache

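Typical in‑memory caches include Memcached and Redis. Storing everything in RAM is costly and reduces robustness because data held in memory is volatile, so only a subset of the data is cached. An eviction policy such as LRU (least recently used) decides which entries to discard when the cache is full.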
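To make the LRU idea concrete, here is a minimal sketch built on the standard library's `OrderedDict`. The class name and the capacity of 2 are assumptions for illustration; Memcached and Redis implement their own eviction internally.

```python
from collections import OrderedDict


class LRUCache:
    """Fixed-size cache that evicts the least recently used entry when full."""

    def __init__(self, capacity: int):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self._items = OrderedDict()

    def get(self, key, default=None):
        if key not in self._items:
            return default
        # A read counts as a use, so move the key to the "most recent" end.
        self._items.move_to_end(key)
        return self._items[key]

    def set(self, key, value):
        self._items[key] = value
        self._items.move_to_end(key)
        if len(self._items) > self.capacity:
            # Drop the entry at the "least recent" end.
            self._items.popitem(last=False)


cache = LRUCache(capacity=2)
cache.set("a", 1)
cache.set("b", 2)
cache.get("a")     # "a" is now more recently used than "b"
cache.set("c", 3)  # cache is full, so "b" is evicted
```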

CDN

Offloading static assets to a CDN reduces web‑server load and improves response times via geographic distribution.
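In application code, offloading usually comes down to emitting asset URLs that point at the CDN instead of the web server. The helper below is a small sketch of that idea; the host `cdn.example.com` is a placeholder, and the content-hash query parameter is an added assumption (a common way to keep edge caches from serving stale files), not something the original article prescribes.

```python
import hashlib

CDN_BASE = "https://cdn.example.com"  # hypothetical CDN host


def cdn_url(path: str, content: bytes) -> str:
    """Build a CDN URL for a static asset, versioned by a hash of its content."""
    digest = hashlib.sha256(content).hexdigest()[:8]
    return "%s/%s?v=%s" % (CDN_BASE, path.lstrip("/"), digest)


old_url = cdn_url("/static/app.js", b"console.log('v1');")
new_url = cdn_url("/static/app.js", b"console.log('v2');")
```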

Off‑Line Processing

Introducing a message queue enables asynchronous processing. Web servers publish messages and consumers handle them later, which reduces request latency. It helps to state in each public interface whether a task is performed on‑line (within the request) or off‑line (queued for later).
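The publish/consume split can be sketched in a single process with the standard library's `queue.Queue` standing in for a real message broker. The job format and function names are hypothetical; in practice the consumer would run on separate machines.

```python
import queue
import threading

jobs = queue.Queue()  # stand-in for a real message broker
results = []


def worker():
    """Consumer: handles queued jobs off-line, outside the request path."""
    while True:
        job = jobs.get()
        if job is None:  # sentinel value tells the worker to stop
            jobs.task_done()
            break
        # Placeholder for slow work such as rendering a report.
        results.append("report for user %d" % job["user_id"])
        jobs.task_done()


def handle_request(user_id: int) -> str:
    """Web handler: publishes a message and returns without waiting for the work."""
    jobs.put({"user_id": user_id})
    return "accepted"


consumer = threading.Thread(target=worker, daemon=True)
consumer.start()
status = handle_request(21)
jobs.join()     # wait until the queued job has been processed
jobs.put(None)  # shut the worker down
consumer.join()
```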

Message queues also free web servers from long‑running jobs. Scheduled tasks (e.g., Spring Batch) can utilize idle server time, and tools like Puppet can manage multi‑machine task execution.

Map‑Reduce

For big‑data workloads, a dedicated Map‑Reduce layer offers better scalability than a pure SQL database. It can be combined with scheduling mechanisms.
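The programming model behind Map‑Reduce can be shown in a few lines of plain Python. This is a single-process sketch of the map, shuffle, and reduce phases, counting tokens in some made-up request log lines; a real framework would distribute each phase across many machines.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (key, 1) pair for every token in one input record."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: collapse the values for one key into a single result."""
    return key, sum(values)


def map_reduce(lines):
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


counts = map_reduce(["GET /home", "GET /about", "POST /home"])
```

Because each record is mapped independently and each key is reduced independently, both phases can be spread over more machines as the data grows.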

Platform Layer

Separating a platform layer from web applications allows independent scaling. Adding new APIs can be done by provisioning platform servers without touching web servers. Different layers have distinct hardware requirements: databases need high I/O (SSD), web servers need powerful CPUs.

The platform layer also centralizes cross‑cutting concerns such as caching and database access, improving reusability across product lines and enabling parallel development by dedicated platform teams.
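A rough sketch of that separation follows: one platform service owns the cache and database access, and two different web-facing functions reuse it. All names are hypothetical, and plain dictionaries stand in for the real database and cache.

```python
class UserPlatform:
    """Platform-layer service: the only code that touches the cache and database."""

    def __init__(self, database: dict, cache: dict):
        self._database = database
        self._cache = cache
        self.database_reads = 0  # counter kept only to make the caching visible

    def get_user(self, user_id: int):
        if user_id in self._cache:
            return self._cache[user_id]
        self.database_reads += 1
        user = self._database.get(user_id)
        if user is not None:
            self._cache[user_id] = user
        return user


def profile_page(platform: UserPlatform, user_id: int) -> str:
    """Web application A: renders a page without knowing about caches or SQL."""
    user = platform.get_user(user_id)
    if user is None:
        return "Not found"
    return "Profile: %s" % user["name"]


def api_user(platform: UserPlatform, user_id: int) -> dict:
    """Web application B: a different product reusing the same platform call."""
    user = platform.get_user(user_id)
    return {"found": user is not None, "user": user}


platform = UserPlatform(database={1: {"name": "Ada"}}, cache={})
page = profile_page(platform, 1)
payload = api_user(platform, 1)
```

Both callers benefit from the cache without containing any caching code, which is the reuse the platform layer is meant to provide.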

Written by

21CTO

21CTO (21CTO.com) offers developers community, training, and services, making it your go‑to learning and service platform.

0 followers
Reader feedback

How this landed with the community

Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.