Designing High‑Availability Stateless Services: Load Balancing, Scaling, and CDN Strategies
This article explains how to build highly available stateless services by using redundant deployment, vertical and horizontal scaling, various load‑balancing algorithms, high‑concurrency metrics, and CDN/OSS techniques to ensure reliability and performance in modern architectures.
Failures in software systems are rarely sudden; they build up as load accumulates. Without high‑availability (HA) design, growing user traffic inevitably leads to outages, so HA must be considered from the start.
Designing an HA system involves fault‑tolerant solutions, monitoring, automated recovery, code‑level performance optimization, and mechanisms such as service degradation, rate limiting, and circuit breaking.
Stateless services hold no data of their own (apart from cache), so nodes can be created or destroyed at any time; when a node fails, no data is lost and the impact is minimal, enabling rapid recovery.
Key HA considerations include redundant deployment, vertical scaling, and horizontal scaling.
Redundant deployment: Deploy at least two nodes to avoid single‑point failures and distribute load, often using load‑balancing to schedule requests.
Vertical scaling: Increase a single machine’s resources (CPU, memory, storage, network) to handle more load.
Horizontal scaling: Add more nodes to handle traffic spikes, with automatic scaling to respond to demand.
Load‑balancing algorithms for stateless services include:
Random algorithm – selects a backend server randomly.
Round‑robin algorithm – cycles through servers sequentially.
Weighted round‑robin – assigns higher weight to more capable servers.
Weighted random – random selection based on weights.
Least‑connections – directs traffic to the server with the fewest active connections.
Source‑IP hash – hashes the client IP to consistently route a client to the same server.
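Weighted round‑robin is the least obvious of these to implement well. Below is a minimal sketch of the smooth weighted round‑robin variant (the approach popularized by Nginx): each pick, every server's running counter grows by its weight, the highest counter wins, and the winner's counter is reduced by the total weight, which spreads the heavier server's turns evenly instead of bunching them. The class name and server weights are illustrative.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Minimal sketch of smooth weighted round-robin. Server addresses
// and weights are illustrative only.
public class WeightedRoundRobin {
    private final Map<String, Integer> weights = new LinkedHashMap<>();
    private final Map<String, Integer> current = new LinkedHashMap<>();

    public WeightedRoundRobin(Map<String, Integer> serverWeights) {
        weights.putAll(serverWeights);
        serverWeights.keySet().forEach(s -> current.put(s, 0));
    }

    // Add each server's weight to its running counter, pick the highest
    // counter, then subtract the total weight from the winner.
    public synchronized String next() {
        int total = 0;
        String best = null;
        for (Map.Entry<String, Integer> e : weights.entrySet()) {
            String server = e.getKey();
            int c = current.get(server) + e.getValue();
            current.put(server, c);
            total += e.getValue();
            if (best == null || c > current.get(best)) best = server;
        }
        current.put(best, current.get(best) - total);
        return best;
    }

    public static void main(String[] args) {
        Map<String, Integer> w = new LinkedHashMap<>();
        w.put("10.0.0.1", 5);
        w.put("10.0.0.2", 1);
        w.put("10.0.0.3", 1);
        WeightedRoundRobin lb = new WeightedRoundRobin(w);
        // Over 7 picks, 10.0.0.1 is chosen 5 times, interleaved with the others.
        for (int i = 0; i < 7; i++) System.out.println(lb.next());
    }
}
```

With weights 5:1:1, seven consecutive picks yield the heavy server five times, interleaved rather than five in a row, which is exactly why the smooth variant is preferred over naively repeating a server weight‑many times.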
Choosing an algorithm: start with simple round‑robin for uniform servers; use weighted round‑robin or least‑connections when servers differ in capacity or when long‑lived connections (e.g., FTP) are present; apply source‑IP hash for session‑affinity needs.
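For the session‑affinity case, a minimal source‑IP hash balancer can be sketched as follows: the same client IP always maps to the same backend as long as the server list is unchanged. The class name and addresses are illustrative; note that adding or removing a server reshuffles most mappings, which is why production systems often use consistent hashing instead.

```java
import java.util.List;

// Minimal sketch of source-IP hash routing for session affinity.
// Server addresses are illustrative only.
public class IpHashBalancer {
    private final List<String> servers;

    public IpHashBalancer(List<String> servers) {
        this.servers = servers;
    }

    public String route(String clientIp) {
        // Mask the sign bit rather than using Math.abs, which
        // overflows for Integer.MIN_VALUE.
        int h = clientIp.hashCode() & 0x7fffffff;
        return servers.get(h % servers.size());
    }

    public static void main(String[] args) {
        IpHashBalancer lb = new IpHashBalancer(
                List.of("10.0.0.1", "10.0.0.2", "10.0.0.3"));
        // Repeated requests from one client IP land on the same backend.
        System.out.println(lb.route("203.0.113.7"));
        System.out.println(lb.route("203.0.113.7"));
    }
}
```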
High concurrency is usually identified via QPS (queries per second). Two example calculations, the first assuming 80% of daily traffic arrives in 20% of the day:
Peak QPS = (100,000 * 80%) / (86,400 s * 20%) ≈ 4.63 QPS
Another example, for a sustained 50,000 requests per minute over a full day:
QPS = (60 * 24 * 50,000) / 86,400 ≈ 833 QPS
As a rough guide, hundreds of QPS already indicate high concurrency; larger systems may reach thousands of QPS.
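The two estimates can be reproduced in a few lines. The traffic figures (100,000 daily requests with an 80/20 peak assumption; 50,000 requests per minute over a day) are the ones from the examples; the class and method names are illustrative.

```java
// Reproduces the two QPS estimates from the text.
public class QpsEstimate {
    static final long SECONDS_PER_DAY = 86_400;

    // Peak QPS under the 80/20 rule of thumb: 80% of daily traffic
    // arrives within 20% of the day.
    static double peakQps(long dailyRequests) {
        return (dailyRequests * 0.8) / (SECONDS_PER_DAY * 0.2);
    }

    // Average QPS: total daily requests spread over every second of the day.
    static double averageQps(long totalDailyRequests) {
        return (double) totalDailyRequests / SECONDS_PER_DAY;
    }

    public static void main(String[] args) {
        System.out.printf("peak QPS ~ %.2f%n", peakQps(100_000));
        System.out.printf("avg QPS ~ %.0f%n", averageQps(60L * 24 * 50_000));
    }
}
```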
Vertical scaling methods:
Server upgrade (CPU, memory, SSD, etc.).
Hardware improvements (SSD, system tuning).
Architectural changes (asynchronous processing, caching, lock‑free structures).
Horizontal auto‑scaling adds nodes when load increases, often using cloud provider elastic scaling or custom schedulers; it is essential for stateless services to avoid single‑point failures.
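The decision rule behind threshold‑based auto‑scaling can be sketched in a few lines. This is a hypothetical policy evaluated once per cycle, not the API of any particular cloud provider; the thresholds and bounds are illustrative assumptions.

```java
// Hypothetical threshold-based scale-out/scale-in rule of the kind
// a cloud auto-scaling group evaluates each cycle. Values are illustrative.
public class ScalingPolicy {
    private final double scaleOutCpu; // add a node above this average CPU
    private final double scaleInCpu;  // remove a node below this average CPU
    private final int minNodes;       // never scale below this (avoids single point of failure)
    private final int maxNodes;       // cost/capacity ceiling

    public ScalingPolicy(double scaleOutCpu, double scaleInCpu,
                         int minNodes, int maxNodes) {
        this.scaleOutCpu = scaleOutCpu;
        this.scaleInCpu = scaleInCpu;
        this.minNodes = minNodes;
        this.maxNodes = maxNodes;
    }

    // Desired node count given the current fleet size and average CPU load (0.0-1.0).
    public int desiredNodes(int currentNodes, double avgCpu) {
        if (avgCpu > scaleOutCpu) return Math.min(currentNodes + 1, maxNodes);
        if (avgCpu < scaleInCpu)  return Math.max(currentNodes - 1, minNodes);
        return currentNodes;
    }
}
```

Keeping a minimum of at least two nodes ties this back to redundant deployment: scale‑in must never reintroduce a single point of failure.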
CDN and OSS considerations: static assets (images, videos, HTML/CSS/JS) should be cached via a CDN to reduce latency, and large media can be stored in object storage (OSS) with optional archival for cold data.
Architecture Digest
Focusing on Java backend development, covering application architecture from top-tier internet companies (high availability, high performance, high stability), big data, machine learning, Java architecture, and other popular fields.