Designing High‑Availability Stateless Services: Load Balancing, Scaling, and CDN Strategies
This article explains how to build highly available stateless services by using redundant deployment, vertical and horizontal scaling, various load‑balancing algorithms, high‑concurrency metrics, and CDN/OSS techniques to ensure reliability and performance in modern architectures.
Failures in software systems are rarely sudden; they build up as load accumulates. Without high‑availability (HA) design, growing user traffic inevitably leads to outages, so HA must be considered from the start.
Designing an HA system involves fault‑tolerant solutions, monitoring, automated recovery, code‑level performance optimization, and mechanisms such as service degradation, rate limiting, and circuit breaking.
Stateless services hold no data of their own (apart from cache), so nodes can be created or destroyed at any time; when a node fails, no data is lost and the impact is minimal, enabling rapid recovery.
Key HA considerations include redundant deployment, vertical scaling, and horizontal scaling.
Redundant deployment: Deploy at least two nodes to avoid single‑point failures and distribute load, often using load‑balancing to schedule requests.
Vertical scaling: Increase a single machine’s resources (CPU, memory, storage, network) to handle more load.
Horizontal scaling: Add more nodes to handle traffic spikes, with automatic scaling to respond to demand.
Load‑balancing algorithms for stateless services include:
Random algorithm – selects a backend server randomly.
Round‑robin algorithm – cycles through servers sequentially.
Weighted round‑robin – assigns higher weight to more capable servers.
Weighted random – random selection based on weights.
Least‑connections – directs traffic to the server with the fewest active connections.
Source‑IP hash – hashes the client IP to consistently route a client to the same server.
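Weighted round‑robin is the least obvious of these to implement well. Below is a minimal sketch of the smooth weighted round‑robin variant (the approach popularized by Nginx): each pick, every server's running counter grows by its weight, the highest counter wins, and the winner's counter is reduced by the total weight, which spreads the heavier server's turns evenly instead of bunching them. The class name and server weights are illustrative.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Minimal sketch of smooth weighted round-robin. Server addresses
// and weights are illustrative only.
public class WeightedRoundRobin {
    private final Map<String, Integer> weights = new LinkedHashMap<>();
    private final Map<String, Integer> current = new LinkedHashMap<>();

    public WeightedRoundRobin(Map<String, Integer> serverWeights) {
        weights.putAll(serverWeights);
        serverWeights.keySet().forEach(s -> current.put(s, 0));
    }

    // Add each server's weight to its running counter, pick the highest
    // counter, then subtract the total weight from the winner.
    public synchronized String next() {
        int total = 0;
        String best = null;
        for (Map.Entry<String, Integer> e : weights.entrySet()) {
            String server = e.getKey();
            int c = current.get(server) + e.getValue();
            current.put(server, c);
            total += e.getValue();
            if (best == null || c > current.get(best)) best = server;
        }
        current.put(best, current.get(best) - total);
        return best;
    }

    public static void main(String[] args) {
        Map<String, Integer> w = new LinkedHashMap<>();
        w.put("10.0.0.1", 5);
        w.put("10.0.0.2", 1);
        w.put("10.0.0.3", 1);
        WeightedRoundRobin lb = new WeightedRoundRobin(w);
        // Over 7 picks, 10.0.0.1 is chosen 5 times, interleaved with the others.
        for (int i = 0; i < 7; i++) System.out.println(lb.next());
    }
}
```

With weights 5:1:1, seven consecutive picks yield the heavy server five times, interleaved rather than five in a row, which is exactly why the smooth variant is preferred over naively repeating a server weight‑many times.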
Choosing an algorithm: start with simple round‑robin for uniform servers; use weighted round‑robin or least‑connections when servers differ in capacity or when long‑lived connections (e.g., FTP) are present; apply source‑IP hash for session‑affinity needs.
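For the session‑affinity case, a minimal source‑IP hash balancer can be sketched as follows: the same client IP always maps to the same backend as long as the server list is unchanged. The class name and addresses are illustrative; note that adding or removing a server reshuffles most mappings, which is why production systems often use consistent hashing instead.

```java
import java.util.List;

// Minimal sketch of source-IP hash routing for session affinity.
// Server addresses are illustrative only.
public class IpHashBalancer {
    private final List<String> servers;

    public IpHashBalancer(List<String> servers) {
        this.servers = servers;
    }

    public String route(String clientIp) {
        // Mask the sign bit rather than using Math.abs, which
        // overflows for Integer.MIN_VALUE.
        int h = clientIp.hashCode() & 0x7fffffff;
        return servers.get(h % servers.size());
    }

    public static void main(String[] args) {
        IpHashBalancer lb = new IpHashBalancer(
                List.of("10.0.0.1", "10.0.0.2", "10.0.0.3"));
        // Repeated requests from one client IP land on the same backend.
        System.out.println(lb.route("203.0.113.7"));
        System.out.println(lb.route("203.0.113.7"));
    }
}
```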
High concurrency is usually identified via QPS (queries per second). Two example calculations, the first assuming 80% of daily traffic arrives in 20% of the day:
Peak QPS = (100,000 * 80%) / (86,400 s * 20%) ≈ 4.63 QPS
Another example, for a sustained 50,000 requests per minute over a full day:
QPS = (60 * 24 * 50,000) / 86,400 ≈ 833 QPS
As a rough guide, hundreds of QPS already indicate high concurrency; larger systems may reach thousands of QPS.
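The two estimates can be reproduced in a few lines. The traffic figures (100,000 daily requests with an 80/20 peak assumption; 50,000 requests per minute over a day) are the ones from the examples; the class and method names are illustrative.

```java
// Reproduces the two QPS estimates from the text.
public class QpsEstimate {
    static final long SECONDS_PER_DAY = 86_400;

    // Peak QPS under the 80/20 rule of thumb: 80% of daily traffic
    // arrives within 20% of the day.
    static double peakQps(long dailyRequests) {
        return (dailyRequests * 0.8) / (SECONDS_PER_DAY * 0.2);
    }

    // Average QPS: total daily requests spread over every second of the day.
    static double averageQps(long totalDailyRequests) {
        return (double) totalDailyRequests / SECONDS_PER_DAY;
    }

    public static void main(String[] args) {
        System.out.printf("peak QPS ~ %.2f%n", peakQps(100_000));
        System.out.printf("avg QPS ~ %.0f%n", averageQps(60L * 24 * 50_000));
    }
}
```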
Vertical scaling methods:
Server upgrade (CPU, memory, SSD, etc.).
Hardware improvements (SSD, system tuning).
Architectural changes (asynchronous processing, caching, lock‑free structures).
Horizontal auto‑scaling adds nodes when load increases, often using cloud provider elastic scaling or custom schedulers; it is essential for stateless services to avoid single‑point failures.
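The decision rule behind threshold‑based auto‑scaling can be sketched in a few lines. This is a hypothetical policy evaluated once per cycle, not the API of any particular cloud provider; the thresholds and bounds are illustrative assumptions.

```java
// Hypothetical threshold-based scale-out/scale-in rule of the kind
// a cloud auto-scaling group evaluates each cycle. Values are illustrative.
public class ScalingPolicy {
    private final double scaleOutCpu; // add a node above this average CPU
    private final double scaleInCpu;  // remove a node below this average CPU
    private final int minNodes;       // never scale below this (avoids single point of failure)
    private final int maxNodes;       // cost/capacity ceiling

    public ScalingPolicy(double scaleOutCpu, double scaleInCpu,
                         int minNodes, int maxNodes) {
        this.scaleOutCpu = scaleOutCpu;
        this.scaleInCpu = scaleInCpu;
        this.minNodes = minNodes;
        this.maxNodes = maxNodes;
    }

    // Desired node count given the current fleet size and average CPU load (0.0-1.0).
    public int desiredNodes(int currentNodes, double avgCpu) {
        if (avgCpu > scaleOutCpu) return Math.min(currentNodes + 1, maxNodes);
        if (avgCpu < scaleInCpu)  return Math.max(currentNodes - 1, minNodes);
        return currentNodes;
    }
}
```

Keeping a minimum of at least two nodes ties this back to redundant deployment: scale‑in must never reintroduce a single point of failure.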
CDN and OSS considerations: static assets (images, videos, HTML/CSS/JS) should be cached via a CDN to reduce latency, and large media can be stored in object storage (OSS) with optional archival for cold data.
Architecture Digest
Focusing on Java backend development, covering application architecture from top-tier internet companies (high availability, high performance, high stability), big data, machine learning, Java architecture, and other popular fields.