
Designing High‑Availability Stateless Services: Redundancy, Load Balancing, Scaling, and Monitoring

This article explains how to build highly available stateless services using redundant deployment, vertical and horizontal scaling, well-chosen load-balancing algorithms, monitoring, and automated recovery. It also covers how to identify high concurrency, how CDN/OSS fit in, and practical recommendations for cloud-native environments.

Architecture Digest

A lighthearted observation about architecture design holds that incidents are the product of accumulated load: as user traffic grows, a system built without high-availability planning will inevitably fail, which makes high availability a discipline worth taking seriously.

When designing a highly available system, one must consider solution selection pitfalls, emergency fault‑handling plans, monitoring to detect failures, automated recovery mechanisms, code performance and error handling, as well as techniques such as service degradation, rate limiting, and circuit breaking.

This article focuses on stateless services at the architectural level and how to guarantee their high availability.

Stateless service: a service that stores no persistent data (caches aside), can be destroyed and recreated at any time without losing user data, and allows requests to be switched to any replica without affecting users. High availability means no data loss, no service outage, minimal impact when some instances fail, and rapid recovery.

Key considerations include:

Redundant deployment: Deploy at least one extra node to avoid a single point of failure.

Vertical scaling: Increase the performance of a single machine (CPU, memory, SSD, etc.).

Horizontal scaling: Quickly add capacity when traffic spikes.

In a single-point architecture, growing traffic can overload and crash the lone node; deploying multiple instances of a stateless service spreads that load, and a load balancer makes full use of the available server resources.


Load‑balancing algorithms:

Random algorithm – distributes requests randomly; the distribution evens out as traffic volume grows.

Round‑robin algorithm – cycles through backend servers.

Weighted round‑robin – assigns higher weight to servers with greater capacity.

Weighted random – similar to weighted round‑robin but selects randomly based on weight.

Least connections – chooses the server with the fewest active connections.

Source-address hash – hashes the client IP so the same client consistently reaches the same server (session persistence).
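As a rough sketch of how three of these algorithms work (the backend addresses and weights below are made up for illustration), they can be expressed in a few lines of Python:

```python
import hashlib
import itertools
import random

servers = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]  # hypothetical backends

# Round-robin: cycle through the backends in order.
_rr = itertools.cycle(servers)
def round_robin() -> str:
    return next(_rr)

# Weighted random: pick a backend with probability proportional to its weight.
weights = {"10.0.0.1": 5, "10.0.0.2": 3, "10.0.0.3": 2}
def weighted_random() -> str:
    return random.choices(list(weights), weights=list(weights.values()), k=1)[0]

# Source-address hash: the same client IP always maps to the same backend.
def source_hash(client_ip: str) -> str:
    digest = int(hashlib.md5(client_ip.encode()).hexdigest(), 16)
    return servers[digest % len(servers)]
```

Production load balancers (nginx, HAProxy, cloud LBs) implement these natively; the point of the sketch is only to show how little state each policy needs — round-robin needs a cursor, source hashing needs none at all.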

Algorithm selection guidance: plain random is rarely worth using; start with basic round-robin for uniformly configured servers (e.g., identical VMs). If multiple applications share a server, consider weighted round-robin or least connections. For containerized workloads, round-robin with cookie-based session persistence is a common default.
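For concreteness, this kind of policy is typically a one-line choice in the load balancer. A hedged nginx sketch (the upstream name and server addresses are hypothetical):

```nginx
upstream app_backend {
    least_conn;                        # or omit for default round-robin
    server 10.0.0.1:8080 weight=3;     # weight= enables weighted balancing
    server 10.0.0.2:8080;
    server 10.0.0.3:8080;
}
```

Replacing `least_conn` with `ip_hash` gives source-address session persistence.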

High concurrency is usually identified via QPS (queries per second). The rule of thumb: 80% of daily visits concentrate in 20% of the day (the peak period). Example calculations:

Peak QPS for 100,000 daily visits: (100,000 × 80%) / (86,400 × 20%) ≈ 4.63 QPS

Average QPS for a sustained 50,000 requests per minute: ((60 × 24) × 50,000) / 86,400 ≈ 833 QPS

Generally, a few hundred QPS is considered high concurrency; major sites may see 1,500–5,000 QPS peaks. Besides QPS, response time and concurrent user count are also important metrics.
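The two example calculations above are simple enough to capture in a small helper (function names are my own):

```python
SECONDS_PER_DAY = 86_400

def peak_qps(daily_visits: int, traffic_share: float = 0.8, time_share: float = 0.2) -> float:
    """Peak QPS under the 80/20 rule: traffic_share of visits arrive in time_share of the day."""
    return (daily_visits * traffic_share) / (SECONDS_PER_DAY * time_share)

def average_qps(requests_per_minute: int) -> float:
    """Average QPS for a sustained per-minute request rate."""
    return (60 * 24 * requests_per_minute) / SECONDS_PER_DAY

print(round(peak_qps(100_000), 2))  # ≈ 4.63
print(round(average_qps(50_000)))   # ≈ 833
```

Capacity planning should target the peak figure, not the average — the 80/20 rule makes the peak roughly four times the daily average.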

When server load is high, symptoms include slower processing, dropped connections, failed requests, and error responses. Monitoring exposes this performance status, enabling dynamic adjustments and retries that keep the service available. Vertical scaling is the quickest way to boost single-machine performance, but it has hard limits, and a single-machine failure can be catastrophic — which is why redundancy is needed to approach "five-nines" reliability.

Horizontal auto-scaling: After recognizing the limits of vertical scaling, add new nodes as load increases. Manual schedulers can detect system state and trigger scaling via IaaS APIs; cloud providers also offer elastic scaling services. In container environments, configure autoscaling and scheduling policies so that no single node failure takes the service down.
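In Kubernetes, for example, this is a HorizontalPodAutoscaler. A hedged sketch (the deployment name, replica bounds, and CPU target are placeholders, not recommendations from the article):

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-stateless          # hypothetical deployment name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-stateless
  minReplicas: 2               # at least two replicas: no single point of failure
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70 # scale out when average CPU exceeds 70%
```

Note that `minReplicas: 2` encodes the redundancy rule from earlier: even at minimum load, there is always a spare node.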

Note: IaaS (Infrastructure-as-a-Service) provides managed servers, storage, and networking; elastic scaling targets stateless services. Stateless services should not share a scaling group with stateful workloads, since scaling events can put unexpected pressure on the database.

CDN and OSS: Front-end static assets (images, videos, HTML/CSS/JS) affect page load speed. A CDN caches these assets at edge servers, reducing latency; it can also terminate HTTPS, configure origin timeouts, follow redirects, compress pages, and serve custom error pages. OSS (Object Storage Service) provides virtually unlimited storage for media and cold data, and is often paired with a CDN for efficient delivery.
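At the origin, the caching and compression behavior the CDN relies on is usually a short config block. A hedged nginx sketch (paths and TTLs are illustrative, not prescriptive):

```nginx
# Long-lived caching and compression for versioned static assets.
location ~* \.(js|css|png|jpg|svg|woff2)$ {
    expires 30d;                                   # CDN and browsers may cache for 30 days
    add_header Cache-Control "public, immutable";  # safe if filenames are content-hashed
    gzip on;
    gzip_types text/css application/javascript image/svg+xml;
}
```

The `immutable` hint only makes sense when asset filenames change on every release (content hashing); otherwise a shorter TTL is safer.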

Overall, the article provides a practical checklist for building highly available, stateless services, covering redundancy, scaling strategies, load‑balancing algorithm choices, monitoring, and static‑asset delivery.

Written by

Architecture Digest

Focusing on Java backend development, covering application architecture from top-tier internet companies (high availability, high performance, high stability), big data, machine learning, Java architecture, and other popular fields.
