Designing Highly Available Stateless Services: Key Strategies & Best Practices

This article explains how to build highly available stateless services by covering redundant deployment, load‑balancing algorithms, vertical and horizontal scaling, CDN/OSS usage, and practical recommendations for monitoring, automated recovery, and minimizing failure impact.

MaGe Linux Operations

Apr 4, 2021

A Lighthearted Take on Architecture Design

Outages are usually the result of accumulated load: as the user base grows, a system designed without high availability in mind will eventually fail. High availability is also a broad discipline.

Do you want to know what to consider when designing a highly available system?

Consider the pitfalls of solution selection and plan emergency failure handling.

Implement monitoring to detect failures promptly.

Provide automated recovery and act on early-warning alerts.

Address code‑level concerns such as performance and error handling.

Minimize impact through service degradation, rate limiting, and circuit breaking.

…

This article introduces how stateless services achieve high availability at the architectural level.

Stateless service: a service that never stores data itself (caches excepted). Its instances can be destroyed and recreated at will without losing user data, and traffic can be switched to any replica without affecting users.

High availability for stateless services aims to prevent data loss and service outages, ensuring minimal impact when a component fails and enabling rapid recovery.

Key considerations include:

Redundant deployment: deploy at least one extra node to avoid single points of failure.

Vertical scaling: increase single‑machine performance.

Horizontal scaling: quickly add capacity during traffic spikes.

Redundant Deployment

In a single‑point architecture, growing data volume overloads the node, leading to crashes. Deploying multiple stateless nodes distributes load.

Load balancers can be used to route incoming requests efficiently.

Stateless service: no data storage; restarting a node does not cause data loss.

Load balancing: distributes requests across multiple nodes.
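The interplay of redundancy and load balancing can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `BackendPool`, `mark_down`, `mark_up`, and the node names are hypothetical, and a real load balancer would detect failures through health checks rather than explicit calls.

```python
import itertools


class BackendPool:
    """Round-robin over replicas, skipping any replica marked unhealthy."""

    def __init__(self, backends):
        self.backends = list(backends)
        self.healthy = set(self.backends)
        self._cycle = itertools.cycle(self.backends)

    def mark_down(self, backend):
        # In practice a failed health check would trigger this.
        self.healthy.discard(backend)

    def mark_up(self, backend):
        if backend in self.backends:
            self.healthy.add(backend)

    def pick(self):
        # Look at each backend at most once per call.
        for _ in range(len(self.backends)):
            candidate = next(self._cycle)
            if candidate in self.healthy:
                return candidate
        raise RuntimeError("no healthy backends available")


pool = BackendPool(["node-a", "node-b"])
pool.mark_down("node-a")  # simulate a node failure
served_by = [pool.pick() for _ in range(3)]  # all requests go to node-b
```

Because the nodes hold no data, losing `node-a` only reduces capacity; the remaining replica keeps serving until the failed node is replaced.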

Load Balancing for Stateless Services

Six common algorithms are covered here:

Random algorithm: selects a backend at random; with enough traffic the distribution approaches an even split.

Round‑robin algorithm: cycles through backends sequentially.

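Both of the above treat every backend as equal, so they can overload weaker servers when backend capacities differ. Weighted and connection-based algorithms address this:

Weighted round-robin: assigns a higher weight to more capable servers so that they receive proportionally more requests, reducing the risk of overloading weaker ones.

Weighted random: similar to weighted round-robin, but picks a backend at random with probability proportional to its weight.

Weighted least-connections: chooses the server with the fewest active connections relative to its weight. Because it reacts to actual load rather than a fixed schedule, it is the most adaptive option.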
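The algorithms described so far can be sketched as follows. This is a minimal illustration, not how any particular load balancer implements them: all function and class names are hypothetical, the weighted round-robin uses a "smooth" variant that interleaves picks instead of sending bursts to the heaviest server, and the connection counts would come from the proxy's own bookkeeping in practice.

```python
import itertools
import random


def round_robin(backends):
    """Return an iterator that cycles through the backends in order."""
    return itertools.cycle(backends)


def weighted_random(weights, rng=random):
    """Pick a backend with probability proportional to its weight."""
    names = list(weights)
    return rng.choices(names, weights=[weights[n] for n in names], k=1)[0]


class SmoothWeightedRoundRobin:
    """Weighted round-robin that spreads heavy backends across the cycle."""

    def __init__(self, weights):
        self.weights = dict(weights)
        self.current = {name: 0 for name in self.weights}
        self.total = sum(self.weights.values())

    def pick(self):
        # Every backend gains its weight; the leader is chosen and
        # then pushed back by the total weight.
        for name, weight in self.weights.items():
            self.current[name] += weight
        chosen = max(self.current, key=self.current.get)
        self.current[chosen] -= self.total
        return chosen


def weighted_least_connections(active, weights):
    """Pick the backend with the lowest active-connections-to-weight ratio."""
    return min(active, key=lambda name: active[name] / weights[name])
```

With weights `{"a": 5, "b": 1, "c": 1}`, seven consecutive `pick()` calls return a, a, b, a, c, a, a: server `a` gets five of every seven requests without receiving all five back to back.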

When session persistence is required, use source‑address hash to keep a client’s requests on the same server.

Source-address hash: hashes the client IP so that the same client reaches the same backend for as long as the set of backends does not change.
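A minimal sketch of source-address hashing, assuming a fixed list of backends; the function name `pick_by_source_ip` and the addresses are hypothetical. A stable hash such as SHA-256 is used instead of Python's built-in `hash()`, which is randomized per process for strings.

```python
import hashlib


def pick_by_source_ip(client_ip, backends):
    """Map a client IP to one backend, deterministically."""
    digest = hashlib.sha256(client_ip.encode("utf-8")).digest()
    # Interpret the first 8 bytes as an integer and fold it onto the list.
    index = int.from_bytes(digest[:8], "big") % len(backends)
    return backends[index]


backends = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]
first = pick_by_source_ip("203.0.113.7", backends)
second = pick_by_source_ip("203.0.113.7", backends)  # same backend as first
```

One design caveat: with plain modulo hashing, adding or removing a backend remaps most clients. Consistent hashing is a common way to limit that churn.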

Choosing a Load‑Balancing Algorithm

Start with simple round‑robin for uniformly configured servers; for environments with multiple applications per server, consider weighted round‑robin or least‑connections.

Weighted round‑robin suits short‑connection scenarios (e.g., HTTP services in Kubernetes), while least‑connections fits long‑connection services such as FTP.

If session persistence is needed and cookie-based stickiness is not an option, the source-address hash algorithm can be used.

Identifying High‑Concurrency Applications

The key metric is QPS (queries per second). Example calculation, assuming 100,000 requests per day:

Formula: (100,000 × 80%) / (86,400 × 20%) ≈ 4.63 QPS (peak QPS)

The principle: 80% of traffic occurs in 20% of the time (the peak period), and a day has 86,400 seconds.

Another example, with 50,000 machines each generating one page view (PV) per minute: ((60 × 24) × 50,000) / 86,400 ≈ 833 QPS on average.

Hundreds of QPS generally qualifies as high concurrency; major platforms may see peaks of 1,500 to 5,000 QPS.
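Both calculations can be reproduced with a short script. The function names are hypothetical, and the last line applies the 80/20 rule to the second example as well, which is an extension for illustration rather than a figure from the text.

```python
SECONDS_PER_DAY = 86_400


def average_qps(requests_per_day):
    """Requests spread evenly over the whole day."""
    return requests_per_day / SECONDS_PER_DAY


def peak_qps(requests_per_day, traffic_share=0.8, time_share=0.2):
    """Peak estimate: traffic_share of requests in time_share of the day."""
    return (requests_per_day * traffic_share) / (SECONDS_PER_DAY * time_share)


first_peak = peak_qps(100_000)             # 80,000 / 17,280, about 4.63

second_daily = 60 * 24 * 50_000            # 72,000,000 page views per day
second_avg = average_qps(second_daily)     # about 833.3
second_peak = peak_qps(second_daily)       # about 3,333.3 (assumption: 80/20 applies)
```

The gap between the average and the peak estimate (a factor of four under the 80/20 assumption) is the reason capacity planning should be based on peak load rather than the daily mean.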

Other indicators include response time and concurrent user count.

When server load is high, symptoms include slower processing, network drops, request failures, and error messages; detailed analysis is required.

Monitoring reveals the system's performance status, which makes it possible to adjust dynamically, retry failed requests, and keep the service available. When a single machine is the bottleneck, vertical scaling is the quickest way to add capacity.

Vertical Scaling

Increasing a server’s capacity can be done by:

Upgrading CPU, memory, swap, disk, or network interfaces.

Improving hardware and system performance (e.g., switching to SSDs, tuning system parameters).

Architectural tweaks such as asynchronous processing, caching, or lock‑free designs.
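As one example of such a tweak, an in-process cache can remove repeated expensive lookups. The sketch below uses the standard library's `functools.lru_cache`; `load_profile` and its return value are hypothetical stand-ins for a slow database or RPC call.

```python
import functools

backend_calls = 0  # counts how often the slow path actually runs


@functools.lru_cache(maxsize=1024)
def load_profile(user_id):
    """Pretend to fetch a profile from a slow backend."""
    global backend_calls
    backend_calls += 1
    return f"profile-for-{user_id}"


load_profile(7)   # miss: goes to the backend
load_profile(7)   # hit: served from memory
load_profile(8)   # miss: different key

stats = load_profile.cache_info()  # hits=1, misses=2
```

This stays compatible with statelessness as long as the cache holds only derived data: each replica has its own copy, and losing it on restart costs performance, not correctness.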

While vertical scaling is fast, it has hard limits and still leaves a single point of failure; achieving “five nines” reliability requires redundancy.
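The arithmetic behind this can be made concrete. The sketch below assumes that replicas fail independently and that one surviving replica is enough to serve traffic; both are simplifications (the load balancer, network, and shared dependencies are ignored), and the function names are hypothetical.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600


def downtime_minutes_per_year(availability):
    """Allowed downtime per year for a given availability (0..1)."""
    return (1 - availability) * MINUTES_PER_YEAR


def combined_availability(single, replicas):
    """Availability of N independent replicas when any one suffices."""
    return 1 - (1 - single) ** replicas


five_nines_budget = downtime_minutes_per_year(0.99999)  # about 5.26 minutes
two_nodes = combined_availability(0.99, 2)              # about 0.9999
three_nodes = combined_availability(0.99, 3)            # about 0.999999
```

Under these assumptions, a single node at 99% availability is down for more than three days a year, while three such nodes together exceed five nines. No amount of vertical scaling achieves that on one machine.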

Horizontal Auto‑Scaling

After recognizing single‑machine limits, horizontal scaling adds new nodes to share load.

Automatic scaling is essential for handling traffic spikes without manual intervention.

In private clouds, a custom scheduler can monitor system state and trigger scaling via IaaS APIs.
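The decision logic of such a scheduler might look like the sketch below. Everything here is an assumption for illustration: the per-node capacity, the 30% headroom, the node bounds, and the `scale_to` callback, which stands in for whatever IaaS API call actually creates or removes machines.

```python
import math


def desired_nodes(current_qps, qps_per_node,
                  min_nodes=2, max_nodes=20, headroom=0.3):
    """Nodes needed to serve current_qps with spare capacity, within bounds."""
    needed = math.ceil(current_qps * (1 + headroom) / qps_per_node)
    return max(min_nodes, min(max_nodes, needed))


def reconcile(current_nodes, current_qps, qps_per_node, scale_to):
    """Compare the current node count with the target and scale if they differ."""
    target = desired_nodes(current_qps, qps_per_node)
    if target != current_nodes:
        scale_to(target)  # hypothetical hook into the IaaS API
    return target


requested = []
reconcile(current_nodes=2, current_qps=3000, qps_per_node=500,
          scale_to=requested.append)
```

Keeping `min_nodes` at two or more preserves redundancy even when traffic is low, and `max_nodes` caps cost if the metric misbehaves. A production scheduler would also add a cooldown period so that brief spikes do not cause the node count to oscillate.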

Public cloud providers offer elastic scaling services.

For containers, combine IaaS-level node scaling with the Kubernetes autoscaler, so that pods can spread across several nodes and a single-node failure does not take the service down.
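On the Kubernetes side, the Horizontal Pod Autoscaler documents its core rule as desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue), skipping changes when the ratio is within a tolerance of 1.0 (0.1 by default). The function below is a simplified model of that rule only; the name `hpa_desired_replicas` is hypothetical, and the real controller also applies min/max bounds, stabilization windows, and pod-readiness handling.

```python
import math


def hpa_desired_replicas(current_replicas, current_metric,
                         target_metric, tolerance=0.1):
    """Simplified model of the documented HPA scaling rule."""
    ratio = current_metric / target_metric
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas  # close enough to target: no change
    return math.ceil(current_replicas * ratio)


scale_up = hpa_desired_replicas(4, current_metric=200, target_metric=100)
steady = hpa_desired_replicas(4, current_metric=105, target_metric=100)
scale_down = hpa_desired_replicas(4, current_metric=50, target_metric=100)
```

Here `scale_up` is 8, `steady` stays at 4, and `scale_down` is 2. Pod-level scaling only helps if the cluster has room for the new pods, which is where IaaS-level node scaling comes in.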

Term explanation: IaaS (Infrastructure as a Service) manages servers, storage, networking, and other hardware resources.

Note: Elastic scaling targets stateless services.

For stateless services, auto-scaling typically becomes necessary once traffic reaches thousands of QPS. At that level the database also comes under pressure, so avoid deploying stateful services on auto-scaled nodes.

CDN and OSS

Web front‑ends serve many static assets (images, videos, HTML/CSS/JS). During a web service outage, these assets should still be delivered.

A CDN caches static data at edge nodes, reducing latency and keeping assets available when the origin is unreachable.

Term explanation: Edge server (edge node) is a server close to users, reducing network transmission time.
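The behaviour of an edge node can be modelled as a cache in front of the origin. The toy class below is a sketch under stated assumptions, not how any CDN is implemented or configured: `EdgeCache`, the TTL value, and the injected `fetch_from_origin` and `clock` callables are all hypothetical. It shows the two properties that matter here: fresh hits never touch the origin, and a stale copy is served if the origin is down.

```python
import time


class EdgeCache:
    """Toy edge node: cache with TTL, falling back to stale data on error."""

    def __init__(self, fetch_from_origin, ttl_seconds=60.0, clock=time.monotonic):
        self._fetch = fetch_from_origin
        self._ttl = ttl_seconds
        self._clock = clock
        self._store = {}  # path -> (body, fetched_at)

    def get(self, path):
        now = self._clock()
        cached = self._store.get(path)
        if cached is not None and now - cached[1] < self._ttl:
            return cached[0]  # fresh hit: origin is not contacted
        try:
            body = self._fetch(path)
        except Exception:
            if cached is not None:
                return cached[0]  # origin down: serve the stale copy
            raise  # nothing cached: the failure reaches the client
        self._store[path] = (body, now)
        return body
```

Assets that were never requested before the outage cannot be served from the edge, which is one reason to pre-warm caches for critical static files.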

CDN can bind HTTPS certificates, configure origin timeout, follow redirects, enable smart compression, and customize error pages.

OSS (Object Storage Service) stores a virtually unlimited number of files as objects; combining OSS with a CDN offloads media and cold data from the web servers.

Many video platforms archive old data to OSS.

Summary

The article covered common high‑availability designs for stateless services:

Redundant deployment.

Six load‑balancing algorithms and how to choose them.

Benefits and drawbacks of vertical scaling.

Horizontal scaling and automatic scaling.

When to use CDN and OSS.

Stateless applications should not store sessions or persistent data.

Further research is needed on implementation details of each algorithm and on high‑availability patterns for stateful services.
