How to Build Truly High‑Availability Stateless Services: Strategies & Algorithms

This article explains how to design highly available stateless services by covering redundancy, vertical and horizontal scaling, load‑balancing algorithms, high‑concurrency identification, and the use of CDN/OSS, offering practical guidance for robust backend architecture.

ITFLY8 Architecture Home

Stateless Service High Availability

Outages are rarely sudden; they are the result of load accumulating over time. As the number of users grows, a system that ignores high availability will eventually fail. Designing a highly available system means considering redundancy, monitoring, automated recovery, performance, error handling, and graceful degradation techniques such as rate limiting and circuit breaking.
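To make rate limiting concrete, here is a minimal token‑bucket sketch. The article does not name a specific algorithm, so the token bucket, the class name `TokenBucket`, and its parameters are illustrative assumptions rather than the author's method.

```python
import time


class TokenBucket:
    """Hypothetical rate limiter: about `rate` requests per second on
    average, with bursts of up to `capacity` requests."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate            # tokens added per second
        self.capacity = capacity    # maximum tokens the bucket can hold
        self.clock = clock          # injectable clock, useful in tests
        self.tokens = float(capacity)
        self.last = clock()

    def allow(self):
        """Return True if the request may proceed, False if it should be rejected."""
        now = self.clock()
        # Refill in proportion to elapsed time, capped at capacity.
        elapsed = now - self.last
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

A caller would check `allow()` before handling each request and return an error such as HTTP 429 when it yields `False`, so that excess load is shed instead of overwhelming the service.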

Redundant Deployment

Deploy multiple nodes to avoid a single point of failure. Vertical scaling boosts the performance of a single machine, while horizontal scaling adds capacity quickly during traffic spikes.

Load Balancing Algorithms

Six basic algorithms are available: random, round‑robin, weighted round‑robin, weighted random, least‑connections, and source‑address hash. Weighted round‑robin assigns a higher weight to servers with greater capacity so that they receive proportionally more requests, while least‑connections selects the server with the fewest active connections.
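The two algorithms described in detail can be sketched in a few lines. This is an illustration under assumptions: the weighted version uses one common variant, often called smooth weighted round‑robin, which interleaves picks instead of sending consecutive bursts to the heaviest server. The server names and weights are made up.

```python
class SmoothWeightedRoundRobin:
    """Weighted round-robin sketch: over one full cycle, each server is
    picked in proportion to its weight."""

    def __init__(self, weights):
        self.weights = dict(weights)                   # server name -> static weight
        self.current = {name: 0 for name in self.weights}
        self.total = sum(self.weights.values())

    def pick(self):
        # Raise every server's running score by its weight.
        for name, weight in self.weights.items():
            self.current[name] += weight
        # Choose the highest score, then penalize it by the total weight.
        chosen = max(self.current, key=self.current.get)
        self.current[chosen] -= self.total
        return chosen


def pick_least_connections(active):
    """Return the server with the fewest active connections.

    `active` maps server name -> current connection count."""
    return min(active, key=active.get)


lb = SmoothWeightedRoundRobin({"a": 5, "b": 1, "c": 1})
print([lb.pick() for _ in range(7)])   # "a" is chosen 5 times out of 7
print(pick_least_connections({"a": 12, "b": 3, "c": 7}))
```

In a real balancer the connection counts passed to `pick_least_connections` would be updated as connections open and close; here they are a plain dictionary for clarity.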

Choosing an Algorithm

Start with plain round‑robin when all servers have the same configuration; switch to weighted round‑robin or least‑connections when available capacity differs between servers, for example because multiple applications share a server. For short‑connection scenarios (e.g., HTTP), prefer weighted round‑robin with cookie‑based session persistence; for long‑connection services (FTP, sockets), use weighted least‑connections.
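Session persistence means that repeated requests from one client reach the same backend. Besides cookies, the source‑address hash listed earlier gives a similar effect. The following is a minimal sketch; the function name and the choice of SHA‑256 are assumptions for illustration.

```python
import hashlib


def pick_by_source_hash(client_ip, servers):
    """Map a client IP to a server so the same IP always gets the same server
    (as long as the server list does not change)."""
    # Use a stable hash; Python's built-in hash() is randomized per process.
    digest = hashlib.sha256(client_ip.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(servers)
    return servers[index]


servers = ["app-1", "app-2", "app-3"]   # hypothetical backend names
print(pick_by_source_hash("203.0.113.7", servers))
```

A trade‑off worth noting: with simple modulo hashing, adding or removing a server remaps most clients, so this approach suits pools that change rarely.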

Identifying High Concurrency

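High concurrency is measured in queries per second (QPS). Two example estimates: if a service receives 100 000 requests per day and 80% of them arrive within 20% of the day, peak QPS = (100 000 × 80%) / (86 400 × 20%) ≈ 4.6 QPS; if 50 000 machines each generate one request per minute, the load is 50 000 / 60 ≈ 833 QPS. As a rule of thumb, a few hundred QPS already counts as high concurrency.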
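The two estimates above can be reproduced with a small helper. The function names and default parameters are illustrative; the 80%/20% split is the assumption used in the article's own formula.

```python
SECONDS_PER_DAY = 86_400


def peak_qps(daily_requests, traffic_share=0.8, time_share=0.2):
    """Estimate peak QPS, assuming `traffic_share` of the daily requests
    arrives within `time_share` of the day."""
    return (daily_requests * traffic_share) / (SECONDS_PER_DAY * time_share)


def steady_qps(clients, requests_per_minute):
    """QPS produced by `clients`, each sending `requests_per_minute` requests."""
    return clients * requests_per_minute / 60


print(round(peak_qps(100_000), 1))   # 4.6
print(int(steady_qps(50_000, 1)))    # 833
```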

Vertical Scaling

Increase a single machine’s capacity with a faster CPU, more memory, SSDs, or system tuning, and improve the software with asynchronous processing, caching, and lock‑free data structures. Vertical scaling is quick to apply, but it has hard limits and leaves the machine as a single point of failure.

Horizontal Auto‑Scaling

When load rises, add nodes automatically. In a private cloud, implement a custom scheduler; in a public cloud, use the provider’s elastic‑scaling service. For containers, configure auto‑scaling at the IaaS layer or within Kubernetes. Target stateless services, since any replica can serve any request.
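As a rough illustration of the scaling decision, the sketch below applies the proportional rule documented for the Kubernetes Horizontal Pod Autoscaler: desired replicas = ceil(current replicas × current metric / target metric). The function name, the clamping bounds, and the example numbers are assumptions, and the real autoscaler adds details such as a tolerance band and stabilization windows that are omitted here.

```python
import math


def desired_replicas(current_replicas, current_metric, target_metric,
                     min_replicas=1, max_replicas=10):
    """Scale the replica count in proportion to how far the observed
    metric (e.g. average CPU %) is from its target, within fixed bounds."""
    raw = math.ceil(current_replicas * current_metric / target_metric)
    return max(min_replicas, min(max_replicas, raw))


# 4 replicas at 90% CPU with a 60% target -> scale out to 6.
print(desired_replicas(4, 90, 60))
# 4 replicas at 30% CPU with a 60% target -> scale in to 2.
print(desired_replicas(4, 30, 60))
```

The upper bound protects against runaway scale‑out when a metric spikes, and the lower bound keeps enough replicas for redundancy.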

CDN and OSS

Static assets (images, videos, HTML/CSS/JS) should be cached on a CDN to reduce latency. Combine the CDN with object storage (OSS) for virtually unlimited media storage and for archiving cold data. This improves the user experience and offloads traffic from the backend.
