Designing Highly Available Stateless Services: Load Balancing and Scaling Strategies

This article explains how to build highly available stateless services by using redundant deployment, vertical and horizontal scaling, various load‑balancing algorithms, and automatic recovery mechanisms, while also covering monitoring, high‑concurrency identification, and the role of CDN and OSS in resilient architecture.

ITFLY8 Architecture Home

Apr 22, 2021

Laughing about Architecture Design

Outages are often the result of load that accumulates over time; without high‑availability planning, continued growth in users will eventually expose a weak point. Designing a highly available system means considering redundancy, monitoring, automated recovery, code performance, and fault‑tolerance mechanisms such as degradation, rate limiting, and circuit breaking.

Stateless Services and High Availability

A stateless service keeps no persistent data of its own (a local cache is the only exception), so instances can be created or destroyed at any time and any request can be routed to any replica without data loss. When a node fails, the impact is therefore minimal.

Key Design Aspects

Redundant Deployment: Deploy at least two nodes to avoid single points of failure.

Vertical Scaling: Increase the capacity of a single machine.

Horizontal Scaling: Add more nodes to handle traffic spikes quickly.

Load Balancing for Stateless Services

Four basic algorithms can be used:

Random: Each request is sent to a randomly chosen backend; with enough traffic, the distribution approximates an even balance.

Round‑Robin: Requests are distributed sequentially across backends.

Weighted Round‑Robin: More capable servers are given a higher weight and receive a proportionally larger share of requests, which keeps weaker servers from being overloaded.

Weighted Random: Like weighted round‑robin, but each backend is chosen at random with probability proportional to its weight.
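The four basic algorithms can be sketched in a few lines. This is a minimal illustration, not a production balancer: the backend names and weights are hypothetical, and the weighted round‑robin shown here is the naive variant that simply repeats each backend according to its weight (real load balancers usually interleave the picks more smoothly).

```python
import itertools
import random

# Hypothetical backends mapped to illustrative weights (not from the article).
BACKENDS = {"app-1": 3, "app-2": 1}


def pick_random(backends, rng=random):
    """Random: every backend is equally likely."""
    return rng.choice(list(backends))


def round_robin(backends):
    """Round-robin: iterate over the backends in a fixed, repeating order."""
    return itertools.cycle(list(backends))


def weighted_round_robin(backends):
    """Weighted round-robin (naive): each backend appears `weight` times per cycle."""
    expanded = [name for name, weight in backends.items() for _ in range(weight)]
    return itertools.cycle(expanded)


def weighted_random(backends, rng=random):
    """Weighted random: selection probability is proportional to the weight."""
    names = list(backends)
    weights = [backends[name] for name in names]
    return rng.choices(names, weights=weights, k=1)[0]
```

With the weights above, one full weighted round‑robin cycle sends three requests to `app-1` for every one sent to `app-2`; weighted random approaches the same 3:1 ratio only over many requests.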

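For more intelligent distribution, the Least Connections algorithm selects the server with the fewest active connections; its weighted variant, Weighted Least Connections, selects the server with the lowest ratio of active connections to weight.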
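A weighted least‑connections choice is commonly implemented as picking the backend with the lowest ratio of active connections to weight. The sketch below assumes that interpretation; the `Backend` class, the names, and the connection counts are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Backend:
    name: str
    weight: int      # relative capacity, must be > 0
    active: int = 0  # connections currently open


def pick_weighted_least_connections(backends):
    """Return the backend with the lowest active/weight ratio."""
    return min(backends, key=lambda b: b.active / b.weight)


pool = [
    Backend("app-1", weight=3, active=6),  # ratio 2.0
    Backend("app-2", weight=1, active=3),  # ratio 3.0
]
chosen = pick_weighted_least_connections(pool)
chosen.active += 1  # the caller must decrement this when the connection closes
```

Here `app-1` is chosen even though it has more open connections in absolute terms, because relative to its weight it is less loaded.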

If session persistence is required, the Source‑IP Hash algorithm maps the same client IP to the same backend.
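Source‑IP hashing can be illustrated as follows. This is a simplified sketch that assumes a fixed backend list and plain modulo hashing; the backend names are hypothetical.

```python
import hashlib

# Hypothetical backend pool (not from the article).
POOL = ["app-1", "app-2", "app-3"]


def pick_by_source_ip(client_ip, backends):
    """Map a client IP to a backend deterministically."""
    # Use a stable hash: Python's built-in hash() is salted per process for str.
    digest = hashlib.sha256(client_ip.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(backends)
    return backends[index]
```

The same IP always maps to the same backend as long as the pool is unchanged. With modulo hashing, adding or removing a backend remaps most clients; consistent hashing is the usual way to limit that churn.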

Choosing a Load‑Balancing Algorithm

Prefer round‑robin for homogeneous servers. Use weighted round‑robin when servers differ in capacity, and least‑connections (or its weighted variant) when connections are long‑lived (FTP, sockets), because request counts alone then say little about actual load.

Identifying High‑Concurrency Scenarios

The key metric is QPS (queries per second). For example, 100,000 daily page views (PV) with 80% of the traffic arriving in 20% of the day yields a peak of about 4.6 QPS. A system in which 50,000 machines each generate one PV per minute sees about 833 QPS. As a rough rule of thumb, load in the hundreds of QPS is generally treated as high concurrency.

Formula: (100000 * 80%) / (86400 * 20%) ≈ 4.63 QPS (peak)

Formula: ((60*24)*50000) / 86400 ≈ 833 QPS
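The two estimates can be reproduced with a short calculation. The function and parameter names below are illustrative; the 80%/20% split is the assumption stated in the text.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86400


def peak_qps(daily_requests, traffic_share=0.8, time_share=0.2):
    """Peak QPS when `traffic_share` of the daily requests arrive in `time_share` of the day."""
    return (daily_requests * traffic_share) / (SECONDS_PER_DAY * time_share)


def steady_qps(sources, requests_per_minute=1):
    """QPS when each source sends a fixed number of requests per minute."""
    return sources * requests_per_minute / 60


print(round(peak_qps(100_000), 2))  # 4.63
print(round(steady_qps(50_000)))    # 833
```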

Other indicators include response time and the number of concurrent users. Monitoring these metrics helps detect overload, slowdowns, or failures early, so that vertical scaling or other mitigations can be applied.

Vertical Scaling

Raise the capacity of a single server, either by upgrading hardware (CPU, memory, SSD, network) or by improving the software architecture (asynchronous processing, caching, lock‑free designs).

Horizontal Auto‑Scaling

When load rises, add new nodes. Manual scaling may suffice for private clouds, but automatic scaling (e.g., the Kubernetes Horizontal Pod Autoscaler, HPA) is essential for handling traffic spikes without manual intervention.

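Auto‑scaling is most effective for stateless services, because new replicas can serve traffic as soon as they start. Stateful services are harder to scale out automatically, since their data must be moved or rebalanced, so they are more often scaled vertically.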
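As an illustration of how an autoscaler decides on a replica count, the Kubernetes HPA documentation describes its core rule as desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue). The sketch below applies that rule with a clamp to minimum and maximum bounds; the bounds and the example metric values are hypothetical, and the real controller additionally applies a tolerance and stabilization behavior that is omitted here.

```python
import math


def desired_replicas(current_replicas, current_metric, target_metric,
                     min_replicas=2, max_replicas=10):
    """Proportional scaling rule, clamped to [min_replicas, max_replicas]."""
    raw = math.ceil(current_replicas * current_metric / target_metric)
    return max(min_replicas, min(max_replicas, raw))


# Four replicas averaging 90% CPU against a 60% target: scale out to six.
print(desired_replicas(4, current_metric=90, target_metric=60))  # 6
```

Keeping the lower bound at two replicas matches the redundancy rule above: even at low load, the service never drops to a single node.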

CDN and Object Storage (OSS)

Static assets (images, videos, HTML/CSS/JS) should be cached via a CDN to reduce latency and improve resilience. OSS provides virtually unlimited object storage for media and cold data, often used together with CDN for efficient delivery.

Configuring the CDN with HTTPS certificates, appropriate origin timeout settings, and intelligent compression further improves user experience and fault tolerance.
