
Designing High‑Availability Stateless Services: Load Balancing, Scaling, and Deployment Strategies

This article explains how to achieve high availability for stateless services by employing redundancy, vertical and horizontal scaling, various load‑balancing algorithms (random, round‑robin, weighted, least‑connections, source‑hash), and automatic scaling techniques in cloud‑native environments, while also covering performance monitoring and CDN/OSS usage.

IT Architects Alliance

Failures in software systems are often the result of accumulated load: as user numbers grow, a design that ignores high availability will eventually break down, so high availability must be considered from the start.

Designing a high‑availability system involves selecting solutions, preparing emergency response plans, implementing monitoring, automating recovery, optimizing code performance and error handling, and applying techniques such as service degradation, rate limiting, and circuit breaking.

A stateless service stores no persistent data (aside from caches), so instances can be created or destroyed at any time without risk of data loss. High availability means the service as a whole stays up and recovers quickly when an individual node crashes.

Key considerations include:

Redundant deployment: deploy at least two nodes to avoid single points of failure.

Vertical scaling: increase single‑machine resources (CPU, memory, storage).

Horizontal scaling: add more nodes to handle traffic spikes.

Load‑balancing algorithms for stateless services:

Random algorithm: selects a backend server uniformly at random; the distribution only evens out at high traffic volumes.

Round‑robin algorithm: cycles through backends in order.

Weighted round‑robin: assigns higher weight to more capable servers.

Weighted random: similar to weighted round‑robin but selection is random based on weight.

Weighted least‑connections: routes each request to the server with the fewest active connections relative to its weight; the most adaptive of these methods.

Source‑address hash: hashes the client IP so the same client consistently reaches the same server, useful for session‑preserving scenarios.
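Two of the algorithms above can be sketched in a few lines of Python. This is a minimal illustration, not a production balancer; the backend addresses and weights are made up for the example.

```python
import hashlib
import random

class WeightedRandomBalancer:
    """Pick a backend at random, biased by weight (higher weight = more traffic)."""
    def __init__(self, servers):
        # servers: list of (address, weight) pairs
        self.addresses = [addr for addr, _ in servers]
        self.weights = [w for _, w in servers]

    def pick(self):
        return random.choices(self.addresses, weights=self.weights, k=1)[0]

class SourceHashBalancer:
    """Hash the client IP so the same client always reaches the same server."""
    def __init__(self, addresses):
        self.addresses = addresses

    def pick(self, client_ip):
        digest = hashlib.md5(client_ip.encode()).hexdigest()
        return self.addresses[int(digest, 16) % len(self.addresses)]

backends = [("10.0.0.1:8080", 3), ("10.0.0.2:8080", 1)]
wr = WeightedRandomBalancer(backends)
sh = SourceHashBalancer([addr for addr, _ in backends])

# Source-hash is deterministic: the same client IP maps to the same backend.
assert sh.pick("203.0.113.7") == sh.pick("203.0.113.7")
```

Note that the source‑hash mapping changes whenever the backend list changes, which is why it suits session‑preserving scenarios only when the pool is stable (consistent hashing mitigates this, but is beyond this sketch).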

Algorithm selection guidance:

Prefer round‑robin for simple, uniform server configurations.

Use weighted round‑robin or least‑connections when servers differ in capacity or host multiple applications.

For short‑lived connections (e.g., HTTP), use weighted round‑robin with cookie‑based session affinity in Kubernetes.

For long‑lived connections (e.g., FTP, sockets), choose weighted least‑connections.
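For the long‑lived‑connection case, weighted least‑connections can be sketched as below. This is an illustrative toy, assuming a single‑threaded dispatcher; the FTP‑style addresses are invented for the example.

```python
class WeightedLeastConnBalancer:
    """Choose the backend with the lowest active-connections-to-weight ratio."""
    def __init__(self, servers):
        # servers: {address: weight}; active connection counts start at zero
        self.weights = dict(servers)
        self.active = {addr: 0 for addr in servers}

    def acquire(self):
        # Lowest connections/weight wins: a weight-2 server with 2 open
        # connections ranks the same as a weight-1 server with 1.
        addr = min(self.active, key=lambda a: self.active[a] / self.weights[a])
        self.active[addr] += 1
        return addr

    def release(self, addr):
        # Call when the long-lived connection closes.
        self.active[addr] -= 1

lb = WeightedLeastConnBalancer({"10.0.0.1:21": 2, "10.0.0.2:21": 1})
first = lb.acquire()
second = lb.acquire()
```

Because long‑lived connections hold resources for minutes or hours, tracking live connection counts (rather than just rotating) is what keeps load even here.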

Identifying high‑concurrency applications:

The key metric is QPS (queries per second). Example: assume 100,000 daily page views, with 80% of the traffic arriving within 20% of the day:

(100,000 × 80%) / (86,400 × 20%) ≈ 4.63 QPS at peak

Another example: 50,000 machines each generating one page view per minute yields

((60 × 24) × 50,000) / 86,400 ≈ 833 QPS on average

Generally, a few hundred QPS qualifies as high concurrency; large services may reach thousands of QPS.
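The two calculations above can be expressed as a small helper. The 80/20 split is the assumption stated in the text, exposed here as parameters.

```python
def peak_qps(daily_pv, traffic_share=0.8, time_share=0.2):
    """Estimate peak QPS, assuming `traffic_share` of daily page views
    arrive within `time_share` of the day (the 80/20 rule from the text)."""
    seconds_per_day = 86400
    return (daily_pv * traffic_share) / (seconds_per_day * time_share)

# 100,000 daily PVs concentrated 80/20:
print(round(peak_qps(100_000), 2))        # ≈ 4.63 QPS at peak

# 50,000 machines, one PV per minute each, averaged over the day:
print(round((60 * 24) * 50_000 / 86400))  # 833 QPS
```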

Vertical scaling methods:

Upgrade server hardware (CPU, memory, SSD, network).

Improve hardware performance (SSD, OS tuning).

Architectural adjustments (asynchronous processing, caching, lock‑free structures).

Horizontal auto‑scaling:

When load increases, add new nodes; manual scaling is insufficient for sudden traffic spikes, so automatic scaling is needed. Options include custom schedulers on private clouds, cloud provider elastic scaling services, or Kubernetes auto‑scaler combined with sufficient node capacity.

Note: auto‑scaling targets stateless services; stateful services should remain on a limited number of nodes.
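The scale‑out decision at the heart of any of these options can be reduced to one function. The sketch below loosely follows the proportional formula Kubernetes' Horizontal Pod Autoscaler uses (desired = ceil(current × currentMetric / targetMetric)); the thresholds and node limits are illustrative assumptions, not values from any specific cloud API.

```python
import math

def desired_replicas(current, avg_cpu, target=0.6, min_nodes=2, max_nodes=20):
    """Decide the next replica count for a stateless pool: scale out when
    average CPU utilisation exceeds the target, scale in when it drops well
    below, and always keep at least two nodes for redundancy."""
    if avg_cpu > target:
        # Proportional scale-out, as in the Kubernetes HPA formula.
        desired = math.ceil(current * avg_cpu / target)
    elif avg_cpu < target / 2:
        # Conservative scale-in: shed one node at a time.
        desired = current - 1
    else:
        desired = current
    return max(min_nodes, min(max_nodes, desired))

print(desired_replicas(4, avg_cpu=0.9))  # -> 6
```

Scaling in one node at a time while scaling out proportionally is a common asymmetry: under‑provisioning causes outages, while brief over‑provisioning only costs money.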

CDN and OSS considerations:

Static assets (images, videos, HTML/CSS/JS) should be cached via a CDN to reduce latency and offload origin servers. OSS (object storage) can store unlimited files, serving as a backend for media and cold data, often combined with CDN for efficient delivery.

Tags: cloud-native, high availability, load balancing, auto scaling, horizontal scaling, vertical scaling, stateless services
Written by

IT Architects Alliance

Discussion and exchange on system, internet, large‑scale distributed, high‑availability, and high‑performance architectures, as well as big data, machine learning, AI, and architecture transformation with internet technologies. Includes real‑world large‑scale architecture case studies. Open to architects with ideas who enjoy sharing.
