
Mastering High Availability and High Concurrency: Principles and Practical Techniques

This article outlines guiding principles, high‑availability strategies, and high‑concurrency techniques—covering stateless design, resource isolation, quota management, monitoring, degradation, rollback, and scaling—to help engineers build resilient, scalable systems while balancing cost and performance.

Efficient Ops
The structure of this article follows Zhang Kaitao's book *Core Technologies of Billion‑Scale Websites* and is divided into four parts: guiding principles, high availability, high concurrency, and case studies. This post covers the first three parts, and most of the content is based on the author's own reflections.

Guiding Principles

High Availability

High Concurrency

Guiding Principles

High‑concurrency principles:

Stateless design – avoiding state eliminates lock contention and serialization.

Reasonable granularity – keep service granularity under control: fine enough to spread requests across services, coarse enough to stay manageable.

Caching, queuing, and other concurrency techniques are useful, but each should be applied only where the scenario calls for it.

High‑availability principles:

Every release must be rollback‑capable.

Every external dependency should be assessed for whether it can be degraded, and should come with a switch that lets it be degraded safely.

Public interfaces must be properly rate‑limited with reliable limits.
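Rate limiting can be implemented in several ways; the token bucket below is one common algorithm, shown only as a minimal single-process sketch. `TokenBucket`, `capacity`, and `rate` are hypothetical names, and the injectable `clock` exists only so the behaviour can be demonstrated without real waiting.

```python
import threading
import time

class TokenBucket:
    """Minimal token-bucket rate limiter (illustrative sketch).

    capacity: maximum burst size; rate: tokens refilled per second.
    """

    def __init__(self, capacity: float, rate: float, clock=time.monotonic):
        self.capacity = capacity
        self.rate = rate
        self.clock = clock
        self.tokens = capacity            # start full so an initial burst is allowed
        self.last = clock()
        self.lock = threading.Lock()      # one bucket may be shared by many threads

    def try_acquire(self, n: float = 1.0) -> bool:
        with self.lock:
            now = self.clock()
            # Refill in proportion to elapsed time, never beyond capacity.
            self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
            self.last = now
            if self.tokens >= n:
                self.tokens -= n
                return True
            return False                  # caller should reject, e.g. with HTTP 429

fake_now = [0.0]                          # fake clock for the demo
bucket = TokenBucket(capacity=2, rate=1, clock=lambda: fake_now[0])
print([bucket.try_acquire() for _ in range(3)])   # [True, True, False]
fake_now[0] = 1.0                         # one second later: one token refilled
print(bucket.try_acquire())               # True
```

A token bucket allows short bursts up to `capacity` while capping the sustained rate at `rate`; a service running on many instances would usually keep the counter in a shared store rather than in process memory.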

Business‑design principles:

Security – anti‑scraping, anti‑duplicate submissions, etc.

Idempotent design where appropriate.

Dynamic business flows and rules.

Clear ownership – every system has an owner, a designated backup, and an on‑call rotation.

Comprehensive documentation.

Traceable back‑office operations – administrative actions should be auditable after the fact.

High Availability

High availability aims to resist uncertainty and keep a system healthy 24/7. Uncertainty can come from data‑center outages, engineer turnover, downstream failures, or hardware faults. Different disaster levels correspond to different availability (N‑nines) and cost trade‑offs.
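To make the N‑nines trade-off concrete, the arithmetic below converts an availability target into a yearly downtime budget. It is a simple sketch that assumes a 365-day year.

```python
MINUTES_PER_YEAR = 365 * 24 * 60   # 525,600 minutes; leap years ignored

def downtime_minutes_per_year(nines: int) -> float:
    """Yearly downtime budget for N nines of availability (nines=3 means 99.9%)."""
    unavailability = 10 ** (-nines)   # e.g. 0.001 for 99.9%
    return MINUTES_PER_YEAR * unavailability

for n in range(2, 6):
    print(f"{n} nines: {downtime_minutes_per_year(n):9.2f} minutes/year")
```

Three nines allows roughly 8.8 hours of downtime a year, while five nines allows only about 5 minutes, which is why each additional nine tends to cost considerably more.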

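To handle uncertainty, we divide an incident's lifecycle into four stages, each with its own techniques: pre‑incident (before anything goes wrong), incident (the moment a fault occurs and is detected), in‑progress (while it is being mitigated), and post‑incident (afterwards).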

Pre‑incident

Replication technology – redundant copies, stateless service clusters, reverse proxies, RAID, multi‑region DB replicas, etc.

Isolation technology – resource isolation (threads, processes, clusters, data‑centers, read/write, hot/cold, etc.) to prevent a failure in one resource from affecting others.

Quota technology – limiting resource supply, e.g., rate limiting, to protect the system.

Discovery technology – probing system health (load testing, chaos engineering).

Contingency plans – preventive measures and emergency runbooks prepared in advance.
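As one illustration of thread-level resource isolation, the sketch below caps how many concurrent calls a single dependency may occupy (the bulkhead pattern). `Bulkhead` and the `recommendations` dependency are hypothetical. When the compartment is full, the call is rejected immediately instead of queuing, so one slow dependency cannot tie up every worker.

```python
import threading

class Bulkhead:
    """Caps concurrent calls to one dependency so it cannot exhaust a shared pool."""

    def __init__(self, max_concurrent: int):
        self._slots = threading.BoundedSemaphore(max_concurrent)

    def call(self, fn, *args, fallback=None):
        # Non-blocking acquire: if the compartment is full, fail fast with the fallback.
        if not self._slots.acquire(blocking=False):
            return fallback
        try:
            return fn(*args)
        finally:
            self._slots.release()

recommendations = Bulkhead(max_concurrent=1)   # hypothetical weak dependency

def render_page():
    # While the outer call holds the only slot, this nested call is rejected at once,
    # simulating a saturated compartment.
    return recommendations.call(lambda: "personalized", fallback="default list")

print(recommendations.call(render_page))       # default list
```

In practice each dependency (or class of request) would get its own bulkhead, so exhaustion in one compartment leaves the others untouched.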

Incident

Monitoring and alerting exist to detect faults quickly. For each incident they should help answer three questions: how was the fault discovered, how quickly can it be resolved, and what is its impact?
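Alerting rules are usually configured in a monitoring system rather than hand-written, but the logic of a typical threshold rule can be sketched as follows. `ErrorRateMonitor`, the window size, the threshold, and the minimum sample count are all illustrative assumptions.

```python
from collections import deque

class ErrorRateMonitor:
    """Tracks the outcome of the last `window` requests and flags when the error
    rate exceeds `threshold`."""

    def __init__(self, window: int = 100, threshold: float = 0.05, min_samples: int = 20):
        self.outcomes = deque(maxlen=window)   # True = error; oldest entries drop off
        self.threshold = threshold
        self.min_samples = min_samples

    def record(self, is_error: bool) -> None:
        self.outcomes.append(is_error)

    def error_rate(self) -> float:
        return sum(self.outcomes) / len(self.outcomes) if self.outcomes else 0.0

    def should_alert(self) -> bool:
        # Require a minimum sample size so a single early failure does not page anyone.
        return len(self.outcomes) >= self.min_samples and self.error_rate() > self.threshold

monitor = ErrorRateMonitor(window=100, threshold=0.05)
for i in range(100):
    monitor.record(i % 10 == 0)        # simulate a 10% error rate
print(monitor.error_rate(), monitor.should_alert())   # 0.1 True
```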

In‑progress

Key actions include degradation, rollback, and fail‑XXX strategies.

Degradation

Degradation sacrifices non‑essential functionality to keep the overall system alive. It can be implemented via circuit breaking (temporary bypass) or fallback paths. Decisions on when to degrade and when to recover are made either manually or automatically based on thresholds such as timeout, error count, or traffic volume.
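The sketch below shows automatic degradation with a minimal circuit breaker, assuming a consecutive-error-count threshold; the timeout and traffic-volume triggers mentioned above are not modelled. `CircuitBreaker` and its parameters are hypothetical, and the class is not thread-safe.

```python
import time

class CircuitBreaker:
    """Opens after `failure_threshold` consecutive failures, serves the fallback while
    open, and lets a trial call through once `reset_timeout` seconds have passed."""

    def __init__(self, failure_threshold=3, reset_timeout=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.clock = clock
        self.failures = 0
        self.opened_at = None            # None means the circuit is closed

    def call(self, fn, fallback):
        if self.opened_at is not None and self.clock() - self.opened_at < self.reset_timeout:
            return fallback()            # open: degrade without touching the dependency
        try:
            result = fn()                # closed, or half-open trial call
        except Exception:
            self.failures += 1
            if self.opened_at is not None or self.failures >= self.failure_threshold:
                self.opened_at = self.clock()   # trip, or re-trip after a failed trial
            return fallback()
        self.failures = 0
        self.opened_at = None            # success closes the circuit (automatic recovery)
        return result

now = [0.0]                              # fake clock for the demo
breaker = CircuitBreaker(failure_threshold=2, reset_timeout=10, clock=lambda: now[0])
attempts = []

def flaky_dependency():
    attempts.append(now[0])
    raise TimeoutError("downstream timed out")

fallback = lambda: "cached default"
print([breaker.call(flaky_dependency, fallback) for _ in range(4)])  # four fallbacks
print(len(attempts))                     # 2: the open breaker stopped calling the dependency
now[0] = 10.0
print(breaker.call(lambda: "fresh", fallback))   # fresh: trial succeeded, circuit closed
```

The last call illustrates automatic recovery: after `reset_timeout`, one trial request is allowed through, and a success closes the circuit again.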

Rollback

When a change causes a fault, rolling back to a known good state is the safest remedy. Rollback requires that changes be designed to be reversible, leveraging DB transactions, version control (Git), or deployment tools.
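Database transactions are the simplest form of a reversible change. The sketch below uses Python's built-in `sqlite3` module; the `accounts` table and `transfer` function are hypothetical. A connection used as a context manager commits when the block succeeds and rolls back when it raises, so a half-applied change never becomes visible.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 0)])
conn.commit()

def transfer(conn, src, dst, amount):
    # `with conn` commits on success and rolls back if any statement raises.
    with conn:
        conn.execute("UPDATE accounts SET balance = balance + ? WHERE name = ?", (amount, dst))
        conn.execute("UPDATE accounts SET balance = balance - ? WHERE name = ?", (amount, src))

try:
    transfer(conn, "alice", "bob", 150)   # the debit violates the CHECK constraint
except sqlite3.IntegrityError:
    pass

print(conn.execute("SELECT name, balance FROM accounts ORDER BY name").fetchall())
# [('alice', 100), ('bob', 0)]: the credit to bob was rolled back as well
```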

Fail‑XXX series

fail‑retry – retry with back‑off.

fail‑over – switch to another instance or replica.

fail‑safe – silent handling for weak dependencies.

fail‑fast – immediate error reporting for rapid human intervention.

fail‑back – delayed compensation (e.g., replay via message queue).
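Several of these strategies can be combined in one call path. The following sketch retries each replica with exponential backoff and jitter (fail‑retry), moves on to the next replica (fail‑over), and finally returns a default value (fail‑safe); fail‑back via message replay is not shown. `call_with_failover`, the replica functions, and all defaults are illustrative assumptions.

```python
import random
import time

def call_with_failover(replicas, request, attempts_per_replica=2, base_delay=0.05,
                       default=None, sleep=time.sleep):
    """fail-retry with backoff on each replica, fail-over to the next one,
    and fail-safe to `default` when every replica has failed."""
    for replica in replicas:
        for attempt in range(attempts_per_replica):
            try:
                return replica(request)
            except ConnectionError:
                if attempt + 1 < attempts_per_replica:
                    # Full jitter: wait a random time in [0, base_delay * 2**attempt].
                    sleep(random.uniform(0, base_delay * 2 ** attempt))
    return default   # fail-safe; a strong dependency would raise here instead (fail-fast)

def down(request):
    raise ConnectionError("primary unreachable")

def healthy(request):
    return f"ok:{request}"

no_wait = lambda seconds: None   # skip real sleeping in this demo
print(call_with_failover([down, healthy], "order-42", sleep=no_wait))   # ok:order-42
print(call_with_failover([down, down], "order-42", default="fallback", sleep=no_wait))   # fallback
```

The jitter spreads retries from many clients over time so that a recovering replica is not hit by a synchronized wave of requests.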

Post‑incident

Post‑mortem review, reflection, and technical improvement.

High Concurrency

Achieving high concurrency while maintaining availability involves improving processing speed, adding processing capacity, and reducing incoming traffic.

[Figure: queue illustration]

In a checkout line analogy, the cashier is the service and each customer is a request. To serve more customers within acceptable wait times we can:

Increase processing speed – cache and async processing.

Add processing manpower – multithreading (or multi‑process) and scaling.

Reduce traffic – pre‑filtering or pre‑warming (not covered here).

Improve Processing Speed

Cache

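Caching boosts speed by keeping frequently accessed data closer to the consumer. Design considerations include the hit rate; eviction policies (space‑based or time‑based); replacement algorithms (LRU, FIFO, LFU); multi‑level caches (in‑process memory, off‑heap memory, disk, distributed caches); and common pitfalls such as cache penetration (repeated lookups of keys that do not exist), cache stampede (many requests rebuilding the same expired entry at once), consistency between cache and source, and read/write amplification.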
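The following sketch combines three of these ideas: LRU replacement, time-based expiry, and caching "not found" results for a shorter time to blunt cache penetration. `TTLLRUCache` and its parameters are hypothetical; it is single-threaded and does not address stampedes or consistency.

```python
import time
from collections import OrderedDict

_MISSING = object()   # sentinel cached for keys that do not exist in the source

class TTLLRUCache:
    def __init__(self, max_entries, ttl, negative_ttl, loader, clock=time.monotonic):
        self.data = OrderedDict()   # key -> (value, expires_at), least recently used first
        self.max_entries, self.ttl, self.negative_ttl = max_entries, ttl, negative_ttl
        self.loader = loader        # returns None when the key does not exist
        self.clock = clock

    def get(self, key):
        now = self.clock()
        entry = self.data.get(key)
        if entry is not None and entry[1] > now:
            self.data.move_to_end(key)            # hit: mark as most recently used
            return None if entry[0] is _MISSING else entry[0]
        value = self.loader(key)                  # miss or expired: go to the source
        ttl = self.ttl if value is not None else self.negative_ttl
        self.data[key] = (_MISSING if value is None else value, now + ttl)
        self.data.move_to_end(key)
        if len(self.data) > self.max_entries:
            self.data.popitem(last=False)         # evict the least recently used entry
        return value

db = {"sku-1": "Widget"}   # hypothetical backing store
loads = []

def load_from_db(key):
    loads.append(key)
    return db.get(key)

cache = TTLLRUCache(max_entries=1000, ttl=60, negative_ttl=5, loader=load_from_db)
for key in ("sku-1", "sku-1", "missing", "missing"):
    cache.get(key)
print(loads)   # ['sku-1', 'missing']: repeat lookups, including the miss, hit the cache
```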

Async

Asynchrony can be achieved by:

Converting sequential calls into parallel asynchronous calls, so total latency is bounded by the slowest call rather than the sum of all calls.

Using async I/O at the OS level.

Offloading work to queues for later processing – e.g., buffer queues, task queues, message queues, request queues, data‑bus queues, priority queues, replica queues, and mirror queues.
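The first technique can be illustrated with Python's `asyncio`. In this sketch `fetch` is a hypothetical stand-in for a downstream call, with `asyncio.sleep` simulating its latency.

```python
import asyncio
import time

async def fetch(name: str, delay: float) -> str:
    await asyncio.sleep(delay)   # simulated I/O latency of a downstream call
    return name

async def serial():
    # Each call waits for the previous one: latency ≈ sum of all delays.
    return [await fetch("user", 0.1), await fetch("orders", 0.2), await fetch("reco", 0.3)]

async def parallel():
    # All three calls are in flight at once: latency ≈ the slowest call.
    return await asyncio.gather(fetch("user", 0.1), fetch("orders", 0.2), fetch("reco", 0.3))

for fn in (serial, parallel):
    start = time.perf_counter()
    result = asyncio.run(fn())
    print(fn.__name__, result, f"{time.perf_counter() - start:.2f}s")
```

On a typical run the serial version takes about 0.6 s and the parallel version about 0.3 s, the latency of the slowest call.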

Queue consumers can either pull messages (the consumer controls its own pace) or have messages pushed to them (lower latency, closer to real time). Either way, sensible back‑off and retry strategies are essential.
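A pull-based consumer with a simple retry budget might look like the sketch below, built on the standard `queue` and `threading` modules. `consume`, `MAX_ATTEMPTS`, and the dead-letter list are illustrative assumptions, and back-off between retries is omitted for brevity.

```python
import queue
import threading

MAX_ATTEMPTS = 3   # illustrative retry budget per message

def consume(tasks: queue.Queue, handler, dead_letters: list) -> None:
    """Pull-based consumer: it takes the next message only when it is ready, so it
    sets its own pace. Failed messages are re-queued until MAX_ATTEMPTS is reached."""
    while True:
        item = tasks.get()
        if item is None:                       # sentinel: stop consuming
            tasks.task_done()
            return
        payload, attempts = item
        try:
            handler(payload)
        except Exception:
            if attempts + 1 < MAX_ATTEMPTS:
                tasks.put((payload, attempts + 1))   # retry later (no back-off here)
            else:
                dead_letters.append(payload)         # park for manual inspection
        finally:
            tasks.task_done()

handled, dead = [], []

def handler(message):
    if message == "poison":
        raise ValueError("cannot process")
    handled.append(message)

tasks = queue.Queue()
worker = threading.Thread(target=consume, args=(tasks, handler, dead))
worker.start()
for message in ("a", "poison", "b"):
    tasks.put((message, 0))
tasks.join()       # wait until every message, including retries, is settled
tasks.put(None)
worker.join()
print(handled, dead)   # ['a', 'b'] ['poison']
```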

Increase Processing Manpower

Multithreading

Thread (or process) pools are common in web servers, gateways, RPC services, and message consumers. Thread count should be tuned based on average request latency, peak concurrency, blocking rate, acceptable response time, and CPU cores.
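One widely cited sizing heuristic (popularized by Brian Goetz's *Java Concurrency in Practice*) estimates the pool size from the core count and the ratio of time a task spends waiting versus computing; Little's law relates throughput, latency, and the number of requests in flight. The helpers below are hypothetical sketches of both formulas, and their results are starting points rather than final values.

```python
import math
import os

def pool_size(cores: int, wait_ms: float, compute_ms: float, target_util: float = 1.0) -> int:
    """Heuristic: threads ≈ cores × target CPU utilization × (1 + wait / compute).
    I/O-heavy tasks (large wait/compute ratio) justify far more threads than cores."""
    return max(1, math.ceil(cores * target_util * (1 + wait_ms / compute_ms)))

def littles_law_concurrency(requests_per_sec: float, latency_sec: float) -> float:
    """Little's law: requests in flight = arrival rate × time each request spends in the system."""
    return requests_per_sec * latency_sec

cores = os.cpu_count() or 1
print("suggested pool size:", pool_size(cores, wait_ms=90, compute_ms=10))
print("requests in flight:", littles_law_concurrency(2000, 0.05))
```

For example, 8 cores with 90 ms of waiting and 10 ms of computation per task gives 80 threads, and 2,000 requests per second at 50 ms each means about 100 requests in flight. The pool size would then be passed as `max_workers` to `concurrent.futures.ThreadPoolExecutor`.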

Scaling

Stateless horizontal scaling is the most effective way to absorb traffic spikes. Scaling can be vertical (scale‑up: a bigger machine) or horizontal (scale‑out: more machines), and horizontal scaling may first require decomposing a service. From a data perspective, there is stateless scaling (adding application instances) and stateful scaling (sharding or replicating data). Sharding spreads data across nodes, but without replicas each shard becomes a single point of failure; replication addresses this, and keeping replicas strongly consistent typically relies on consensus protocols such as Paxos or Raft.
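For stateful scaling, one common way to route keys to shards is consistent hashing, sketched below with virtual nodes. The article does not prescribe a sharding scheme; `ConsistentHashRing`, the shard names, and `vnodes=100` are illustrative assumptions.

```python
import bisect
import hashlib

class ConsistentHashRing:
    """Maps keys to shards so that adding a shard only moves the keys it takes over."""

    def __init__(self, shards, vnodes=100):
        self.ring = []   # sorted list of (point, shard)
        for shard in shards:
            self.add(shard, vnodes)

    @staticmethod
    def _hash(s: str) -> int:
        return int.from_bytes(hashlib.md5(s.encode()).digest()[:8], "big")

    def add(self, shard: str, vnodes: int = 100) -> None:
        # Virtual nodes spread each shard over many ring points for a more even load.
        for i in range(vnodes):
            bisect.insort(self.ring, (self._hash(f"{shard}#{i}"), shard))

    def shard_for(self, key: str) -> str:
        point = self._hash(key)
        idx = bisect.bisect(self.ring, (point, ""))   # first ring point >= the key's hash
        return self.ring[idx % len(self.ring)][1]     # wrap around past the last point

ring = ConsistentHashRing(["db-0", "db-1", "db-2"])
keys = [f"user:{i}" for i in range(1000)]
before = {k: ring.shard_for(k) for k in keys}
ring.add("db-3")
moved = sum(before[k] != ring.shard_for(k) for k in keys)
print(f"{moved} of {len(keys)} keys moved")
```

Only keys claimed by the new shard move, roughly a quarter of them here, whereas `hash(key) % N` would remap about three quarters of all keys when N grows from 3 to 4.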

Overall, combining sound guiding principles, robust high‑availability tactics, and effective high‑concurrency techniques enables engineers to build systems that stay available and performant under heavy load while controlling costs.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.

Written by Efficient Ops

This public account is run by Xiaotianguo and friends and regularly publishes widely read original technical articles. It focuses on the transformation of operations work and aims to accompany readers throughout their operations careers.
