
Mastering High Availability and Concurrency: Core Principles and Practical Techniques

This article distills essential guiding principles, high‑availability strategies, and high‑concurrency techniques for building resilient, scalable systems, covering stateless design, fault‑handling phases, replication, isolation, rate limiting, caching, async processing, multithreading, and scaling approaches.

21CTO

Guiding Principles

This article is inspired by Zhang Kaitao’s book Core Technologies of Billion‑Scale Traffic Websites and is divided into three parts: guiding principles, high availability, and high concurrency. It largely reflects the author’s own thinking.

High‑Concurrency Principles

Stateless design – keep services free of local state so that requests do not contend for locks on shared state and any instance can serve any request.

Reasonable granularity – controlling service granularity to disperse requests and improve manageability.

Cache, queue, and concurrency techniques – to be used as appropriate for the scenario.

High‑Availability Principles

Every deployment must be rollback‑capable.

External dependencies must be monitored so that they can be degraded gracefully, and each must provide a degradation switch.

Public interfaces must be rate‑limited, with limits that reflect the capacity the service can actually sustain.

Business Design Principles

Security – anti‑scraping, anti‑duplicate submissions, etc.

Idempotent design where appropriate.

Business processes and rules that can be changed dynamically (configured rather than hard‑coded).

Ownership, backup personnel, on‑call rotation.

Comprehensive documentation.

Traceable backend operations.

These principles represent only a fraction of the vast design space; practitioners should accumulate experience over time.
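To make the idempotent‑design principle concrete, here is a minimal sketch in which a client‑supplied key makes a repeated request return the first result instead of repeating the side effect. The PaymentService class, its charge method, and the in‑memory dictionary are hypothetical illustrations, not something the original article prescribes.

```python
class PaymentService:
    """Toy service that deduplicates requests by a client-supplied idempotency key."""

    def __init__(self):
        self._results = {}  # idempotency key -> stored response (a database table in practice)
        self.charges = []   # side effects that were actually performed

    def charge(self, idempotency_key: str, account: str, amount: int) -> dict:
        if idempotency_key in self._results:
            # Replay of an earlier request: return the first result, perform no new charge.
            return self._results[idempotency_key]
        self.charges.append((account, amount))
        response = {"status": "charged", "account": account, "amount": amount}
        self._results[idempotency_key] = response
        return response
```

This sketch is not safe under concurrent requests. A real system would typically rely on something atomic, such as a unique constraint on the key, so that two simultaneous duplicates cannot both pass the check.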

High Availability

High availability means resisting uncertainty to keep the service healthy around the clock. Uncertainties include natural disasters, staff turnover, downstream failures, and hardware faults. Availability is often expressed as “N 9s” (for example, three 9s is 99.9%); each additional 9 costs more, so balancing cost against the availability the business actually requires is crucial. Mean Time Between Failures (MTBF) and Mean Time To Repair (MTTR) matter as well: availability improves as MTBF grows and as MTTR shrinks.
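As a rough illustration of what “N 9s” means in practice, the sketch below converts an availability target into a yearly downtime budget and estimates availability from MTBF and MTTR with the commonly used relation availability = MTBF / (MTBF + MTTR). The function names and the sample figures are assumptions chosen for illustration.

```python
HOURS_PER_YEAR = 365 * 24  # leap years ignored for simplicity


def allowed_downtime_hours(nines: int) -> float:
    """Yearly downtime budget for an availability of N nines (3 -> 99.9%)."""
    unavailability = 10 ** (-nines)
    return HOURS_PER_YEAR * unavailability


def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability from mean time between failures and mean time to repair."""
    return mtbf_hours / (mtbf_hours + mttr_hours)


if __name__ == "__main__":
    for n in range(2, 6):
        print(f"{n} nines: {allowed_downtime_hours(n):.2f} hours of downtime per year")
    # Hypothetical service: runs 999 h between failures and takes 1 h to repair.
    print(f"availability: {availability(999, 1):.4f}")
```

Under this relation, halving MTTR helps about as much as doubling MTBF, which is one reason fast detection and rollback receive so much attention.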

Fault‑handling can be divided into four phases:

Pre‑incident – replication, isolation, quota, pre‑planning, probing.

During incident (detection) – monitoring and alerting.

Mid‑incident (mitigation) – degradation, rollback, emergency plans, and the fail‑XXX series of strategies.

Post‑incident – post‑mortem, reflection, technical improvement.

Pre‑incident Techniques

Replication

Replication is a powerful weapon against uncertainty, used in stateless service clusters, storage systems (e.g., MySQL master‑slave, RAID, distributed NoSQL partitions), and many other high‑availability components.

Isolation

Various forms of isolation (thread, process, cluster, data‑center, read/write, hot/cold) essentially provide resource isolation, protecting each resource from failures in others.
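One small‑scale form of resource isolation is to cap how many concurrent calls a single dependency may occupy, so a slow dependency cannot consume every worker. The sketch below uses a semaphore for this; the Bulkhead class name and the reject‑when‑full policy are illustrative assumptions, not the article's prescribed design.

```python
import threading


class Bulkhead:
    """Caps concurrent calls to one dependency so it cannot exhaust shared workers."""

    def __init__(self, max_concurrent: int):
        self._slots = threading.BoundedSemaphore(max_concurrent)

    def call(self, func, *args, **kwargs):
        # Non-blocking acquire: reject immediately rather than queue behind a slow dependency.
        if not self._slots.acquire(blocking=False):
            raise RuntimeError("bulkhead full")
        try:
            return func(*args, **kwargs)
        finally:
            self._slots.release()  # always free the slot, even if func raised
```

Giving each downstream service its own Bulkhead instance means that saturation of one dependency leaves capacity for the others untouched.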

Quota

Quota limits resource consumption to protect the system; rate limiting is a common quota technique, implemented either cluster‑wide or per‑instance.
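A token bucket is one common way to implement per‑instance rate limiting. The following is a minimal sketch, assuming a single process; the TokenBucket class, its parameters, and the injectable clock are illustrative choices rather than anything specified in the original article.

```python
import threading
import time


class TokenBucket:
    """Per-instance limiter: bursts up to `capacity`, refills at `rate` tokens per second."""

    def __init__(self, rate: float, capacity: float, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock      # injectable so the limiter can be tested without sleeping
        self.tokens = capacity  # start with a full bucket
        self.last = clock()
        self._lock = threading.Lock()

    def allow(self, cost: float = 1.0) -> bool:
        with self._lock:
            now = self.clock()
            # Refill in proportion to elapsed time, never beyond capacity.
            self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
            self.last = now
            if self.tokens >= cost:
                self.tokens -= cost
                return True
            return False  # over quota: the caller should reject or queue the request
```

A cluster‑wide limit would need the counter to live in a store shared by all instances; this sketch covers only the per‑instance case.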

Probing

Probing (stress testing, drills) assesses current availability but does not improve it directly; it includes full‑link load testing and various disaster‑recovery drills.

During Incident

Monitoring and Alerting

Effective monitoring and alerting answer three key questions: why was the fault not detected earlier, why was it not resolved sooner, and what is the impact?

Mid‑incident Techniques

Degradation

Degradation sacrifices non‑critical functionality to keep the overall system alive, typically via circuit breaking or fallback paths, with decisions driven by thresholds or manual intervention.
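Threshold‑driven degradation is often implemented as a circuit breaker. The sketch below is a simplified version under stated assumptions: it counts consecutive failures, and the class name, default thresholds, and half‑open behaviour are illustrative rather than taken from the article.

```python
import time


class CircuitBreaker:
    """Opens after consecutive failures, then serves the fallback until a timeout elapses."""

    def __init__(self, failure_threshold=3, reset_timeout=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.clock = clock
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def call(self, func, fallback):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_timeout:
                return fallback()  # open: skip the dependency entirely
            # Half-open: the timeout has elapsed, so let one trial call through.
        try:
            result = func()
        except Exception:
            self.failures += 1
            if self.opened_at is not None or self.failures >= self.failure_threshold:
                self.opened_at = self.clock()  # open, or re-open after a failed trial
            return fallback()
        self.failures = 0
        self.opened_at = None  # success closes the circuit
        return result
```

The manual intervention mentioned above could be modelled by setting opened_at directly from an operations console, which acts as the degradation switch.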

Rollback

Rollback restores a previous stable state; it requires that changes be designed to be rollback‑able, leveraging database transactions, version control, or deployment tools.

Fail‑XXX Series

fail‑retry – retry with back‑off.

fail‑over – switch to alternative instances or replicas.

fail‑safe – swallow the error and fall back silently when the downstream is only a weak (non‑critical) dependency.

fail‑fast – immediate error reporting.

fail‑back – delayed compensation (e.g., replay via message queue).

Retry policies must balance back‑off intervals and retry counts to avoid overwhelming downstream services.
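The trade‑off between back‑off interval and retry count can be sketched as a small helper. This is one possible shape, assuming exponential back‑off with random jitter; the retry function, its defaults, and the injectable sleep are hypothetical.

```python
import random
import time


def retry(func, max_attempts=4, base_delay=0.1, max_delay=2.0, sleep=time.sleep):
    """Call func(); on failure wait up to base_delay * 2**attempt (capped) and try again."""
    for attempt in range(max_attempts):
        try:
            return func()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # retries exhausted: surface the error to the caller
            delay = min(max_delay, base_delay * 2 ** attempt)
            # Jitter: sleep a random time in [0, delay] so clients do not retry in lockstep.
            sleep(random.uniform(0, delay))
```

Capping both the attempt count and the delay bounds the extra load a failing downstream receives, and the jitter spreads retries from many clients over time.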

Post‑incident

Post‑mortem analysis, reflection, and technical improvements complete the fault‑handling cycle.

High Concurrency

Beyond high availability, systems must sustain large request volumes without sacrificing reliability. The challenge is to maintain service quality under high load.

An everyday analogy is a checkout line: speed up cashiers, add more cashiers, or reduce the number of customers. Similarly, high concurrency can be addressed by:

Increasing processing speed – caching and asynchronous processing.

Adding processing “hands” – multithreading/multiprocessing and scaling.

Reducing incoming traffic – pre‑processing (out of scope for this article).

Increasing Processing Speed

Cache

Caching improves speed by storing frequently accessed data closer to the consumer. Consider cache hit rate, eviction policies (LRU, FIFO, LFU), placement (in‑process, off‑heap, distributed), and challenges such as null‑penetration, cache stampede, hot‑key, consistency, and read‑write amplification.
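Two of these ideas, LRU eviction and protection against null‑penetration, fit in a short in‑process sketch. The LRUCache class and its get_or_load method are illustrative names; caching a None result is one simple way to stop repeated lookups of absent keys from reaching the backing store, and production caches usually give such entries a short expiry.

```python
from collections import OrderedDict

_MISSING = object()  # sentinel: distinguishes "not cached" from "cached as absent (None)"


class LRUCache:
    """In-process cache with least-recently-used eviction and hit/miss counters."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self._data = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get_or_load(self, key, loader):
        value = self._data.get(key, _MISSING)
        if value is not _MISSING:
            self.hits += 1
            self._data.move_to_end(key)  # mark as most recently used
            return value
        self.misses += 1
        value = loader(key)  # may be None; storing it guards against null-penetration
        self._data[key] = value
        if len(self._data) > self.capacity:
            self._data.popitem(last=False)  # evict the least recently used entry
        return value
```

The hit rate can be read as hits / (hits + misses). This sketch does nothing about cache stampede, where many concurrent misses for the same key all invoke the loader at once.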

Asynchronous Processing

Asynchrony can be achieved by:

Converting multiple synchronous calls into parallel asynchronous calls (reducing total latency to the max of individual latencies).

Using async I/O provided by frameworks.

Offloading work to message‑queue middleware for later processing, which improves throughput, smooths traffic peaks, supports eventual consistency, and decouples producers from consumers.
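The first approach, turning sequential calls into parallel asynchronous ones, can be sketched with Python's asyncio. The fetch coroutine and its latencies are stand‑ins for real remote calls and are assumptions made for illustration.

```python
import asyncio
import time


async def fetch(name: str, latency: float) -> str:
    """Stand-in for a remote call; asyncio.sleep simulates network latency."""
    await asyncio.sleep(latency)
    return f"{name}-result"


async def load_page() -> list:
    # The three calls run concurrently, so the total wait is roughly the slowest
    # call (0.3 s) rather than the sum of all three (0.6 s).
    return await asyncio.gather(
        fetch("user", 0.1),
        fetch("orders", 0.2),
        fetch("recommendations", 0.3),
    )


if __name__ == "__main__":
    start = time.perf_counter()
    results = asyncio.run(load_page())
    print(results, f"{time.perf_counter() - start:.2f}s")
```

asyncio.gather returns the results in the order the coroutines were passed, regardless of which one finishes first.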

Typical queue types include buffer queues, task queues, message queues, request queues, data‑bus queues, priority queues, replica queues, and mirror queues.

Adding Processing Hands

Multithreading

Thread (or process) pools are widely used in web servers, gateways, RPC services, and queue consumers. Thread count should be calculated based on average processing time, peak concurrency, blocking rate, acceptable response time, and CPU cores.
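Two widely cited rules of thumb relate these inputs to a pool size. They are given here as a starting point for measurement, not as the author's formula: one sizes the pool from the ratio of blocking time to compute time, the other applies Little's law to peak throughput. All function and parameter names are illustrative.

```python
import math


def threads_for_cpu_utilization(cores: int, wait_ms: float, compute_ms: float,
                                target_utilization: float = 1.0) -> int:
    """Heuristic: cores * utilization * (1 + wait / compute). More blocking means more threads."""
    return max(1, math.ceil(cores * target_utilization * (1 + wait_ms / compute_ms)))


def threads_for_throughput(peak_qps: float, avg_latency_ms: float) -> int:
    """Little's law: requests in flight = arrival rate * time each request spends in the system."""
    return max(1, math.ceil(peak_qps * avg_latency_ms / 1000))


if __name__ == "__main__":
    # Hypothetical service: 8 cores, each request blocks 90 ms on I/O and computes for 10 ms.
    print(threads_for_cpu_utilization(8, wait_ms=90, compute_ms=10))
    # Hypothetical load: 500 requests per second at 40 ms average latency.
    print(threads_for_throughput(500, avg_latency_ms=40))
```

When the two estimates disagree, the smaller one usually indicates the bottleneck: either the CPUs cannot keep more threads busy, or the traffic does not need that many.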

Scaling

Scaling can be vertical (scale‑up) or horizontal (scale‑out). Stateless horizontal scaling adds machines; stateful scaling involves sharding or replication, requiring consistency algorithms such as Paxos or Raft.
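For the sharding case, the simplest routing rule is a stable hash of the key modulo the number of shards. The sketch below assumes string keys and uses SHA‑256 only as a convenient stable hash; shard_for is a hypothetical helper name.

```python
import hashlib


def shard_for(key: str, shard_count: int) -> int:
    """Map a key to a shard with a stable hash (Python's built-in hash() of str is salted per process)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


if __name__ == "__main__":
    keys = [f"user:{i}" for i in range(1000)]
    moved = sum(shard_for(k, 4) != shard_for(k, 5) for k in keys)
    print(f"{moved} of {len(keys)} keys change shard when going from 4 to 5 shards")
```

With plain modulo routing, most keys move whenever the shard count changes. That cost is why stateful systems often use consistent hashing or fixed virtual slots instead, a technique this sketch does not implement.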

Source: http://kriszhang.com/high_performance/