
Guiding Principles and Practices for High Availability and High Concurrency in Large‑Scale Systems

The article outlines guiding principles, high-availability strategies, and high-concurrency techniques (stateless design, replication and isolation, quota control, monitoring, degradation, rollback, and scaling) to help engineers build resilient, scalable web architectures for massive traffic.

Architecture Digest

Nov 19, 2017

Guiding Principles

The author adapts concepts from Zhang Kaitao's book Core Technologies of Billion-Scale Websites, dividing the discussion into four parts: guiding principles, high availability, high concurrency, and case studies. The principles are the author's own reflections; the book serves as a reference.

High‑Concurrency Principles

1. Stateless design – avoid keeping state in services. This prevents lock contention and lets any instance serve any request.
2. Appropriate granularity – size services so that requests are spread out and each service stays manageable.
3. Caches, queues, and concurrency techniques – apply them where the scenario calls for them.

High‑Availability Principles

1. Every release must be rollback-capable.
2. Every external dependency must have a measurable, graceful degradation path and an on/off switch.
3. Every exposed interface must have accurate rate limiting.
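The degradation-switch principle can be illustrated with a minimal sketch. It assumes a single-threaded caller. The class `DegradeSwitch`, its thresholds, and the fallback function are hypothetical. The automatic trip is a simple consecutive-failure circuit breaker, which is only one possible policy; error-rate or latency thresholds are common alternatives.

```python
# Hypothetical degradation switch for one external dependency (single-threaded
# sketch). Operators can force it off by hand, and it also trips automatically
# (circuit breaking) after `max_failures` consecutive errors, letting one probe
# call through once `cooldown` seconds have passed.
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class DegradeSwitch:
    def __init__(self, max_failures: int = 5, cooldown: float = 30.0, clock=time.monotonic):
        self.manual_off = False          # toggle flipped by an operator
        self.max_failures = max_failures
        self.cooldown = cooldown
        self.clock = clock
        self.failures = 0                # consecutive failures seen so far
        self.opened_at = None            # time the breaker tripped, or None

    def call(self, fn: Callable[[], T], fallback: Callable[[], T]) -> T:
        if self.manual_off:
            return fallback()            # manual degradation
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.cooldown:
                return fallback()        # breaker open: skip the dependency
            self.opened_at = None        # cooldown over: allow a probe call
        try:
            result = fn()
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = self.clock()  # trip (or re-trip after a failed probe)
            return fallback()
        self.failures = 0                # success closes the breaker fully
        return result
```

An operator sets `manual_off = True` to degrade by hand. Otherwise the breaker opens on its own when the dependency keeps failing, and a single successful probe closes it again.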

Business Design Principles

1. Security – anti-scraping, prevention of duplicate submissions, and similar protections.
2. Idempotent consumption wherever possible, so that a redelivered message or retried request has no extra effect.
3. Dynamic (configurable) business processes and rules.
4. Clear ownership, with backup personnel and an on-call rotation.
5. Documented, traceable back-office operations.
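Idempotent consumption usually means deduplicating by a unique message ID. The sketch below assumes the producer attaches such an ID. The `Message` and `IdempotentConsumer` names are hypothetical. The in-memory set stands in for a durable store, such as a unique key in the database.

```python
# Minimal sketch of idempotent message consumption (hypothetical names).
# Assumption: each message carries a unique message_id from the producer;
# a real system keeps processed IDs in durable storage, not process memory.
from dataclasses import dataclass, field


@dataclass
class Message:
    message_id: str
    account: str
    amount: int


@dataclass
class IdempotentConsumer:
    balances: dict = field(default_factory=dict)
    processed_ids: set = field(default_factory=set)

    def handle(self, msg: Message) -> bool:
        """Apply msg once; return False if it was a duplicate delivery."""
        if msg.message_id in self.processed_ids:
            return False  # already applied: redelivery is a no-op
        self.balances[msg.account] = self.balances.get(msg.account, 0) + msg.amount
        # In production, this write and the balance update belong in one
        # transaction so a crash cannot separate them.
        self.processed_ids.add(msg.message_id)
        return True
```

With this guard, a broker that redelivers the same message after a timeout leaves the balance unchanged the second time.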

High Availability

High availability means withstanding uncertainty in order to provide healthy service 24/7. Sources of uncertainty include data-center outages, staff turnover, downstream failures, and hardware faults. Availability is usually expressed in "nines": 99.9% is three nines, and 99.99% is four. Each additional nine costs more, so the required availability must be balanced against cost.
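The downtime budget follows directly from the availability target, which makes the cost of each extra nine concrete. This is standard arithmetic rather than anything specific to the article. It assumes a 365-day year.

```python
# Yearly downtime budget implied by N "nines" of availability.
# Assumes a 365-day year; the article itself gives no specific targets.
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600


def downtime_minutes_per_year(nines: int) -> float:
    unavailability = 10 ** -nines  # 3 nines = 99.9% up = 0.1% down
    return unavailability * MINUTES_PER_YEAR


for n in range(2, 6):
    print(f"{n} nines: {downtime_minutes_per_year(n):.1f} min/year")
```

Three nines leaves about 525.6 minutes (roughly 8.8 hours) of downtime a year. Each additional nine divides that budget by ten.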

Four Phases of Incident Handling

1. Before an incident: replicas, isolation, quotas, contingency plans, and probing.
2. Detection: monitoring and alerting.
3. Response: degradation, rollback, emergency plans, and the fail-XXX strategies.
4. After an incident: retrospectives and improvement.

Pre‑incident Techniques

Replica technology – Redundant copies let the system survive the loss of any single copy. Examples include stateless service clusters, reverse proxies, database replicas, RAID, and multi-region deployments.

Isolation technology – Isolating resources keeps a failure in one resource from cascading into others. Resources can be isolated by thread, process, cluster, data center, read/write path, or hot/cold data.

Quota technology – Rate limiting and quota enforcement protect systems from overload. Choosing appropriate limits requires full-stack load testing and business forecasting.

Probing technology – Load tests and disaster-recovery drills measure current capacity and resilience. They do not improve either directly, but they show where the limits are.
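Quota enforcement is often implemented with a token bucket. The following is a minimal single-process sketch. The `TokenBucket` class and its parameters are illustrative assumptions. A limit shared across a cluster would need a central counter, which this sketch omits. The injectable `clock` makes it testable.

```python
# Minimal token-bucket rate limiter (single process, not thread-safe).
# Rate and capacity are illustrative; a cluster-wide quota would need a
# shared store, which this sketch does not include.
import time


class TokenBucket:
    def __init__(self, rate_per_sec: float, capacity: int, clock=time.monotonic):
        self.rate = rate_per_sec       # tokens added per second (sustained QPS)
        self.capacity = capacity       # maximum burst size
        self.tokens = float(capacity)  # start full
        self.clock = clock
        self.last = clock()

    def try_acquire(self, n: int = 1) -> bool:
        now = self.clock()
        # Refill in proportion to elapsed time, never beyond capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= n:
            self.tokens -= n
            return True
        return False  # over quota: caller should reject or degrade
```

`rate_per_sec` sets the sustained limit and `capacity` sets the largest burst. A request for which `try_acquire()` returns `False` should be rejected or served by a degraded path.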

Incident Detection

Effective monitoring and alerting must let the team answer three questions. Why was the fault not detected earlier? Why was it not resolved faster? What was its impact?
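Detection often starts with a threshold on the recent error rate. The sketch below is a hypothetical sliding-window alarm. The window size, threshold, and minimum sample count are placeholders that would need tuning against real traffic.

```python
# Hypothetical sliding-window error-rate alarm: fire when the failure ratio
# over the last `window` requests exceeds `threshold`.
from collections import deque


class ErrorRateAlarm:
    def __init__(self, window: int = 100, threshold: float = 0.05, min_samples: int = 20):
        self.results = deque(maxlen=window)  # True = request failed; oldest drops off
        self.threshold = threshold
        self.min_samples = min_samples       # avoid alerting on tiny samples

    def record(self, failed: bool) -> bool:
        """Record one request outcome; return True if the alarm should fire."""
        self.results.append(failed)
        if len(self.results) < self.min_samples:
            return False
        return sum(self.results) / len(self.results) > self.threshold
```

Requiring `min_samples` before alerting keeps a single early failure from paging anyone.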

Incident Response

Degradation – Sacrifice non-critical functionality, through circuit breaking or fallback paths, to keep the overall system alive. The decision to degrade can be made manually or triggered automatically by thresholds.

Rollback – Revert a change when it causes failures. This works only if changes are designed to be reversible, for example through database transactions, Git version control, and deployment tools that can redeploy a previous release.

Fail-XXX series – Common strategies include:
- fail-retry: retry, with back-off;
- fail-over: route to another instance;
- fail-safe: return a silent fallback instead of an error;
- fail-fast: report an error immediately;
- fail-back: record the failure and compensate later.
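Three of the fail-XXX strategies compose naturally:
- retry a replica with exponential back-off;
- fail over to the next replica when retries run out;
- fail safe with a fallback value at the end.

The helper `call_with_failover` and its parameters are hypothetical. `sleep` is injectable so the back-off can be observed in tests.

```python
# Sketch combining three fail-XXX strategies (hypothetical helper):
#   fail-retry: retry a replica with exponential back-off,
#   fail-over:  move on to the next replica when retries are exhausted,
#   fail-safe:  return a fallback value instead of raising.
import time
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def call_with_failover(
    replicas: Sequence[Callable[[], T]],
    fallback: T,
    retries_per_replica: int = 2,
    base_delay: float = 0.05,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    for call in replicas:                              # fail-over: replicas in order
        for attempt in range(retries_per_replica + 1):
            try:
                return call()
            except Exception:
                if attempt < retries_per_replica:
                    sleep(base_delay * 2 ** attempt)   # fail-retry with back-off
    return fallback                                    # fail-safe: degrade quietly
```

Retries are only safe for idempotent operations. Production code usually adds random jitter to the delay so that many clients do not retry in lockstep. Fail-fast would instead skip the retries and raise the error to the caller immediately.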

Post‑incident

Conduct post‑mortems, reflect on lessons learned, and implement technical improvements.

High Concurrency

Beyond high availability, handling massive request volumes requires improving processing speed, increasing processing capacity, and reducing the number of requests.

Improving Processing Speed

Cache

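Caching reduces latency by storing frequently accessed data at various layers: CPU caches, application caches, and distributed caches. Design considerations include:
- hit rate;
- eviction policies (LRU, FIFO, LFU);
- cache patterns (read-through, write-through);
- multi-level caches;
- pitfalls such as cache penetration (repeated lookups for keys that do not exist), thundering herd (many requests rebuilding the same expired entry at once), and consistency between the cache and its source.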
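A read-through cache with LRU eviction can be sketched with `collections.OrderedDict`. The `ReadThroughLRU` class and its `loader` callback are hypothetical. The loader stands in for the slower source of truth. Caching `None` for missing keys is one simple guard against cache penetration.

```python
# Minimal read-through LRU cache (single-threaded sketch, no expiry).
# The loader stands in for a slower backing store such as a database.
from collections import OrderedDict
from typing import Any, Callable


class ReadThroughLRU:
    def __init__(self, capacity: int, loader: Callable[[Any], Any]):
        self.capacity = capacity
        self.loader = loader
        self.data = OrderedDict()  # least recently used entries first
        self.hits = 0
        self.misses = 0

    def get(self, key):
        if key in self.data:
            self.hits += 1
            self.data.move_to_end(key)      # mark as most recently used
            return self.data[key]
        self.misses += 1
        value = self.loader(key)            # read-through on a miss
        # Caching None too keeps repeated lookups of missing keys off the
        # backing store (a simple guard against cache penetration).
        self.data[key] = value
        if len(self.data) > self.capacity:
            self.data.popitem(last=False)   # evict the least recently used entry
        return value

    def hit_rate(self) -> float:
        total = self.hits + self.misses
        return self.hits / total if total else 0.0
```

In practice, cached misses and all other entries would carry a TTL so that later writes become visible. This sketch has no expiry and no locking, so it does not address thundering herd or consistency.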

Asynchronous Processing

Asynchrony can be achieved in three ways:
- turning sequential synchronous calls into parallel ones (async orchestration);
- using async I/O;
- offloading work to queues.

Queue-based async processing improves throughput, smooths traffic spikes, supports eventual consistency, and decouples services. Common queue types include buffer queues, task queues, message queues, request queues, data-bus queues, and replica/mirror queues.
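A buffer queue in front of a fixed worker pool is the simplest form of queue-based smoothing. This `asyncio` sketch assumes three workers and a bounded queue. `handle_burst` and `worker` are hypothetical names, and `asyncio.sleep` stands in for real downstream work.

```python
# Sketch of queue-based asynchronous processing: a burst of jobs goes into a
# bounded buffer queue and is drained by a fixed number of workers.
import asyncio


async def worker(name: str, queue: asyncio.Queue, results: list) -> None:
    while True:
        job = await queue.get()
        try:
            await asyncio.sleep(0.01)        # stand-in for slow downstream work
            results.append((name, job))
        finally:
            queue.task_done()                # always mark the job as handled


async def handle_burst(jobs, n_workers: int = 3, max_queue: int = 100) -> list:
    queue = asyncio.Queue(maxsize=max_queue)
    results = []
    workers = [asyncio.create_task(worker(f"w{i}", queue, results)) for i in range(n_workers)]
    for job in jobs:
        await queue.put(job)                 # waits only when the buffer is full
    await queue.join()                       # wait until every job is processed
    for w in workers:
        w.cancel()                           # workers loop forever; stop them
    await asyncio.gather(*workers, return_exceptions=True)
    return results
```

Usage: `asyncio.run(handle_burst(range(10)))`. Because the queue is bounded, `queue.put` waits when the buffer is full. This applies back-pressure to producers instead of letting memory grow without limit.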

Increasing Processing Capacity

Multithreading / Multiprocessing

Thread or process pools increase concurrency but must be sized carefully. The size should account for:
- average latency;
- peak concurrency;
- the fraction of time threads spend blocked;
- response-time targets;
- the number of CPU cores.
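Two widely used heuristics make these factors concrete; neither comes from the article. Brian Goetz's formula, from the book Java Concurrency in Practice, relates pool size to three factors:
- core count;
- target CPU utilization;
- the ratio of wait time to compute time.

Little's law gives the number of requests in flight at peak. The function names and inputs below are illustrative.

```python
# Two common pool-sizing heuristics (not from the article):
#  - Goetz: threads = cores * target_utilization * (1 + wait_time / compute_time)
#  - Little's law: requests in flight = arrival rate * average latency
import math


def threads_for_cpu(cores: int, target_util: float, wait_ms: float, compute_ms: float) -> int:
    # How many threads keep the CPUs at the target utilization, given that
    # each thread spends wait_ms blocked for every compute_ms of CPU work.
    return math.ceil(cores * target_util * (1 + wait_ms / compute_ms))


def threads_for_load(peak_qps: float, avg_latency_ms: float) -> int:
    # One request per thread at a time, so the pool must cover every
    # request that is in flight at peak.
    return math.ceil(peak_qps * avg_latency_ms / 1000)
```

Consider 8 cores at 50% target utilization, with 30 ms of waiting per 10 ms of computation: one machine usefully runs about 16 threads. Meanwhile, 2,000 requests/s at 40 ms average latency keeps 80 requests in flight. When the load-driven figure far exceeds the CPU-driven one, adding threads stops helping, and the gap points to horizontal scaling.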

Scaling

Horizontal scaling (adding machines) and vertical scaling (upgrading hardware) expand capacity. Stateless services scale easily; stateful services may require sharding or replication, with consistency mechanisms such as Paxos or Raft.
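Sharding stateful data needs a stable key-to-node mapping. One common choice, not named in the article, is consistent hashing, sketched below. The node names and the virtual-node count are assumptions. Replication and Paxos/Raft consensus are outside this sketch.

```python
# Minimal consistent-hash ring for routing keys to shards.
import bisect
import hashlib


def _hash(value: str) -> int:
    # md5 is used only for an even spread of keys, not for security.
    return int.from_bytes(hashlib.md5(value.encode()).digest()[:8], "big")


class ConsistentHashRing:
    def __init__(self, nodes, vnodes: int = 100):
        self.vnodes = vnodes      # virtual nodes per physical node, for balance
        self.ring = []            # sorted list of (hash, node) pairs
        for node in nodes:
            self.add(node)

    def add(self, node: str) -> None:
        for i in range(self.vnodes):
            bisect.insort(self.ring, (_hash(f"{node}#{i}"), node))

    def remove(self, node: str) -> None:
        self.ring = [(h, n) for h, n in self.ring if n != node]

    def node_for(self, key: str) -> str:
        # First virtual node clockwise from the key's hash, wrapping around.
        idx = bisect.bisect(self.ring, (_hash(key), ""))
        return self.ring[idx % len(self.ring)][1]
```

Adding a fourth node moves only the keys that now land on it, roughly a quarter of them. A plain `hash(key) % n` scheme would remap most keys.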

Source: kriszhang.com/high_performance

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.
