Mastering High Availability: 11 Essential Design Techniques for Scalable Systems
This article covers practical techniques for designing highly available, resilient architectures for large‑scale internet applications: system splitting, decoupling, asynchronous processing, retries, compensation, backup, multi‑active deployment, isolation, rate limiting, circuit breaking, and degradation.
Hello, I am Sanyou.
Large‑scale internet architecture design relies on a four‑piece combination: high concurrency, high performance, high availability, and high scalability.
If you master these four aspects, tackling big‑company interviews and everyday architectural design becomes straightforward.
Today we focus on the design tricks for high availability.
1. System Splitting
When a monolithic system grows, a single mistake can cascade into a disaster. Traditional monoliths (e.g., e‑commerce platforms where membership, product, order, logistics, marketing are all in one) suffer from whole‑system failures during traffic spikes.
Therefore, system splitting becomes a common solution, leading to microservice architectures that separate business domains according to DDD principles, isolate boundaries, and reduce risk propagation.
2. Decoupling
The principle of “high cohesion, low coupling” applies from interface abstraction and MVC layers to SOLID principles and the 23 design patterns. Reducing coupling prevents a change in one module from affecting the whole system.
For example, the Open‑Closed Principle keeps code open for extension but closed for modification. Spring's AOP (Aspect‑Oriented Programming) applies it with dynamic proxies that intercept method calls, allowing extra logic to run before or after execution without touching the original method.
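To make the proxy idea concrete, here is a minimal sketch using JDK dynamic proxies, the same mechanism Spring AOP uses for interface-based beans. The `OrderService` interface and the logging "advice" are hypothetical, invented only for illustration; this is not Spring's actual API.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;

// Hypothetical business interface used only for this illustration.
interface OrderService {
    String create(String item);
}

public class ProxyDemo {
    public static void main(String[] args) {
        // The real implementation being wrapped.
        OrderService target = item -> "order:" + item;

        // Wrap the target so extra logic runs around every call,
        // without modifying the target itself.
        OrderService proxy = (OrderService) Proxy.newProxyInstance(
                OrderService.class.getClassLoader(),
                new Class<?>[] { OrderService.class },
                (Object p, Method m, Object[] a) -> {
                    System.out.println("before " + m.getName());
                    Object result = m.invoke(target, a);
                    System.out.println("after " + m.getName());
                    return result;
                });

        // The caller sees a plain OrderService; interception is invisible.
        System.out.println(proxy.create("book"));
    }
}
```

The caller depends only on the interface, so cross-cutting concerns (logging, metrics, transactions) can be added or removed without changing business code.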
Event mechanisms (publish/subscribe) also enable non‑intrusive extensions: new features subscribe to events without modifying existing code.
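The publish/subscribe idea can be sketched with a tiny in-process event bus. All names here are illustrative; a real system would use Spring application events or a message broker, but the decoupling principle is the same: publishers never change when new subscribers appear.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Minimal in-process publish/subscribe bus (illustrative, not a real framework API).
class EventBus {
    private final List<Consumer<String>> subscribers = new ArrayList<>();

    void subscribe(Consumer<String> handler) {
        subscribers.add(handler);
    }

    void publish(String event) {
        // Existing code only publishes; new features attach as subscribers
        // without any modification to this method.
        for (Consumer<String> handler : subscribers) {
            handler.accept(event);
        }
    }
}

public class EventDemo {
    public static void main(String[] args) {
        EventBus bus = new EventBus();
        bus.subscribe(e -> System.out.println("send email for " + e));
        // Added later, with zero changes to the publishing code:
        bus.subscribe(e -> System.out.println("award points for " + e));
        bus.publish("ORDER_PAID");
    }
}
```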
3. Asynchronous Processing
Synchronous calls block the thread until a response arrives, reducing efficiency. Asynchronous processing (e.g., thread pools, message queues) lets the thread continue with other work while the response is pending.
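A minimal sketch of handing a slow call to a thread pool with `CompletableFuture`; the 100 ms sleep stands in for a hypothetical remote call, and the caller keeps working while the result is in flight.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class AsyncDemo {
    public static void main(String[] args) {
        ExecutorService pool = Executors.newFixedThreadPool(2);

        // Submit the slow call; the current thread is free immediately.
        CompletableFuture<String> reply = CompletableFuture.supplyAsync(() -> {
            try {
                Thread.sleep(100); // stand-in for a slow remote call
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            return "remote result";
        }, pool);

        System.out.println("doing other work while the call is in flight");

        // Block only at the point where the result is actually needed.
        System.out.println(reply.join());
        pool.shutdown();
    }
}
```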
4. Retry
Network jitter or thread blockage can cause RPC timeouts. Retrying requests improves user experience, but blind retries may cause issues (e.g., duplicate bank transfers). Retries should be combined with idempotency checks.
Common idempotency measures include:
- Query before insert to avoid duplicates
- Add unique indexes
- Use a "dead‑letter" table
- Introduce state machines (e.g., order status "paid" with conditional updates)
- Apply distributed locks
- Use token mechanisms to ensure a request is processed only once
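Putting retry and idempotency together, here is a minimal sketch in which each request carries a unique id, and an in-memory map stands in for a unique index or deduplication table. All names are illustrative; in production the dedup store would be a database or Redis.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Sketch of retry plus idempotency: the request carries a unique id, and a
// processed-id table (here an in-memory map standing in for a unique index)
// makes replayed requests safe.
class TransferService {
    private final Map<String, String> processed = new ConcurrentHashMap<>();

    // If this request id was already handled, return the stored result
    // instead of executing the transfer again.
    String transfer(String requestId, String payload) {
        String previous = processed.putIfAbsent(requestId, "done:" + payload);
        return previous != null ? previous : processed.get(requestId);
    }
}

public class RetryDemo {
    public static void main(String[] args) {
        TransferService service = new TransferService();
        String result = null;
        // Naive fixed-count retry loop; a real client would also back off
        // between attempts instead of retrying immediately.
        for (int attempt = 0; attempt < 3; attempt++) {
            result = service.transfer("req-42", "100 yuan to Bob");
        }
        System.out.println(result); // same result no matter how many retries arrive
    }
}
```

Because the side effect is keyed by the request id, the retry loop is free to fire as many times as it likes without duplicating the transfer.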
5. Compensation
When retries are insufficient, compensation techniques achieve eventual consistency. Compensation can be forward (completing partially failed distributed transactions) or reverse (rolling back to the initial state).
Note: Compensation assumes the business can tolerate short‑term data inconsistency.
Implementation examples include local tables with scheduled scans, or simple message‑queue‑driven compensation tasks that leverage MQ retry mechanisms.
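A minimal sketch of the local-table-plus-scheduled-scan approach: failed work is recorded, and a periodic sweep retries it until it succeeds, reaching eventual consistency. An in-memory map stands in for the database task table and a stub simulates the flaky downstream call; all names are hypothetical.

```java
import java.util.Iterator;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Forward compensation via a local task table plus a periodic scan.
class CompensationRunner {
    // request id -> pending payload; stands in for a database task table
    final Map<String, String> pendingTasks = new ConcurrentHashMap<>();
    private int sweeps = 0;

    void recordFailure(String id, String payload) {
        pendingTasks.put(id, payload);
    }

    // Called by a scheduler (e.g., every minute); retries everything pending.
    void sweep() {
        sweeps++;
        Iterator<Map.Entry<String, String>> it = pendingTasks.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, String> task = it.next();
            if (tryExecute(task.getValue())) {
                it.remove(); // success: this task has reached consistency
            }
        }
    }

    // Stubbed downstream call that recovers from the second sweep onward.
    private boolean tryExecute(String payload) {
        return sweeps >= 2;
    }
}

public class CompensationDemo {
    public static void main(String[] args) {
        CompensationRunner runner = new CompensationRunner();
        runner.recordFailure("order-7", "notify warehouse");
        runner.sweep(); // downstream still unavailable, task stays pending
        runner.sweep(); // retried successfully and cleared
        System.out.println("pending after sweeps: " + runner.pendingTasks.size());
    }
}
```

Note the tie back to idempotency: because each sweep may re-execute a task, the downstream operation must itself be safe to repeat.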
6. Backup
Any server may crash, risking data loss, so disaster‑recovery backup is a fundamental capability. For Redis, RDB provides full point‑in‑time snapshots, while AOF replays an incremental append‑only log of writes. Sentinel adds automatic master‑slave failover on top.
Other storage systems (MySQL, Kafka, HBase, Elasticsearch) also provide backup mechanisms to prevent data loss.
7. Multi‑Active Strategy
Beyond backup, multi‑active deployments (e.g., same‑city dual‑active, two‑region three‑center, three‑region five‑center, cross‑region dual‑active) reduce risk from catastrophic events like power outages or natural disasters.
8. Isolation
Physical isolation separates low‑coupling systems into independent deployments, preventing faults from cascading. Each subsystem has its own codebase, development, and release pipeline, communicating via RPC.
9. Rate Limiting
During traffic spikes, unrestricted requests can exhaust CPU, memory, and connections. Rate limiting caps the number of concurrent requests reaching the system, keeping it responsive for the users it does serve while discarding excess traffic.
Rate limiting can be implemented as single‑machine (in‑memory counters) or distributed (cluster‑wide coordination). It supports dimensions such as total system QPS, per‑API limits, per‑IP/user limits, and per‑appkey rules.
Common algorithms include:
- Counter‑based limiting
- Sliding‑window limiting
- Leaky‑bucket limiting
- Token‑bucket limiting
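As one example, here is a minimal single-machine token-bucket limiter: tokens refill at a fixed rate up to a capacity, and a request passes only if it can take a token. Capacity and rate values are illustrative; production systems typically use a library such as Guava's `RateLimiter`.

```java
// Minimal single-machine token-bucket rate limiter (illustrative sketch).
class TokenBucket {
    private final long capacity;
    private final double refillPerNano;
    private double tokens;
    private long lastRefill;

    TokenBucket(long capacity, double tokensPerSecond) {
        this.capacity = capacity;
        this.refillPerNano = tokensPerSecond / 1_000_000_000.0;
        this.tokens = capacity; // start full, allowing an initial burst
        this.lastRefill = System.nanoTime();
    }

    synchronized boolean tryAcquire() {
        long now = System.nanoTime();
        // Refill proportionally to elapsed time, capped at capacity.
        tokens = Math.min(capacity, tokens + (now - lastRefill) * refillPerNano);
        lastRefill = now;
        if (tokens >= 1) {
            tokens -= 1;
            return true; // request admitted
        }
        return false; // over the limit: discard or queue the request
    }
}

public class RateLimitDemo {
    public static void main(String[] args) {
        TokenBucket limiter = new TokenBucket(5, 1.0); // burst of 5, then 1 req/s
        int accepted = 0;
        for (int i = 0; i < 10; i++) {
            if (limiter.tryAcquire()) accepted++;
        }
        System.out.println("accepted " + accepted + " of 10");
    }
}
```

The bucket absorbs short bursts up to its capacity while enforcing the average rate over time, which is why the token bucket is a common default among the four algorithms.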
10. Circuit Breaking
Circuit breakers detect unstable resources (high latency or error rates) and quickly fail subsequent calls, preventing cascading failures. They have three states: Closed (normal), Open (reject requests), and Half‑Open (test recovery).
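A minimal count-based sketch of those three states; thresholds and names are illustrative, and production systems would normally use a library such as Resilience4j rather than hand-rolling this.

```java
// Count-based circuit breaker with the three states described above.
class CircuitBreaker {
    enum State { CLOSED, OPEN, HALF_OPEN }

    private State state = State.CLOSED;
    private int failures = 0;
    private final int failureThreshold;
    private final long openMillis;
    private long openedAt;

    CircuitBreaker(int failureThreshold, long openMillis) {
        this.failureThreshold = failureThreshold;
        this.openMillis = openMillis;
    }

    synchronized boolean allowRequest() {
        if (state == State.OPEN) {
            if (System.currentTimeMillis() - openedAt >= openMillis) {
                state = State.HALF_OPEN; // let one probe through to test recovery
                return true;
            }
            return false; // fail fast instead of waiting on an unstable resource
        }
        return true;
    }

    synchronized void onSuccess() {
        failures = 0;
        state = State.CLOSED; // recovery confirmed
    }

    synchronized void onFailure() {
        failures++;
        if (state == State.HALF_OPEN || failures >= failureThreshold) {
            state = State.OPEN; // trip the breaker
            openedAt = System.currentTimeMillis();
        }
    }

    synchronized State state() { return state; }
}

public class BreakerDemo {
    public static void main(String[] args) {
        CircuitBreaker breaker = new CircuitBreaker(3, 1000);
        for (int i = 0; i < 3; i++) breaker.onFailure(); // errors pile up
        System.out.println(breaker.allowRequest()); // false: open, callers fail fast
    }
}
```

Failing fast while open is what stops a slow dependency from tying up threads upstream and cascading the failure.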
11. Degradation
Degradation temporarily disables non‑core features (e.g., product reviews, transaction logs) during peak load, preserving critical functions like order creation and payment.
Different businesses adopt varied degradation strategies, requiring collaboration with product owners to define acceptable trade‑offs.
In summary, degradation protects core system availability by shutting down optional services when resources are constrained.
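The degradation switch can be sketched as a wrapper that serves a cheap fallback when the flag is flipped. In practice the flag would live in a config center so operators can flip it at runtime; all names here are illustrative.

```java
import java.util.function.Supplier;

// Degradation sketch: a runtime switch turns a non-core feature off
// and serves a cheap fallback instead of the full implementation.
class DegradableFeature<T> {
    private volatile boolean degraded = false; // in practice, a config-center flag
    private final Supplier<T> normal;
    private final Supplier<T> fallback;

    DegradableFeature(Supplier<T> normal, Supplier<T> fallback) {
        this.normal = normal;
        this.fallback = fallback;
    }

    void setDegraded(boolean on) {
        degraded = on;
    }

    T get() {
        return degraded ? fallback.get() : normal.get();
    }
}

public class DegradeDemo {
    public static void main(String[] args) {
        DegradableFeature<String> reviews = new DegradableFeature<>(
                () -> "full review list from DB",
                () -> "reviews temporarily unavailable");

        System.out.println(reviews.get()); // normal path
        reviews.setDegraded(true);         // flipped during peak load
        System.out.println(reviews.get()); // core flow keeps working without it
    }
}
```

Core paths like order creation keep their resources while optional features return a harmless placeholder.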
Sanyou's Java Diary
Passionate about technology, though not great at solving problems; eager to share, never tire of learning!