Mastering High Availability: 11 Essential Design Techniques for Scalable Systems
This article outlines the core high‑availability techniques—system splitting, decoupling, asynchronous processing, retry, compensation, backup, multi‑active strategies, isolation, rate limiting, circuit breaking, and degradation—providing practical guidance for designing resilient, scalable backend architectures in large‑scale internet applications.
Large-scale internet architecture is commonly judged on four qualities: high concurrency, high performance, high availability, and high scalability.
Mastering these aspects simplifies interview preparation and everyday architectural design.
Today we explore the design techniques for high availability.
1. System Splitting
In a monolithic system, a single failure can cascade across the entire application. Splitting the system into independent microservices along DDD (domain-driven design) boundaries gives each service one business domain, makes the boundaries explicit, and limits how far a fault can spread.
2. Decoupling
Apply the principle of high cohesion and low coupling: use interface abstraction, MVC layering, the SOLID principles, and design patterns to minimize inter-module dependencies. For example, the Open-Closed Principle lets you extend behavior without modifying existing code.
Spring’s AOP (Aspect‑Oriented Programming) uses dynamic proxies to intercept method calls, allowing additional logic before or after execution.
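The interception idea can be sketched with the JDK's own java.lang.reflect.Proxy, without Spring. This is a minimal illustration only: OrderService, SimpleOrderService, and LoggingProxyDemo are hypothetical names, and Spring AOP adds pointcuts, advice types, and CGLIB proxies on top of this mechanism.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;
import java.util.List;

// Hypothetical business interface; not part of Spring or the original article.
interface OrderService {
    String createOrder(String item);
}

class SimpleOrderService implements OrderService {
    @Override
    public String createOrder(String item) {
        return "order:" + item;
    }
}

class LoggingProxyDemo {
    // Wraps the target in a JDK dynamic proxy that records an entry
    // before and after every method call made through the interface.
    static OrderService withLogging(OrderService target, List<String> log) {
        InvocationHandler handler = (proxy, method, args) -> {
            log.add("before " + method.getName());
            Object result = method.invoke(target, args);
            log.add("after " + method.getName());
            return result;
        };
        return (OrderService) Proxy.newProxyInstance(
                OrderService.class.getClassLoader(),
                new Class<?>[] {OrderService.class},
                handler);
    }
}
```

The caller still sees an OrderService; the logging is added without touching SimpleOrderService, which is the decoupling benefit described above.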
Event‑driven architecture with publish/subscribe lets new features subscribe to events without invasive code changes.
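A minimal in-process sketch of publish/subscribe follows. The EventBus class and the "order.created" topic are assumptions made for illustration; a production system would more likely use a message broker or a framework's event mechanism.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.Consumer;

// Hypothetical in-process event bus keyed by topic name.
class EventBus {
    private final Map<String, List<Consumer<String>>> subscribers = new ConcurrentHashMap<>();

    // Registers a handler; the publisher never learns who is listening.
    void subscribe(String topic, Consumer<String> handler) {
        subscribers.computeIfAbsent(topic, t -> new CopyOnWriteArrayList<>()).add(handler);
    }

    // Delivers the payload to every handler registered for the topic, in subscription order.
    void publish(String topic, String payload) {
        for (Consumer<String> handler : subscribers.getOrDefault(topic, List.of())) {
            handler.accept(payload);
        }
    }
}
```

Adding a new feature, such as awarding loyalty points, means adding one more subscribe call; the code that publishes the event stays unchanged.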
3. Asynchronous Processing
Synchronous calls block threads while waiting for responses, reducing throughput. Asynchronous execution—using thread pools, message queues, etc.—allows the main flow to continue while background tasks handle non‑real‑time actions.
Example: after an order is created, core logic returns success, while notifications, snapshots, and timeout tasks are processed asynchronously via a message queue.
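The order example might look roughly like this when sketched with a thread pool instead of a message queue. AsyncOrderFlow and its task names are hypothetical, and the persistence step is omitted; a message queue would additionally give durability across restarts.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical order flow: the core step is synchronous, follow-up work goes to a pool.
class AsyncOrderFlow {
    final List<String> completedTasks = new CopyOnWriteArrayList<>();
    private final ExecutorService pool = Executors.newFixedThreadPool(2);

    // Returns immediately after the core step; the future only tracks background work.
    CompletableFuture<Void> createOrder(String orderId) {
        // Core logic would persist the order here (omitted in this sketch).
        CompletableFuture<Void> notification =
                CompletableFuture.runAsync(() -> completedTasks.add("notify:" + orderId), pool);
        CompletableFuture<Void> snapshot =
                CompletableFuture.runAsync(() -> completedTasks.add("snapshot:" + orderId), pool);
        return CompletableFuture.allOf(notification, snapshot);
    }

    void shutdown() {
        pool.shutdown();
    }
}
```

A caller that only needs the order confirmation can ignore the returned future; it is exposed here so that the background work can be awaited when needed.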
4. Retry
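Network jitter or blocked threads can cause RPC timeouts. Retrying transient failures raises the success rate, but a timed-out request may already have been applied on the server, so retries must be paired with idempotency to avoid duplicate operations (e.g., a bank transfer executed twice).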
Common idempotency techniques include a select-before-insert check, unique indexes, state machines, distributed locks, and token mechanisms.
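The combination of retry and a token mechanism can be sketched as follows. TransferService, Retry.withBackoff, and the in-memory token set are assumptions for illustration; in practice the token would typically be stored behind a unique index or in a shared cache.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

// Hypothetical transfer service that deduplicates requests by a client-supplied token.
class TransferService {
    private final Set<String> processedTokens = ConcurrentHashMap.newKeySet();
    private long balance = 0;

    // Applies the transfer once per token; repeated calls with the same token are no-ops.
    synchronized boolean transfer(String token, long amount) {
        if (!processedTokens.add(token)) {
            return false; // duplicate request, already applied
        }
        balance += amount;
        return true;
    }

    synchronized long balance() {
        return balance;
    }
}

class Retry {
    // Calls the action up to maxAttempts times, doubling the delay after each failure.
    static <T> T withBackoff(Supplier<T> action, int maxAttempts, long initialDelayMs) {
        long delay = initialDelayMs;
        for (int attempt = 1; ; attempt++) {
            try {
                return action.get();
            } catch (RuntimeException e) {
                if (attempt >= maxAttempts) {
                    throw e; // attempts exhausted, surface the last failure
                }
                try {
                    Thread.sleep(delay);
                } catch (InterruptedException interrupted) {
                    Thread.currentThread().interrupt();
                    throw e;
                }
                delay *= 2;
            }
        }
    }
}
```

If the first call applies the transfer but the response is lost, the retried calls carry the same token and are ignored, so the balance changes only once.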
5. Compensation
When retries are insufficient, compensation mechanisms achieve eventual consistency. Compensation can be forward (driving failed tasks on until they succeed) or backward (rolling already-successful tasks back to their initial state).
Note: Compensation assumes the business can tolerate short‑term data inconsistency.
Implementation options include a local task table scanned by a scheduled job, or message middleware with built-in retry.
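The table-plus-scan approach can be sketched with an in-memory stand-in for the table. CompensationTable, its Status values, and the maxAttempts limit are hypothetical; a real implementation would persist tasks in a database and trigger scan from a scheduler such as ScheduledExecutorService.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical in-memory stand-in for a local "pending task" table.
class CompensationTable {
    enum Status { PENDING, DONE, FAILED }

    static final class Task {
        final String id;
        Status status = Status.PENDING;
        int attempts = 0;

        Task(String id) {
            this.id = id;
        }
    }

    private final List<Task> tasks = new ArrayList<>();
    private final int maxAttempts;

    CompensationTable(int maxAttempts) {
        this.maxAttempts = maxAttempts;
    }

    // Records work that must eventually complete.
    Task record(String id) {
        Task task = new Task(id);
        tasks.add(task);
        return task;
    }

    // One scan pass: re-run every pending task (forward compensation) and mark
    // it FAILED for manual handling once its attempts are exhausted.
    void scan(Predicate<Task> executor) {
        for (Task task : tasks) {
            if (task.status != Status.PENDING) {
                continue;
            }
            task.attempts++;
            if (executor.test(task)) {
                task.status = Status.DONE;
            } else if (task.attempts >= maxAttempts) {
                task.status = Status.FAILED;
            }
        }
    }
}
```

Because a task may be executed several times before it succeeds, the executor itself should be idempotent.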
6. Backup
Disaster recovery is fundamental. For Redis, RDB persists point-in-time snapshots of the full dataset, while AOF logs write commands so they can be replayed on restart. Redis Sentinel adds automatic failover from a failed master to a replica. Similar backup mechanisms exist for MySQL, Kafka, HBase, Elasticsearch, etc.
7. Multi‑Active Strategy
To survive data‑center failures (power loss, fire, earthquakes), multi‑active deployments (same‑city dual‑active, two‑region three‑center, etc.) distribute risk and maintain 24‑hour availability.
8. Isolation
Physical isolation splits loosely coupled systems into independent deployments, preventing fault propagation. Each subsystem has its own codebase and CI/CD pipeline and communicates with the others via RPC.
9. Rate Limiting
To protect against traffic spikes, rate limiting caps the request rate or concurrency allowed per system, endpoint, IP, user, or appkey, keeping the system responsive while rejecting excess traffic. Common algorithms are the fixed-window counter, sliding window, leaky bucket, and token bucket, each implementable as a single-node or distributed solution.
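As one example of these algorithms, a single-node token bucket can be sketched as below. The TokenBucket class and its injectable nanoClock parameter are assumptions made so the behavior is easy to test; a distributed limiter would keep the counters in shared storage instead.

```java
import java.util.function.LongSupplier;

// Hypothetical single-node token bucket; the clock is injectable for testing.
class TokenBucket {
    private final long capacity;
    private final double refillPerNano;
    private final LongSupplier nanoClock;
    private double tokens;
    private long lastRefill;

    TokenBucket(long capacity, double refillPerSecond, LongSupplier nanoClock) {
        this.capacity = capacity;
        this.refillPerNano = refillPerSecond / 1_000_000_000.0;
        this.nanoClock = nanoClock;
        this.tokens = capacity; // start full so an initial burst is allowed
        this.lastRefill = nanoClock.getAsLong();
    }

    // Returns true if the request may proceed, false if it should be rejected.
    synchronized boolean tryAcquire() {
        long now = nanoClock.getAsLong();
        // Add the tokens earned since the last call, never exceeding capacity.
        tokens = Math.min(capacity, tokens + (now - lastRefill) * refillPerNano);
        lastRefill = now;
        if (tokens >= 1.0) {
            tokens -= 1.0;
            return true;
        }
        return false;
    }
}
```

In production the clock would normally be System::nanoTime. The capacity bounds the burst size, while refillPerSecond sets the sustained rate.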
10. Circuit Breaking
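A circuit breaker monitors error rates or timeouts on a downstream call to prevent cascading failures. It starts Closed (calls pass through), trips to Open (calls fail fast) once failures cross a threshold, and after a cool-down period moves to Half-Open, where trial requests decide whether it closes again or reopens. Popular implementations include Alibaba's Sentinel.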
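The three states can be sketched as a small state machine. This CircuitBreaker class is a hypothetical, count-based illustration and not the API of Sentinel or any other library; real implementations usually track error rates over a sliding window and do not hold a lock while the protected call runs.

```java
import java.util.function.LongSupplier;
import java.util.function.Supplier;

// Hypothetical count-based circuit breaker with an injectable millisecond clock.
class CircuitBreaker {
    enum State { CLOSED, OPEN, HALF_OPEN }

    private final int failureThreshold;
    private final long openMillis;
    private final LongSupplier clock;
    private State state = State.CLOSED;
    private int consecutiveFailures = 0;
    private long openedAt = 0;

    CircuitBreaker(int failureThreshold, long openMillis, LongSupplier clock) {
        this.failureThreshold = failureThreshold;
        this.openMillis = openMillis;
        this.clock = clock;
    }

    // Reports the current state, moving from OPEN to HALF_OPEN after the cool-down.
    synchronized State state() {
        if (state == State.OPEN && clock.getAsLong() - openedAt >= openMillis) {
            state = State.HALF_OPEN;
        }
        return state;
    }

    // Runs the action if allowed; returns the fallback when open or when the action fails.
    synchronized <T> T call(Supplier<T> action, T fallback) {
        if (state() == State.OPEN) {
            return fallback; // fail fast without touching the downstream service
        }
        try {
            T result = action.get();
            consecutiveFailures = 0;
            state = State.CLOSED;
            return result;
        } catch (RuntimeException e) {
            consecutiveFailures++;
            if (state == State.HALF_OPEN || consecutiveFailures >= failureThreshold) {
                state = State.OPEN;
                openedAt = clock.getAsLong();
            }
            return fallback;
        }
    }
}
```

While the breaker is open, the downstream service receives no traffic at all, which gives it room to recover.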
11. Degradation
When resources are scarce, non‑core features (e.g., product reviews, transaction logs) are temporarily disabled to preserve core functions like order creation and payment.
Degradation strategies must be tailored to specific business scenarios and agreed upon with stakeholders.
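A degradation switch can be as simple as a set of disabled feature names consulted before each non-core call. The DegradationSwitch class and the "reviews" feature name are hypothetical; real systems usually drive such switches from a configuration center so they can be flipped at runtime.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

// Hypothetical feature-switch registry for degrading non-core features.
class DegradationSwitch {
    private final Set<String> disabled = ConcurrentHashMap.newKeySet();

    void degrade(String feature) {
        disabled.add(feature);
    }

    void restore(String feature) {
        disabled.remove(feature);
    }

    // Runs the feature normally, or returns the cheap fallback while it is degraded.
    <T> T run(String feature, Supplier<T> normal, Supplier<T> fallback) {
        return disabled.contains(feature) ? fallback.get() : normal.get();
    }
}
```

Core paths such as order creation are never registered as degradable, so they keep running regardless of which switches are on.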
In summary, applying these eleven design techniques (splitting, decoupling, async, retry, compensation, backup, multi-active, isolation, rate limiting, circuit breaking, and degradation) helps build highly available, resilient backend systems.
macrozheng
Dedicated to Java tech sharing and dissecting top open-source projects. Topics include Spring Boot, Spring Cloud, Docker, Kubernetes and more. Author’s GitHub project “mall” has 50K+ stars.
