
Mastering High Availability: 4 Essential Design Techniques for Scalable Systems

This article outlines the core high‑availability techniques—system splitting, decoupling, asynchronous processing, retry, compensation, backup, multi‑active strategies, isolation, rate limiting, circuit breaking, and degradation—providing practical guidance for designing resilient, scalable backend architectures in large‑scale internet applications.

macrozheng

Apr 14, 2022


Large-scale internet architecture is usually judged on four qualities: high concurrency, high performance, high availability, and high scalability.

Mastering these aspects simplifies interview preparation and everyday architectural design.

Today we explore the design techniques for high availability.

1. System Splitting

In a monolithic system, a single failure can cascade across the entire application. Splitting the monolith into independent microservices along domain-driven design (DDD) boundaries gives each service one business domain to own, makes the boundaries explicit, and limits how far a fault can spread.

2. Decoupling

Apply the principle of high cohesion and low coupling. Abstract interfaces, MVC layering, the SOLID principles, and design patterns all reduce dependencies between modules. For example, the Open-Closed Principle lets you extend behavior without modifying existing code.

Spring’s AOP (Aspect‑Oriented Programming) uses dynamic proxies to intercept method calls, allowing additional logic before or after execution.
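The interception idea can be illustrated outside Spring. The following is a minimal sketch in Python that uses a decorator as a rough analogue of an AOP proxy; it is not Spring's API, and the names `audited`, `audit_log`, and `place_order` are hypothetical.

```python
import functools
from typing import Any, Callable, List

audit_log: List[str] = []  # hypothetical sink for the cross-cutting logic


def audited(func: Callable[..., Any]) -> Callable[..., Any]:
    """Wrap `func` with before/after logic, leaving its body untouched."""

    @functools.wraps(func)
    def wrapper(*args: Any, **kwargs: Any) -> Any:
        audit_log.append(f"before {func.__name__}")
        try:
            return func(*args, **kwargs)
        finally:
            # Runs whether the call returned or raised.
            audit_log.append(f"after {func.__name__}")

    return wrapper


@audited
def place_order(order_id: str) -> str:
    # Business logic knows nothing about auditing.
    return f"order {order_id} placed"


outcome = place_order("A-1")
```

The business function stays free of auditing code, which is the decoupling benefit that AOP aims for.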

Event‑driven architecture with publish/subscribe lets new features subscribe to events without invasive code changes.
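A minimal in-process sketch of publish/subscribe, assuming a hypothetical `EventBus` class and an illustrative `order.created` topic; a production system would more likely use a message broker.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List


class EventBus:
    """Minimal in-process publish/subscribe bus (hypothetical helper)."""

    def __init__(self) -> None:
        self._handlers: DefaultDict[str, List[Callable[[dict], None]]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Callable[[dict], None]) -> None:
        self._handlers[topic].append(handler)

    def publish(self, topic: str, event: dict) -> None:
        # The publisher does not know who is listening.
        for handler in list(self._handlers[topic]):
            handler(event)


bus = EventBus()
sent_notifications: List[str] = []
awarded_points: List[int] = []

# Existing feature.
bus.subscribe("order.created", lambda e: sent_notifications.append(e["order_id"]))
# Feature added later: it only subscribes, the publisher is unchanged.
bus.subscribe("order.created", lambda e: awarded_points.append(e["amount"] // 10))

bus.publish("order.created", {"order_id": "A-1", "amount": 250})
```

Adding the points feature required one new `subscribe` call and no edit to the code that publishes the event.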

3. Asynchronous Processing

Synchronous calls block threads while waiting for responses, reducing throughput. Asynchronous execution—using thread pools, message queues, etc.—allows the main flow to continue while background tasks handle non‑real‑time actions.

Example: after an order is created, core logic returns success, while notifications, snapshots, and timeout tasks are processed asynchronously via a message queue.
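The order flow above might be sketched as follows. This is an illustration only: a thread pool stands in for the message queue, and `create_order`, `send_notification`, and `save_snapshot` are hypothetical names. Unlike a durable queue, an in-process pool loses pending tasks if the process crashes.

```python
from concurrent.futures import ThreadPoolExecutor, wait
from typing import List

executor = ThreadPoolExecutor(max_workers=4)
side_effects: List[str] = []  # list.append is thread-safe in CPython


def send_notification(order_id: str) -> None:
    side_effects.append(f"notify:{order_id}")


def save_snapshot(order_id: str) -> None:
    side_effects.append(f"snapshot:{order_id}")


def create_order(order_id: str):
    # Core logic (validation, persistence) would run here, synchronously.
    futures = [
        executor.submit(send_notification, order_id),
        executor.submit(save_snapshot, order_id),
    ]
    # Return to the caller right away; the submitted tasks finish in the background.
    return {"order_id": order_id, "status": "CREATED"}, futures


result, pending = create_order("A-1")
wait(pending)  # only so this demo can inspect the background results
```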

4. Retry

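Network jitter or blocked threads can cause RPC timeouts, and a timeout does not tell the caller whether the request was actually processed. Retrying improves the user experience, but it must be paired with idempotency so that a repeated request does not repeat the operation (for example, debiting a bank transfer twice).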

Idempotent solutions include pre‑check before insert, unique indexes, state machines, distributed locks, or token mechanisms.
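A minimal sketch of retry combined with the token mechanism, under stated assumptions: the `retry` helper, the `TransferService` class, and the token `req-42` are all hypothetical, and the first call deliberately simulates a response that is lost after the work has been done.

```python
import time
from typing import Callable, Dict, TypeVar

T = TypeVar("T")


def retry(call: Callable[[], T], attempts: int = 3, base_delay: float = 0.01) -> T:
    """Retry `call` on TimeoutError, doubling the delay after each failure."""
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    for attempt in range(1, attempts + 1):
        try:
            return call()
        except TimeoutError:
            if attempt == attempts:
                raise  # out of attempts: surface the error
            time.sleep(base_delay * 2 ** (attempt - 1))
    raise AssertionError("unreachable")


class TransferService:
    """Hypothetical service that deduplicates by a client-supplied token."""

    def __init__(self) -> None:
        self.balance = 100
        self.calls = 0
        self._processed: Dict[str, int] = {}

    def transfer(self, token: str, amount: int) -> int:
        self.calls += 1
        if token not in self._processed:
            # First time this token is seen: apply the debit exactly once.
            self.balance -= amount
            self._processed[token] = self.balance
        if self.calls == 1:
            # Simulate the response being lost after the debit succeeded.
            raise TimeoutError("response lost")
        return self._processed[token]


service = TransferService()
remaining = retry(lambda: service.transfer("req-42", 30))
```

The service is called twice, but because both calls carry the same token the balance is debited only once.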

5. Compensation

When retries are insufficient, compensation mechanisms achieve eventual consistency. Compensation can be forward (pushing failed tasks to success) or backward (reverting successful tasks to the initial state).

Note: Compensation assumes the business can tolerate short‑term data inconsistency.

Implementation methods include local tables with scheduled scans, or using message middleware with retry capabilities.
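The local-table approach could look roughly like this. It is a sketch of forward compensation, assuming an in-memory SQLite table named `task` and hypothetical helpers `record_task` and `scan_once`; a real deployment would run the scan from a scheduler and write the task row in the same transaction as the business data.

```python
import sqlite3
from typing import Callable

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE task (id INTEGER PRIMARY KEY, payload TEXT, "
    "status TEXT DEFAULT 'PENDING', attempts INTEGER DEFAULT 0)"
)


def record_task(payload: str) -> None:
    # Assumption: in production this insert shares a transaction with the business write.
    conn.execute("INSERT INTO task (payload) VALUES (?)", (payload,))
    conn.commit()


def scan_once(handler: Callable[[str], None], max_attempts: int = 3) -> None:
    """One pass of the scheduled scan: push unfinished tasks forward."""
    rows = conn.execute(
        "SELECT id, payload FROM task WHERE status = 'PENDING' AND attempts < ?",
        (max_attempts,),
    ).fetchall()
    for task_id, payload in rows:
        try:
            handler(payload)
            conn.execute("UPDATE task SET status = 'DONE' WHERE id = ?", (task_id,))
        except Exception:
            # Leave the task PENDING so a later scan retries it.
            conn.execute("UPDATE task SET attempts = attempts + 1 WHERE id = ?", (task_id,))
    conn.commit()


failures_left = {"count": 1}


def flaky_handler(payload: str) -> None:
    # Fails once, then succeeds, to imitate a briefly unavailable downstream.
    if failures_left["count"] > 0:
        failures_left["count"] -= 1
        raise ConnectionError("downstream unavailable")


record_task("sync-order-A-1")
scan_once(flaky_handler)  # first pass fails, task stays PENDING
after_first = conn.execute("SELECT status, attempts FROM task").fetchone()
scan_once(flaky_handler)  # second pass succeeds
after_second = conn.execute("SELECT status, attempts FROM task").fetchone()
```

Tasks that reach `max_attempts` stay `PENDING` but are no longer picked up, which leaves them visible for alerting or manual handling. The handler itself should be idempotent, since a task may run more than once.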

6. Backup

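Backups are the foundation of disaster recovery. In Redis, RDB persists point-in-time snapshots of the full dataset, while AOF appends every write command to a log that is replayed on restart. Redis Sentinel adds monitoring and automatic master-replica failover. MySQL, Kafka, HBase, Elasticsearch, and similar systems offer comparable backup and replication mechanisms.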

7. Multi‑Active Strategy

To survive the loss of an entire data center (power failure, fire, earthquake), multi-active deployments such as same-city dual-active or three data centers across two regions spread the risk and keep the service available around the clock.

8. Isolation

Physical isolation deploys loosely coupled systems independently so that a fault in one cannot propagate to the others. Each subsystem has its own codebase and CI/CD pipeline, and subsystems communicate via RPC.

9. Rate Limiting

To protect against traffic spikes, rate limiting caps the request rate or concurrency per system, endpoint, IP, user, or appkey. Common algorithms are the fixed-window counter, sliding window, leaky bucket, and token bucket, and each can be implemented on a single node or as a distributed limiter.

Requests above the limit are rejected so that the traffic already admitted continues to get timely responses.
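As an illustration of one of these algorithms, here is a minimal single-node token bucket. The `TokenBucket` class and its parameters are hypothetical, the clock is injectable so the demo is deterministic, and the sketch is not thread-safe.

```python
import time
from typing import Callable


class TokenBucket:
    def __init__(self, rate: float, capacity: int,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate = rate          # tokens added per second
        self.capacity = capacity  # maximum burst size
        self._clock = clock
        self._tokens = float(capacity)
        self._last = clock()

    def allow(self) -> bool:
        now = self._clock()
        # Refill according to elapsed time, never above capacity.
        self._tokens = min(self.capacity, self._tokens + (now - self._last) * self.rate)
        self._last = now
        if self._tokens >= 1:
            self._tokens -= 1
            return True
        return False  # bucket empty: reject the request


fake_now = [0.0]  # fake clock for a repeatable demo
bucket = TokenBucket(rate=2, capacity=3, clock=lambda: fake_now[0])

burst = [bucket.allow() for _ in range(4)]         # 3 tokens available, 4th request rejected
fake_now[0] = 1.0                                  # one second later: 2 tokens refilled
after_refill = [bucket.allow() for _ in range(3)]  # 2 pass, 3rd rejected
```

The capacity sets how large a burst is tolerated, while the rate sets the sustained throughput. A distributed limiter would keep the token count in shared storage instead of in process memory.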

10. Circuit Breaking

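A circuit breaker monitors the error rate or timeouts of a downstream dependency and moves between three states to prevent cascading failures. Closed lets calls through, Open rejects them immediately, and Half-Open admits a few trial calls to check whether the dependency has recovered. Popular implementations include Alibaba's Sentinel (unrelated to Redis Sentinel).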
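The state transitions can be sketched in a few lines. This is a simplified illustration, not how Sentinel is implemented: the hypothetical `CircuitBreaker` class counts consecutive failures rather than an error rate, allows a single trial call in the Half-Open state, and is not thread-safe.

```python
import time
from typing import Callable, List, TypeVar

T = TypeVar("T")


class CircuitOpenError(Exception):
    """Raised when the breaker rejects a call without invoking the dependency."""


class CircuitBreaker:
    def __init__(self, failure_threshold: int = 3, reset_timeout: float = 5.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self._clock = clock
        self.state = "CLOSED"
        self._failures = 0
        self._opened_at = 0.0

    def call(self, func: Callable[[], T]) -> T:
        if self.state == "OPEN":
            if self._clock() - self._opened_at < self.reset_timeout:
                raise CircuitOpenError("failing fast")
            self.state = "HALF_OPEN"  # timeout elapsed: allow one trial call
        try:
            result = func()
        except Exception:
            self._failures += 1
            if self.state == "HALF_OPEN" or self._failures >= self.failure_threshold:
                self.state = "OPEN"
                self._opened_at = self._clock()
            raise
        # A success closes the circuit and clears the failure count.
        self.state = "CLOSED"
        self._failures = 0
        return result


def failing() -> str:
    raise ConnectionError("downstream down")


fake_now = [0.0]  # fake clock for a repeatable demo
breaker = CircuitBreaker(failure_threshold=2, reset_timeout=5.0, clock=lambda: fake_now[0])
states: List[str] = []

for _ in range(2):
    try:
        breaker.call(failing)
    except ConnectionError:
        pass
states.append(breaker.state)  # two consecutive failures trip the breaker

try:
    breaker.call(lambda: "ok")
except CircuitOpenError:
    states.append("rejected")  # still inside the reset timeout

fake_now[0] = 6.0
recovered = breaker.call(lambda: "ok")  # trial call succeeds
states.append(breaker.state)
```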

11. Degradation

When resources are scarce, non‑core features (e.g., product reviews, transaction logs) are temporarily disabled to preserve core functions like order creation and payment.

Degradation strategies must be tailored to specific business scenarios and agreed upon with stakeholders.
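A minimal sketch of a degradation switch, assuming a hypothetical in-memory flag `degraded["reviews"]`; in practice the flag would usually come from a configuration center so operators can flip it at runtime.

```python
from typing import Dict, List

# Hypothetical switch store, keyed by non-core feature name.
degraded: Dict[str, bool] = {"reviews": False}


def load_reviews(product_id: str) -> List[str]:
    # Stands in for an expensive downstream call.
    return [f"review for {product_id}"]


def product_page(product_id: str) -> dict:
    page = {"product_id": product_id, "can_order": True}  # core path is always served
    if degraded["reviews"]:
        # Non-core feature switched off: return a cheap fallback instead.
        page["reviews"] = []
        page["reviews_degraded"] = True
    else:
        page["reviews"] = load_reviews(product_id)
    return page


normal = product_page("P-1")
degraded["reviews"] = True  # operator flips the switch under load
reduced = product_page("P-1")
```

Ordering stays available in both cases; only the reviews section is sacrificed while the switch is on.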

In summary, applying these eleven design techniques (splitting, decoupling, asynchronous processing, retry, compensation, backup, multi-active, isolation, rate limiting, circuit breaking, and degradation) helps build highly available, resilient backend systems.


