Mastering High Availability: 10 Essential Design Techniques for Scalable Systems

This article explains core techniques (system splitting, decoupling, asynchrony, retry, compensation, backup, multi‑active strategies, isolation, rate limiting, circuit breaking, and degradation) that together enable robust, high‑availability architectures for modern backend services.

Su San Talks Tech

Jul 6, 2024

1. System Splitting

Large monolithic applications become fragile under load; a single failure can cascade and bring down the entire system. Splitting the system into smaller, bounded contexts—often realized as microservices—isolates failures and allows independent scaling.

2. Decoupling

Applying the principle of high cohesion and low coupling reduces the impact of changes. Techniques such as interface abstraction, MVC layering, SOLID principles, AOP, and event‑driven publish/subscribe keep modules independent and prevent a single modification from breaking the whole system.
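As an illustration of the event‑driven publish/subscribe style mentioned above, here is a minimal in‑process sketch. The `EventBus` class, the `order.created` topic, and the two subscribers are hypothetical names chosen for this example; a production system would more likely use a message broker.

```python
from typing import Callable, Dict, List

Handler = Callable[[dict], None]


class EventBus:
    """Minimal in-process publish/subscribe bus (illustrative only)."""

    def __init__(self) -> None:
        self._handlers: Dict[str, List[Handler]] = {}

    def subscribe(self, topic: str, handler: Handler) -> None:
        self._handlers.setdefault(topic, []).append(handler)

    def publish(self, topic: str, event: dict) -> None:
        # The publisher does not know which modules are listening.
        for handler in list(self._handlers.get(topic, [])):
            handler(event)


bus = EventBus()
sent_emails: List[str] = []      # stands in for a notification module
reserved_stock: List[str] = []   # stands in for an inventory module

bus.subscribe("order.created", lambda e: sent_emails.append(e["order_id"]))
bus.subscribe("order.created", lambda e: reserved_stock.append(e["order_id"]))

# The order module only publishes; adding a subscriber requires no change here.
bus.publish("order.created", {"order_id": "A-1001"})
```

Because the order module depends only on the bus, a new consumer can be added or removed without touching the publishing code.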

3. Asynchrony

Synchronous calls block a thread while waiting for a response, degrading throughput. Asynchronous processing—using thread pools, message queues, or other async mechanisms—allows the caller to continue work while the background task completes.
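The thread‑pool variant can be sketched as follows. The function names `place_order` and `send_notification` are assumptions made for this example, and the `time.sleep` call merely simulates a slow downstream dependency.

```python
import time
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Tuple


def send_notification(order_id: str) -> str:
    time.sleep(0.05)  # simulated slow downstream call
    return f"notified:{order_id}"


def place_order(order_id: str, pool: ThreadPoolExecutor) -> Tuple[str, Future]:
    # Hand the slow, non-critical step to the pool and return immediately.
    future = pool.submit(send_notification, order_id)
    return f"accepted:{order_id}", future


with ThreadPoolExecutor(max_workers=4) as pool:
    response, pending = place_order("A-1001", pool)
    # The caller already has its response; the result is awaited here
    # only to demonstrate that the background task completes.
    notification_result = pending.result(timeout=5)
```

In a real service the caller would usually not wait on the future at all; failures in the background task then need their own handling, for example logging or a retry queue.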

4. Retry

Network glitches or resource contention can cause RPC calls to fail. Retrying a failed request often succeeds and hides transient faults from the user, but retries should be limited in number and must be combined with idempotency to avoid duplicate side effects, especially in financial operations.
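A minimal sketch of retry combined with idempotency, under stated assumptions: `PaymentService`, `TransientError`, and the idempotency‑key scheme are hypothetical, and the first call is rigged to fail after the charge has been recorded, which simulates a lost response.

```python
import time
import uuid
from typing import Callable, Dict


class TransientError(Exception):
    """Hypothetical error type for failures that are worth retrying."""


def retry(call: Callable[[], str], attempts: int = 3,
          base_delay: float = 0.01) -> str:
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return call()
        except TransientError:
            if attempt == attempts:
                raise  # give up after the last attempt
            time.sleep(base_delay * 2 ** (attempt - 1))  # exponential backoff
    raise AssertionError("unreachable")


class PaymentService:
    """Stand-in for a remote service that deduplicates by idempotency key."""

    def __init__(self) -> None:
        self.processed: Dict[str, str] = {}
        self.charges = 0
        self._calls = 0

    def charge(self, idempotency_key: str, amount: int) -> str:
        self._calls += 1
        if idempotency_key in self.processed:
            return self.processed[idempotency_key]  # replay, no second charge
        self.charges += 1
        receipt = f"receipt-{self.charges}"
        self.processed[idempotency_key] = receipt
        if self._calls == 1:
            raise TransientError("charge recorded, but the response was lost")
        return receipt


service = PaymentService()
key = str(uuid.uuid4())  # one key per logical operation, reused on every retry
receipt = retry(lambda: service.charge(key, 100))
```

The key point is that the same key is sent on every attempt, so the second call returns the stored receipt instead of charging again.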

5. Compensation

When retries are insufficient, compensation techniques achieve eventual consistency. Forward compensation pushes partially successful operations toward success, while reverse compensation rolls back successful steps to the original state.

Note: Compensation assumes the business can tolerate short‑term data inconsistency.
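Reverse compensation can be sketched as a list of steps, each paired with an undo action. The step names (`reserve_stock`, `charge`, `create_shipment`) and the in‑memory `state` dictionary are assumptions for illustration; real compensations would call other services and would themselves need retries.

```python
from typing import Callable, Dict, List, Tuple

# (name, action, compensating action)
Step = Tuple[str, Callable[[], None], Callable[[], None]]


def run_with_compensation(steps: List[Step], log: List[str]) -> bool:
    completed: List[Step] = []
    for step in steps:
        name, action, _ = step
        try:
            action()
        except Exception:
            log.append(f"failed:{name}")
            # Undo the steps that already succeeded, newest first.
            for done_name, _, compensate in reversed(completed):
                compensate()
                log.append(f"undone:{done_name}")
            return False
        log.append(f"done:{name}")
        completed.append(step)
    return True


state: Dict[str, int] = {"stock": 10, "balance": 100}


def reserve_stock() -> None:
    state["stock"] -= 1


def release_stock() -> None:
    state["stock"] += 1


def charge() -> None:
    state["balance"] -= 30


def refund() -> None:
    state["balance"] += 30


def create_shipment() -> None:
    raise RuntimeError("carrier unavailable")  # simulated failure


def cancel_shipment() -> None:
    pass


log: List[str] = []
ok = run_with_compensation(
    [
        ("reserve_stock", reserve_stock, release_stock),
        ("charge", charge, refund),
        ("create_shipment", create_shipment, cancel_shipment),
    ],
    log,
)
```

Between the failed step and the last undo, the stock and balance are temporarily inconsistent, which is the window the note above refers to.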

6. Backup

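Data loss due to server failure is unacceptable. Redis, for example, persists data with RDB (periodic point‑in‑time snapshots of the full dataset) and AOF (an append‑only log of write commands that is replayed on restart), while replication combined with Sentinel‑based automatic failover provides rapid recovery when a node fails.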

7. Multi‑Active Strategy

To survive catastrophic events (e.g., data‑center outages), deploying active instances across multiple locations—such as same‑city dual‑active, two‑region three‑center, or multi‑region setups—ensures continuous service availability.

8. Isolation

Physical isolation separates low‑coupling subsystems into independent deployments, each with its own codebase and release pipeline. A failure in one subsystem stays contained, and because subsystems interact only through microservice‑style RPC calls, their boundaries remain explicit.

9. Rate Limiting

When traffic spikes exceed system capacity, rate limiting discards excess requests to protect stability. Limits can be applied per system, per endpoint, per user, or per API key, using algorithms such as counters, sliding windows, leaky bucket, or token bucket.
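Of the algorithms listed, the token bucket is sketched below. This is a single‑process illustration, not a distributed limiter; the `TokenBucket` class and its parameters are names chosen for this example, and the clock is injectable so the behavior can be shown without real waiting.

```python
import time
from typing import Callable, List


class TokenBucket:
    """Allows bursts up to `capacity`, refilling at `rate` tokens per second."""

    def __init__(self, rate: float, capacity: int,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate = rate
        self.capacity = capacity
        self._clock = clock
        self._tokens = float(capacity)  # start with a full bucket
        self._last = clock()

    def allow(self) -> bool:
        now = self._clock()
        # Refill according to elapsed time, never beyond capacity.
        self._tokens = min(self.capacity,
                           self._tokens + (now - self._last) * self.rate)
        self._last = now
        if self._tokens >= 1:
            self._tokens -= 1
            return True
        return False  # over the limit: the caller should reject the request


fake_now: List[float] = [0.0]  # controllable clock for the demonstration
bucket = TokenBucket(rate=2, capacity=3, clock=lambda: fake_now[0])

burst = [bucket.allow() for _ in range(4)]         # 3 tokens, 4 requests
fake_now[0] = 1.0                                  # one second later: +2 tokens
after_refill = [bucket.allow() for _ in range(3)]
```

To limit per user or per API key, one bucket can be kept per key, for example in a dictionary; sharing limits across several server instances would need a shared store, which this sketch does not cover.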

10. Circuit Breaker

Circuit breakers monitor failure rates of downstream services. In the closed state, requests flow normally; upon reaching a failure threshold, the breaker opens and rejects calls. After a cooldown, it moves to half‑open to test recovery before closing again.
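The three states can be sketched as a small state machine. As a simplification, this version counts consecutive failures rather than computing a failure rate over a window; the `CircuitBreaker` class, its parameters, and the injectable clock are assumptions for illustration.

```python
import time
from typing import Callable, List, TypeVar

T = TypeVar("T")


class CircuitOpenError(Exception):
    """Raised when a call is rejected because the breaker is open."""


class CircuitBreaker:
    def __init__(self, failure_threshold: int = 3, cooldown: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.failure_threshold = failure_threshold
        self.cooldown = cooldown
        self._clock = clock
        self.state = "closed"
        self._failures = 0
        self._opened_at = 0.0

    def call(self, fn: Callable[[], T]) -> T:
        if self.state == "open":
            if self._clock() - self._opened_at < self.cooldown:
                raise CircuitOpenError("rejected while open")
            self.state = "half_open"  # cooldown elapsed: allow a trial call
        try:
            result = fn()
        except Exception:
            self._failures += 1
            # A failed trial call, or too many failures, opens the breaker.
            if self.state == "half_open" or self._failures >= self.failure_threshold:
                self.state = "open"
                self._opened_at = self._clock()
            raise
        self._failures = 0
        self.state = "closed"
        return result


def failing() -> str:
    raise ConnectionError("downstream unavailable")


def healthy() -> str:
    return "ok"


fake_now: List[float] = [0.0]
breaker = CircuitBreaker(failure_threshold=2, cooldown=10.0,
                         clock=lambda: fake_now[0])
states: List[str] = []

for _ in range(2):
    try:
        breaker.call(failing)
    except ConnectionError:
        pass
states.append(breaker.state)      # threshold reached

try:
    breaker.call(healthy)
except CircuitOpenError:
    states.append("rejected")     # still inside the cooldown

fake_now[0] = 11.0                # cooldown has passed
result = breaker.call(healthy)    # trial call in half-open succeeds
states.append(breaker.state)
```

Rejecting calls while open gives the downstream service time to recover and keeps the caller's threads from piling up on a dependency that is known to be failing.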

11. Degradation

During extreme load, non‑essential features (e.g., product reviews, transaction logs) can be temporarily disabled, preserving core functions such as order creation and payment. Degradation strategies must be coordinated with business stakeholders.
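One simple way to implement this is a runtime switch that the request path checks before calling a non‑essential feature. The sketch below assumes hypothetical names (`DegradationSwitch`, `load_reviews`, `product_page`) and an in‑memory switch; in practice the switch would typically live in a configuration center so operators can flip it without a deployment.

```python
from typing import Dict, List, Set

# Features agreed with the business as safe to turn off (hypothetical names).
DEGRADABLE: Set[str] = {"reviews", "transaction_log"}


class DegradationSwitch:
    def __init__(self) -> None:
        self._disabled: Set[str] = set()

    def degrade(self, feature: str) -> None:
        if feature not in DEGRADABLE:
            raise ValueError(f"{feature!r} is a core feature and cannot be degraded")
        self._disabled.add(feature)

    def restore(self, feature: str) -> None:
        self._disabled.discard(feature)

    def enabled(self, feature: str) -> bool:
        return feature not in self._disabled


def load_reviews(product_id: str) -> List[str]:
    return [f"review for {product_id}"]  # stands in for an expensive query


def product_page(product_id: str, switch: DegradationSwitch) -> Dict[str, object]:
    page: Dict[str, object] = {"product_id": product_id, "can_order": True}
    # Non-essential section: fall back to an empty list when degraded.
    page["reviews"] = load_reviews(product_id) if switch.enabled("reviews") else []
    return page


switch = DegradationSwitch()
normal_page = product_page("P-1", switch)

switch.degrade("reviews")            # e.g. flipped by operators during a spike
degraded_page = product_page("P-1", switch)

switch.restore("reviews")
restored_page = product_page("P-1", switch)
```

Keeping the list of degradable features explicit makes it harder to switch off a core path such as order creation or payment by mistake.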

Written by

Su San Talks Tech

Su San, former staff at several leading tech companies, is a top creator on Juejin and a premium creator on CSDN, and runs the free coding practice site www.susan.net.cn.
