Core Principles of High‑Availability Architecture Design

These core principles—minimal dependency, weak dependency, distribution, rate limiting, degradable design, balanced risk, fault prevention and isolation, no single point of failure, self‑protection, automatic failover, and retry/idempotency/compensation—guide the design of highly available systems by reducing risk, ensuring redundancy, and protecting services at all layers.

Cognitive Technology Team

May 16, 2024

The following core principles are essential for designing highly available architectures.

Minimal Dependency Principle: Avoid dependencies whenever possible; the fewer the dependencies, the better.

Weak Dependency Principle: When a dependency is unavoidable, keep it as weak as possible to minimize impact.

Distribution Principle: Do not put all your eggs in one basket. Spreading risk reduces the chance of total failure, but the extra redundancy must be weighed against its cost.

Rate‑Limiting Principle: Throttle traffic according to business load to protect both your own services and downstream dependencies. Where possible, prefer lossless throttling (for example, queuing or delaying excess requests) over lossy throttling that simply drops them.
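As an illustration of throttling, here is a minimal token-bucket sketch. The class name `TokenBucket` and its parameters (`rate`, `capacity`, `clock`) are hypothetical choices for this example, not something the original article prescribes, and a token bucket is only one of several possible throttling strategies.

```python
import time


class TokenBucket:
    """Hypothetical limiter: allows bursts up to `capacity`, refills at `rate` tokens/second."""

    def __init__(self, rate: float, capacity: float, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock          # injectable so tests can control time
        self.tokens = capacity      # start with a full bucket
        self.last = clock()

    def allow(self, cost: float = 1.0) -> bool:
        now = self.clock()
        # Refill in proportion to elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False                # caller decides: reject, queue, or degrade
```

When `allow()` returns `False`, dropping the request is the lossy option; placing it on a queue for later processing would be one way to make the same limiter lossless.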

Degradable Design Principle: Design each module to degrade gracefully, automatically or manually switching to a backup or fallback solution when a component fails.
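Graceful degradation can be sketched as a wrapper that switches to a fallback either automatically (the primary path raised) or manually (an operator set a flag). The function `with_fallback` and the `force_degraded` switch are assumptions made for this example.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


def with_fallback(primary: Callable[[], T],
                  fallback: Callable[[], T],
                  force_degraded: Callable[[], bool] = lambda: False) -> T:
    """Return primary() unless degradation is forced or primary raises."""
    if force_degraded():        # manual switch, e.g. an operator-controlled flag
        return fallback()
    try:
        return primary()
    except Exception:
        # Automatic degradation: any failure in the primary path falls back.
        return fallback()
```

A typical use might be serving a cached or default recommendation list when the live recommendation service is unavailable.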

Balanced Principle: Distribute risk evenly across the system to avoid concentration of failure points.

Fault Prevention and Isolation Principle: Control risks so they do not spread or amplify throughout the system.
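One common way to keep a fault from spreading is a bulkhead, which caps how many concurrent calls a single dependency may consume. The sketch below is an assumption about how isolation could be applied, with a hypothetical `Bulkhead` class; the article itself does not name a specific technique.

```python
import threading


class Bulkhead:
    """Hypothetical isolation guard: caps concurrent calls to one dependency."""

    def __init__(self, max_concurrent: int):
        self._slots = threading.BoundedSemaphore(max_concurrent)

    def call(self, fn, *args, **kwargs):
        # Non-blocking acquire: fail fast rather than queue behind a slow dependency.
        if not self._slots.acquire(blocking=False):
            raise RuntimeError("bulkhead full")
        try:
            return fn(*args, **kwargs)
        finally:
            self._slots.release()
```

Giving each downstream dependency its own `Bulkhead` means a slow dependency can exhaust only its own slots, not the threads shared by the whole service.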

No Single‑Point Principle: Provide redundancy or an equivalent alternative for every component, so that a fallback path always exists.

Self‑Protection Principle: Sacrifice a limited part of the system to protect the rest. Upper‑level services should protect lower‑level services, and lower‑level services should not fully trust the layers above them; they should implement their own rate‑limiting and degradation safeguards.

This principle was summarized by Mr. Zheng Yanqiang from Xiaomi during the post‑mortem of the Mi Home 616 incident, emphasizing that upper services must shield lower services when abnormal traffic is detected, while lower services must also protect themselves.
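A circuit breaker is one way an upper-level service might shield a lower-level one: after repeated failures it stops sending traffic for a cool-down period. This is a simplified sketch under that assumption; `CircuitBreaker`, `max_failures`, and `reset_after` are hypothetical names, and the post-mortem mentioned above is not claimed to have used this mechanism.

```python
import time


class CircuitBreaker:
    """Hypothetical breaker: opens after `max_failures` consecutive errors."""

    def __init__(self, max_failures: int, reset_after: float, clock=time.monotonic):
        self.max_failures = max_failures
        self.reset_after = reset_after
        self.clock = clock
        self.failures = 0
        self.opened_at = None       # None means the circuit is closed

    def call(self, fn):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                # Shed load without touching the struggling downstream service.
                raise RuntimeError("circuit open")
            # Cool-down elapsed: let one trial call through (half-open).
        try:
            result = fn()
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = self.clock()
            raise
        # Success closes the circuit and clears the failure count.
        self.failures = 0
        self.opened_at = None
        return result
```

The lower-level service would still apply its own limits, such as rate limiting, since it cannot assume every caller wraps its requests this way.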

Automatic Failover Principle: Upstream and downstream services should automatically detect faulty hosts and remove them from rotation, enabling rapid service recovery.
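Detecting and removing faulty hosts can be sketched as a small client-side pool that ejects a host after several consecutive failures. `HostPool` and its methods are hypothetical, and the consecutive-failure threshold is an assumed detection rule rather than one taken from the article.

```python
class HostPool:
    """Hypothetical pool: round-robin over hosts, ejecting repeated failures."""

    def __init__(self, hosts, max_failures: int = 3):
        self.hosts = list(hosts)
        self.max_failures = max_failures
        self.failures = {h: 0 for h in self.hosts}
        self._next = 0

    def healthy(self):
        return [h for h in self.hosts if self.failures[h] < self.max_failures]

    def pick(self) -> str:
        candidates = self.healthy()
        if not candidates:
            raise RuntimeError("no healthy hosts")
        host = candidates[self._next % len(candidates)]
        self._next += 1
        return host

    def report(self, host: str, ok: bool) -> None:
        # A success resets the counter, so a recovered host rejoins the pool.
        self.failures[host] = 0 if ok else self.failures[host] + 1
```

Because an ejected host is never picked, it rejoins only when something else, such as a periodic health probe, calls `report(host, True)`.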

Retry, Idempotency, Compensation, and BCP Principle: Implement retry mechanisms, make operations idempotent so that retries are safe, provide compensation logic for operations that cannot complete, and maintain business continuity plans (BCP) so that failures are handled gracefully.
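The interplay between retries and idempotency can be illustrated with a small sketch: the client retries with exponential backoff, and the server deduplicates by an idempotency key so that a retried request has no second effect. `retry`, `PaymentService`, and the key format are all hypothetical names invented for this example.

```python
import time


def retry(fn, attempts: int = 3, base_delay: float = 0.1, sleep=time.sleep):
    """Call fn() up to `attempts` times with exponential backoff between tries."""
    for i in range(attempts):
        try:
            return fn()
        except Exception:
            if i == attempts - 1:
                raise               # out of retries: caller should compensate
            sleep(base_delay * 2 ** i)


class PaymentService:
    """Hypothetical service that deduplicates requests by idempotency key."""

    def __init__(self):
        self.results = {}           # idempotency key -> stored result
        self.charged = 0

    def charge(self, key: str, amount: int) -> str:
        if key in self.results:     # replay of an earlier request: do not charge again
            return self.results[key]
        self.charged += amount
        self.results[key] = f"receipt-{key}"
        return self.results[key]
```

If the response to the first `charge` call is lost and the client retries with the same key, the customer is still charged only once. When `retry` finally gives up and re-raises, that is the point where compensation logic (for example, a refund or a reconciliation job) would take over.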

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.
