
How to Achieve High Availability: Metrics, Redundancy, and Circuit‑Breaker Strategies

This article explains system availability metrics, the inevitability of faults in distributed systems, and practical high‑availability designs such as redundancy, Zookeeper and Eureka clustering, and circuit‑breaker patterns to keep services reliably operational.

Huawei Cloud Developer Alliance

Jan 19, 2021

System Availability Metric

System availability is the ratio of uptime to total running time:

Availability = MTTF / (MTTF + MTTR)

MTTF (Mean Time To Failure) is the average time a system runs before it fails, and MTTR (Mean Time To Recovery) is the average time it takes to restore service after a failure. Industry standards often require at least two nines (99%) availability, with four nines (99.99%) being ideal.
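To make the formula concrete, the following sketch computes availability from MTTF and MTTR and converts a target ratio into a yearly downtime budget. The 999-hour and 1-hour figures are invented for illustration, and the function names are not from any library.

```python
def availability(mttf_hours: float, mttr_hours: float) -> float:
    """Availability = MTTF / (MTTF + MTTR)."""
    return mttf_hours / (mttf_hours + mttr_hours)


def yearly_downtime_minutes(avail: float) -> float:
    """Downtime budget per 365-day year implied by an availability ratio."""
    minutes_per_year = 365 * 24 * 60  # 525,600 minutes
    return (1.0 - avail) * minutes_per_year


# Hypothetical service: fails every 999 hours on average, recovers in 1 hour.
a = availability(mttf_hours=999, mttr_hours=1)
print(f"availability = {a:.3%}")  # availability = 99.900%

for label, target in [("two nines", 0.99), ("four nines", 0.9999)]:
    print(f"{label}: {yearly_downtime_minutes(target):.1f} min/year")
```

Under these assumptions, two nines allows roughly 3.65 days of downtime per year, while four nines allows under an hour. Shortening MTTR improves the ratio just as lengthening MTTF does.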

Fault Inevitability in Distributed Systems

High availability means services must remain usable at all times, yet failures are unavoidable, especially in large‑scale distributed environments where complex inter‑module dependencies can cause cascade or domino effects.

Redundancy Design

Eliminating single points of failure requires redundant deployment across multiple physical locations. Common redundancy patterns include master‑slave and peer‑to‑peer designs, with variations such as one‑master‑multiple‑slaves or multi‑master configurations.
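A rough way to see why redundancy helps is to estimate the availability of N replicas when any one of them is enough to serve traffic. The sketch below assumes replica failures are independent, which is an assumption of this example rather than a claim from the article; shared power, network, or deployment faults make real gains smaller, which is one reason to spread replicas across physical locations.

```python
def parallel_availability(single: float, replicas: int) -> float:
    """Availability of N redundant replicas when any one can serve requests.

    Assumes independent failures, which real deployments only approximate.
    """
    if replicas < 1:
        raise ValueError("replicas must be >= 1")
    # The system is down only when every replica is down at the same time.
    return 1.0 - (1.0 - single) ** replicas


for n in (1, 2, 3):
    print(n, f"{parallel_availability(0.99, n):.6f}")
```

With a hypothetical 99% node, two replicas give about 99.99% and three give about 99.9999% under the independence assumption.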

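Redundancy introduces consistency challenges. According to the CAP theorem, a distributed system cannot simultaneously guarantee consistency, availability, and partition tolerance. Because network partitions cannot be ruled out in practice, the real choice is between consistency and availability while a partition lasts. Choosing strong consistency costs some availability, as Zookeeper illustrates.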

Zookeeper Cluster Roles

Leader: Handles all client write requests and coordinates the cluster.

Follower: Provides read services and forwards write requests to the Leader.

Observer: Serves reads and forwards writes like a Follower, but does not vote in Leader election or on write proposals, so adding Observers raises read throughput without slowing down writes.

When the Leader fails, the cluster loses write capability until a new Leader is elected, demonstrating a trade‑off between strong consistency and high availability.
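The write outage during election can be illustrated with a toy model of the three roles. This is a sketch for intuition only: `ToyEnsemble`, `NoLeaderError`, and the "lowest name wins" election rule are invented here and do not reflect the real Zookeeper protocol or client API.

```python
class NoLeaderError(Exception):
    """Raised when a write arrives while no Leader is available."""


class ToyEnsemble:
    """Toy model of Leader, Follower, and Observer roles (not real Zookeeper)."""

    def __init__(self, voters, observers=()):
        self.voters = list(voters)        # Leader plus Followers
        self.observers = list(observers)  # serve reads, never vote or lead
        self.leader = self.voters[0]
        self.data = {}

    def read(self, node, key):
        # Any known node can answer a read.
        if node not in self.voters + self.observers:
            raise ValueError(f"unknown node {node}")
        return self.data.get(key)

    def write(self, node, key, value):
        # Followers and Observers forward writes; only the Leader applies them.
        if self.leader is None:
            raise NoLeaderError("election in progress, write rejected")
        self.data[key] = value
        return self.leader

    def fail_leader(self):
        self.voters.remove(self.leader)
        self.leader = None

    def elect(self):
        # Placeholder rule; the real election is a voting protocol.
        self.leader = min(self.voters)
        return self.leader


zk = ToyEnsemble(voters=["zk1", "zk2", "zk3"], observers=["obs1"])
zk.write("zk2", "config", "v1")  # forwarded to zk1 and applied
zk.fail_leader()
try:
    zk.write("zk2", "config", "v2")
except NoLeaderError as exc:
    print("write failed:", exc)
print("new leader:", zk.elect())  # new leader: zk2
zk.write("zk3", "config", "v2")
print(zk.read("obs1", "config"))  # v2
```

The window between `fail_leader()` and `elect()` is the availability cost that the cluster pays for keeping a single, consistent write path.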

Eureka Peer‑to‑Peer Design

Eureka consists of Eureka Clients (service instances) and Eureka Servers (the registry). Multiple Eureka Servers are deployed as peers for high availability, and each one replicates registrations to the others asynchronously over HTTP. The result is eventual consistency: Eureka gives up strong consistency so that service discovery stays continuously available.
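The effect of asynchronous peer replication can be shown with a small in-memory model. `ToyRegistry`, its method names, and the instance address are hypothetical; the sketch mimics the behavior described above and is not Eureka's actual API or wire protocol.

```python
from collections import deque


class ToyRegistry:
    """Toy peer registry illustrating asynchronous replication (not Eureka)."""

    def __init__(self, name):
        self.name = name
        self.peers = []
        self.instances = {}    # service name -> set of instance addresses
        self.outbox = deque()  # registrations not yet pushed to peers

    def register(self, service, address, replicated=False):
        self.instances.setdefault(service, set()).add(address)
        if not replicated:
            # Accept the write locally and return at once; peers catch up later.
            self.outbox.append((service, address))

    def lookup(self, service):
        return sorted(self.instances.get(service, set()))

    def replicate(self):
        # Stands in for the background HTTP replication between servers.
        while self.outbox:
            service, address = self.outbox.popleft()
            for peer in self.peers:
                peer.register(service, address, replicated=True)


a, b = ToyRegistry("a"), ToyRegistry("b")
a.peers, b.peers = [b], [a]

a.register("orders", "10.0.0.5:8080")  # hypothetical instance address
print(b.lookup("orders"))  # [] : peer b has not seen the registration yet
a.replicate()
print(b.lookup("orders"))  # ['10.0.0.5:8080'] : the peers have converged
```

Every server answers from its own copy, so a client may briefly see stale data, but no single server failure stops registration or lookup.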

Circuit‑Breaker Design

To prevent downstream failures from cascading upstream, a circuit‑breaker pattern with three states (Closed, Open, and Half‑Open) is employed.

Closed: Calls proceed normally; a failure counter tracks error rates and opens the circuit if a threshold is exceeded.

Open: Calls are blocked and fallback logic is executed; after a timeout, the circuit moves to Half‑Open.

Half‑Open: A limited number of trial calls are allowed; if they succeed, the circuit closes, otherwise it reopens.

Using this pattern protects upstream services, conserves resources, and improves overall system availability.
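The three states can be sketched as a small single-threaded class. This is a minimal illustration under simplifying assumptions, not a production library: it counts consecutive failures rather than an error rate, allows exactly one trial call in Half-Open, and all names and default values are invented for the example.

```python
import time


class CircuitBreaker:
    CLOSED, OPEN, HALF_OPEN = "closed", "open", "half_open"

    def __init__(self, failure_threshold=3, reset_timeout=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold  # failures before opening
        self.reset_timeout = reset_timeout          # seconds to stay Open
        self.clock = clock                          # injectable for testing
        self.state = self.CLOSED
        self.failures = 0
        self.opened_at = 0.0

    def call(self, func, fallback):
        if self.state == self.OPEN:
            if self.clock() - self.opened_at < self.reset_timeout:
                return fallback()        # fail fast, do not touch downstream
            self.state = self.HALF_OPEN  # timeout elapsed: allow a trial call
        try:
            result = func()
        except Exception:
            self._on_failure()
            return fallback()
        self._on_success()
        return result

    def _on_success(self):
        self.failures = 0
        self.state = self.CLOSED

    def _on_failure(self):
        self.failures += 1
        # A failed trial call reopens at once; otherwise wait for the threshold.
        if self.state == self.HALF_OPEN or self.failures >= self.failure_threshold:
            self.state = self.OPEN
            self.opened_at = self.clock()


now = [0.0]  # fake clock so the example runs instantly
breaker = CircuitBreaker(failure_threshold=2, reset_timeout=10.0, clock=lambda: now[0])


def flaky():
    raise ConnectionError("downstream unavailable")


for _ in range(2):
    breaker.call(flaky, fallback=lambda: "cached")
print(breaker.state)  # open
now[0] = 11.0         # the reset timeout has elapsed
print(breaker.call(lambda: "fresh", fallback=lambda: "cached"))  # fresh
print(breaker.state)  # closed
```

While the circuit is Open, the downstream function is never invoked, which is what frees upstream threads and connections during an outage.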

Summary of High‑Availability Practices

Beyond redundancy and circuit breakers, additional techniques include rate limiting, degradation, stateless design, idempotent operations, retry mechanisms, interface caching, real‑time monitoring and metrics, and regular preventive maintenance.
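As one example from this list, retries and idempotency go together: a retry is only safe when repeating the operation has the same effect as running it once. The sketch below retries with exponential backoff and jitter; the `retry` helper, its parameters, and the failing operation are all hypothetical.

```python
import random
import time


def retry(operation, attempts=3, base_delay=0.1, sleep=time.sleep):
    """Retry an idempotent operation with exponential backoff and jitter."""
    for attempt in range(attempts):
        try:
            return operation()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            delay = base_delay * (2 ** attempt)
            # Jitter spreads out retries so clients do not stampede together.
            sleep(delay + random.uniform(0, delay))


calls = {"n": 0}


def sometimes_fails():
    calls["n"] += 1
    if calls["n"] < 3:
        raise TimeoutError("transient error")
    return "ok"


# The sleep function is replaced so the example runs instantly.
print(retry(sometimes_fails, sleep=lambda seconds: None))  # ok
```

Bounding the number of attempts matters as much as the backoff: unbounded retries against a failing dependency add load exactly when it can least absorb it.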
