How to Achieve High Availability: Metrics, Redundancy, and Circuit‑Breaker Strategies
This article explains system availability metrics, the inevitability of faults in distributed systems, and practical high‑availability designs such as redundancy, Zookeeper and Eureka clustering, and circuit‑breaker patterns to keep services reliably operational.
System Availability Metric
System availability is the fraction of total running time during which the system is up, expressed as:

$$\text{Availability} = \frac{\text{MTTF}}{\text{MTTF} + \text{MTTR}}$$

MTTF (Mean Time To Failure) is the average time a system runs before it fails, and MTTR (Mean Time To Recovery) is the average downtime after a failure. Availability therefore improves in two ways: by failing less often (raising MTTF) or by recovering faster (lowering MTTR). A common industry baseline is at least two nines (99%), with four nines (99.99%) regarded as the ideal.
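To make the formula concrete, here is a small Python sketch. The function names and the sample MTTF/MTTR values are hypothetical, chosen only for illustration. The second helper converts an availability figure into expected downtime per year, which is how "nines" are usually compared: 99% allows about 87.6 hours of downtime a year, while 99.99% allows only about 52.6 minutes.

```python
# Availability from MTTF and MTTR, plus the yearly downtime it implies.
MINUTES_PER_YEAR = 365 * 24 * 60  # ignores leap years


def availability(mttf_hours: float, mttr_hours: float) -> float:
    """Fraction of time the system is up: MTTF / (MTTF + MTTR)."""
    if mttf_hours < 0 or mttr_hours < 0 or mttf_hours + mttr_hours == 0:
        raise ValueError("MTTF and MTTR must be non-negative and not both zero")
    return mttf_hours / (mttf_hours + mttr_hours)


def yearly_downtime_minutes(avail: float) -> float:
    """Expected minutes of downtime per 365-day year at a given availability."""
    return (1.0 - avail) * MINUTES_PER_YEAR


if __name__ == "__main__":
    # Hypothetical system: fails every 999 hours on average, takes 1 hour to recover.
    a = availability(999, 1)
    print(f"availability = {a:.4%}")
    for target in (0.99, 0.999, 0.9999):
        print(f"{target:.2%}: {yearly_downtime_minutes(target):.1f} min/year")
```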
Fault Inevitability in Distributed Systems
High availability means services should remain usable at all times, yet failures are unavoidable, especially in large‑scale distributed environments, where complex dependencies between modules let a single fault trigger cascading failures (a domino effect).
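One way to see why scale makes failures unavoidable: if a request must pass through several services and each is independently up 99.9% of the time, the request succeeds only when all of them are up, so their availabilities multiply. The Python sketch below makes that arithmetic explicit; the function name is hypothetical and independent failures are assumed.

```python
import math
from typing import List


def serial_availability(component_availabilities: List[float]) -> float:
    """Availability of a request that needs *every* component to be up.

    Assumes failures are independent; in real systems shared networks,
    deployments, and dependencies often correlate failures.
    """
    return math.prod(component_availabilities)


if __name__ == "__main__":
    for n in (1, 10, 50):
        overall = serial_availability([0.999] * n)
        print(f"{n:>2} services at 99.9% each -> {overall:.2%} overall")
```

With ten such dependencies the combined figure falls to about 99.0%, and with fifty to roughly 95.1%, before any cascading effects are counted.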
Redundancy Design
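Eliminating single points of failure requires deploying redundant instances of every critical component, ideally spread across multiple physical locations so that the loss of one machine, rack, or site cannot take the service down. Two common patterns are master‑slave (one node accepts writes and replicates them to the others, as in one‑master‑multiple‑slaves setups; multi‑master variants let several nodes accept writes) and peer‑to‑peer (all nodes are equivalent).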
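The benefit of redundancy can be estimated with a standard back‑of‑the‑envelope formula: if any one of n replicas can serve traffic, the service is unavailable only when all n are down at once, giving 1 − (1 − a)^n. The sketch below assumes independent failures and instant failover, which real deployments only approximate (spreading replicas across physical locations is what makes the independence assumption more realistic). The function name is hypothetical.

```python
def redundant_availability(replica_availability: float, replicas: int) -> float:
    """Availability of n interchangeable replicas when any one can serve.

    The service is down only if all replicas are down simultaneously:
    1 - (1 - a) ** n. Assumes independent failures and instant failover,
    so treat the result as an upper bound.
    """
    if not 0.0 <= replica_availability <= 1.0:
        raise ValueError("availability must be in [0, 1]")
    if replicas < 1:
        raise ValueError("need at least one replica")
    return 1.0 - (1.0 - replica_availability) ** replicas


if __name__ == "__main__":
    for n in (1, 2, 3):
        print(n, f"{redundant_availability(0.99, n):.6f}")
```

A single 99% replica gives 99%, two give 99.99%, and three give 99.9999%, which is why even modest redundancy pays off as long as failures stay uncorrelated.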
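Redundancy introduces consistency challenges. According to the CAP theorem, a distributed system cannot simultaneously guarantee consistency, availability, and partition tolerance. Because network partitions cannot be ruled out in practice, the real choice during a partition is between consistency and availability. Zookeeper favors strong consistency and sacrifices some availability to get it, as the next section shows.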
Zookeeper Cluster Roles
Leader: Handles all client write requests and coordinates the cluster.
Follower: Serves read requests, forwards write requests to the Leader, and votes in Leader elections and on write proposals.
Observer: Serves reads like a Follower but does not vote in Leader elections or on write proposals, so it increases read throughput without slowing down writes.
When the Leader fails, the cluster cannot process writes until the remaining servers elect a new Leader, which requires a majority (quorum) of the voting members. This pause illustrates the trade‑off between strong consistency and high availability.
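ZooKeeper needs a majority of voting members both to elect a Leader and to commit writes, and its election prefers the server with the most up‑to‑date history, comparing epoch first, then the latest transaction id (zxid), then the server id as a tie‑breaker. The Python sketch below models only that quorum arithmetic and ordering; the `Vote` type and the function names are hypothetical, not ZooKeeper's API, and the multi‑round message exchange of the real election protocol is omitted.

```python
from typing import List, NamedTuple, Optional


class Vote(NamedTuple):
    """Simplified election vote holding the fields ZooKeeper compares, in order."""
    epoch: int  # epoch of the last leadership the server knows about
    zxid: int   # id of the newest transaction the server has logged
    sid: int    # configured server id, the final tie-breaker


def quorum_size(voters: int) -> int:
    """Smallest majority of the voting members (Leader plus Followers)."""
    return voters // 2 + 1


def tolerated_failures(voters: int) -> int:
    """How many voting members may fail while a majority still survives."""
    return (voters - 1) // 2


def elect(reachable: List[Vote], voters: int) -> Optional[Vote]:
    """Return the winning vote, or None when no majority can be formed.

    Simplification: assumes every reachable voter sees every other vote.
    Observers are not passed in because they do not vote.
    """
    if len(reachable) < quorum_size(voters):
        return None  # no quorum: no Leader, so writes stay blocked
    # NamedTuples compare field by field: highest epoch, then zxid, then sid.
    return max(reachable)


if __name__ == "__main__":
    ensemble = 5
    survivors = [Vote(3, 0x300000010, 1), Vote(3, 0x300000012, 2), Vote(3, 0x300000012, 4)]
    print(quorum_size(ensemble), tolerated_failures(ensemble))
    print(elect(survivors, ensemble))
    print(elect(survivors[:2], ensemble))
```

A 4‑server ensemble tolerates only one failure, the same as 3 servers, which is why odd ensemble sizes are the usual choice.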
Eureka Peer‑to‑Peer Design
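Eureka consists of Eureka Clients (service instances) and Eureka Servers (the registry). For high availability, multiple Eureka Servers are deployed as equal peers with no leader: any server can accept a registration, and each server replicates registrations to its peers asynchronously over HTTP. The result is eventual consistency: for a short window, different servers (and clients, which cache the registry locally) may return slightly different instance lists, but service discovery keeps working as long as any server is reachable.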
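The Python sketch below models the peer‑to‑peer idea in miniature: any peer accepts a registration and acknowledges immediately, then propagates it to the other peers later. The `RegistryPeer` class and its methods are hypothetical and are not Eureka's API; `replicate()` stands in for asynchronous HTTP replication, and real Eureka behavior such as lease renewals (heartbeats), eviction, and self‑preservation is omitted.

```python
from collections import deque
from typing import Deque, Dict, List, Optional, Tuple


class RegistryPeer:
    """Hypothetical leaderless registry peer (illustrative, not Eureka's API)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.registry: Dict[str, str] = {}  # instance id -> address
        self.peers: List["RegistryPeer"] = []
        # Pending replication messages: (target peer, instance id, address).
        self.outbox: Deque[Tuple["RegistryPeer", str, str]] = deque()

    def register(self, instance_id: str, address: str) -> None:
        """Apply a registration locally and return without waiting on peers."""
        self.registry[instance_id] = address
        for peer in self.peers:
            self.outbox.append((peer, instance_id, address))

    def replicate(self) -> None:
        """Deliver queued updates; stands in for asynchronous HTTP calls.

        Replicated updates are applied without being forwarded again,
        which avoids replication loops between peers.
        """
        while self.outbox:
            peer, instance_id, address = self.outbox.popleft()
            peer.registry[instance_id] = address

    def lookup(self, instance_id: str) -> Optional[str]:
        """Answer from the local copy, which may be briefly stale."""
        return self.registry.get(instance_id)


if __name__ == "__main__":
    a, b = RegistryPeer("a"), RegistryPeer("b")
    a.peers, b.peers = [b], [a]
    a.register("order-service-1", "10.0.0.5:8080")
    print(b.lookup("order-service-1"))  # None: b has not been told yet
    a.replicate()
    print(b.lookup("order-service-1"))  # 10.0.0.5:8080: peers have converged
```

Between `register()` and `replicate()` the two peers disagree about the registry, which is the eventual‑consistency window described above; at no point does either peer refuse a request while waiting.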
Circuit‑Breaker Design
To prevent failures in a downstream service from cascading to its callers, a circuit breaker is placed in front of the downstream calls. It moves between three states: Closed, Open, and Half‑Open.
Closed: Calls pass through normally while failures are counted (as consecutive failures or as an error rate over a recent window); when a threshold is exceeded, the circuit opens.
Open: Calls fail fast without reaching the downstream service, and fallback logic runs instead; after a timeout, the circuit moves to Half‑Open.
Half‑Open: A limited number of trial calls are allowed through; if they succeed, the circuit closes, otherwise it reopens.
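The three states can be sketched as a small Python class. This is a minimal illustration under stated assumptions rather than any particular library's implementation: it counts consecutive failures instead of an error rate, handles one call at a time (it is not thread‑safe), and takes an injectable clock so the timeout can be tested. The class and method names are hypothetical.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class CircuitOpenError(RuntimeError):
    """Raised when a call is rejected without reaching the downstream service."""


class CircuitBreaker:
    """Minimal three-state breaker: CLOSED -> OPEN -> HALF_OPEN -> CLOSED/OPEN."""

    def __init__(self, failure_threshold: int = 5, reset_timeout: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.clock = clock
        self.state = "CLOSED"
        self.failures = 0  # consecutive failures seen so far
        self.opened_at = 0.0

    def call(self, fn: Callable[[], T], fallback: Optional[Callable[[], T]] = None) -> T:
        if self.state == "OPEN":
            if self.clock() - self.opened_at >= self.reset_timeout:
                self.state = "HALF_OPEN"  # timeout elapsed: let a trial call through
            else:
                return self._reject(fallback)  # fail fast, downstream untouched
        try:
            result = fn()
        except Exception:
            self._on_failure()
            if fallback is not None:
                return fallback()
            raise
        self._on_success()
        return result

    def _reject(self, fallback: Optional[Callable[[], T]]) -> T:
        if fallback is not None:
            return fallback()
        raise CircuitOpenError("circuit is open")

    def _on_failure(self) -> None:
        self.failures += 1
        # A failed trial in Half-Open reopens immediately.
        if self.state == "HALF_OPEN" or self.failures >= self.failure_threshold:
            self.state = "OPEN"
            self.opened_at = self.clock()

    def _on_success(self) -> None:
        self.state = "CLOSED"
        self.failures = 0


if __name__ == "__main__":
    now = [0.0]  # fake clock for the demo
    breaker = CircuitBreaker(failure_threshold=2, reset_timeout=10.0, clock=lambda: now[0])

    def flaky() -> str:
        raise ConnectionError("downstream unavailable")

    for _ in range(3):
        print(breaker.call(flaky, fallback=lambda: "cached"), breaker.state)
    now[0] = 11.0
    print(breaker.call(lambda: "fresh"), breaker.state)
```

In production, an established library (for Java services, Resilience4j or the older Netflix Hystrix) is usually preferable, since it adds sliding‑window error rates, concurrency control, and metrics.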
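Failing fast keeps upstream threads and connections from piling up behind a broken dependency and gives that dependency time to recover. This protects upstream services, conserves resources, and improves overall system availability.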
Summary of High‑Availability Practices
Beyond redundancy and circuit breakers, additional techniques include rate limiting, degradation, stateless design, idempotent operations, retry mechanisms, interface caching, real‑time monitoring and metrics, and regular preventive maintenance.
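Two of the practices listed above, retries and idempotent operations, depend on each other: a request that timed out may already have taken effect, so it is only safe to retry if applying it twice has the same result as applying it once. The Python sketch below shows retries with capped exponential backoff and "full jitter" (a random delay up to the current backoff ceiling, so many clients do not retry in lockstep). The function name, parameters, and default values are hypothetical choices for illustration.

```python
import random
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    operation: Callable[[], T],
    max_attempts: int = 4,
    base_delay: float = 0.1,
    max_delay: float = 2.0,
    retry_on: Tuple[Type[BaseException], ...] = (ConnectionError, TimeoutError),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Retry transient failures with capped exponential backoff and full jitter.

    Only wrap idempotent operations: a retried call may run more than once.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except retry_on:
            if attempt == max_attempts:
                raise  # give up and surface the last error
            # Backoff ceiling doubles per attempt: base, 2*base, 4*base, ... up to max_delay.
            ceiling = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(random.uniform(0, ceiling))
    raise AssertionError("unreachable")


if __name__ == "__main__":
    attempts = {"n": 0}

    def flaky_read() -> str:
        attempts["n"] += 1
        if attempts["n"] < 3:
            raise ConnectionError("transient")
        return "payload"

    print(retry_with_backoff(flaky_read), attempts["n"])
```

Passing `sleep` as a parameter keeps the delays testable; errors outside `retry_on` (for example, validation errors) propagate immediately, because retrying them would not help.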
This article has been distilled and summarized from source material, then republished for learning and reference.
Huawei Cloud Developer Alliance