
High Availability Design in Internet Architecture: Redundancy and Automatic Failover

This article explains the principles of high availability in internet systems, covering redundancy, automatic failover, availability metrics, and detailed HA designs for each architectural layer such as load balancers, microservices, middleware, and databases.

IT Services Circle

High availability (HA) aims to ensure continuous business service from the user's perspective by designing redundant and fault‑tolerant architectures. A layered approach splits a large IT system into application, middleware, and storage layers, each further divided into fine‑grained components that must all be HA‑designed.

Availability Levels

| Availability Level | System Uptime % | Downtime / Year | Downtime / Month | Downtime / Week | Downtime / Day |
|---|---|---|---|---|---|
| Unavailable | 90% | 36.5 days | 73 hours | 16.8 hours | 144 minutes |
| Basic | 99% | 87.6 hours | 7.3 hours | 1.68 hours | 14.4 minutes |
| Higher | 99.9% | 8.76 hours | 43.8 minutes | 10.1 minutes | 1.44 minutes |
| High | 99.99% | 52.56 minutes | 4.38 minutes | 1.01 minutes | 8.64 seconds |
| Very High | 99.999% | 5.26 minutes | 26.28 seconds | 6.06 seconds | 0.86 seconds |

Typical large‑scale internet services target at least four 9s (99.99% uptime), while mission‑critical systems may require five 9s.
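The percentages above translate directly into downtime budgets. A quick sketch of the arithmetic (assuming a 365‑day year):

```python
# Allowed downtime per period for a given uptime percentage.

def downtime_seconds(uptime_pct: float, period_seconds: int) -> float:
    """Seconds of downtime permitted in one period at the given uptime."""
    return period_seconds * (1 - uptime_pct / 100)

YEAR = 365 * 24 * 3600  # assumes a 365-day year

# Four 9s leaves roughly 52.56 minutes of downtime per year;
# five 9s leaves roughly 5.26 minutes.
print(downtime_seconds(99.99, YEAR) / 60)    # ≈ 52.56
print(downtime_seconds(99.999, YEAR) / 60)   # ≈ 5.26
```

This is why each extra 9 is so expensive: the downtime budget shrinks by a factor of ten every time.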

Internet Architecture Overview

Most modern internet systems adopt a micro‑service architecture consisting of the following layers:

- Access layer – usually F5 hardware or LVS software handling all inbound traffic.
- Reverse‑proxy layer – Nginx for URL routing, rate limiting, etc.
- Gateway – flow control, risk control, protocol conversion.
- Site layer – aggregates basic services (membership, promotions) and returns JSON to clients.
- Base services – infrastructure‑level micro‑services used by the upper layers.
- Storage layer – databases such as MySQL and Oracle.
- Middleware – Zookeeper, Redis, Elasticsearch, MQ, etc.

Each component must be made highly available.

Access & Reverse‑Proxy Layer

Both layers achieve HA through keepalived and LVS in a master‑backup configuration. The master holds the virtual IP (VIP); if it fails, keepalived detects the heartbeat loss and promotes the backup to master, causing the VIP to “float” to the backup node. Keepalived can also monitor Nginx health and remove failed instances from the LVS pool.
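A minimal keepalived configuration for the master node might look like the sketch below; the interface name, VIP, and health‑check script path are placeholders, not values from any real deployment:

```conf
# Hypothetical keepalived.conf sketch for the master node.
vrrp_script chk_nginx {
    script "/usr/local/bin/check_nginx.sh"   # exits non-zero if Nginx is down
    interval 2
    weight -20      # lower this node's priority on failure so the backup wins
}

vrrp_instance VI_1 {
    state MASTER            # the peer node uses "state BACKUP"
    interface eth0
    virtual_router_id 51
    priority 100            # backup uses a lower priority, e.g. 90
    advert_int 1            # heartbeat (VRRP advertisement) interval in seconds
    virtual_ipaddress {
        192.168.1.100       # the VIP that "floats" to the backup on failover
    }
    track_script {
        chk_nginx
    }
}
```

When the master stops sending VRRP advertisements (or its Nginx check fails and its priority drops below the backup's), the backup claims the VIP and traffic follows it automatically.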

Micro‑service Layer (Dubbo Example)

Dubbo providers register themselves to a registry (e.g., Zookeeper or Nacos). Consumers subscribe to the registry and obtain a list of available providers. If a provider becomes unavailable, the registry’s heartbeat mechanism removes it from the list, enabling automatic failover similar to keepalived.
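The consumer-side behavior can be sketched as follows. This is an illustration of the failover pattern, not the actual Dubbo API; the class and function names are invented:

```python
import random

class Consumer:
    """Holds a provider list that the registry keeps up to date."""

    def __init__(self, providers):
        self.providers = list(providers)   # kept in sync by registry pushes

    def on_registry_update(self, providers):
        """Registry heartbeat removed a dead provider; refresh the list."""
        self.providers = list(providers)

    def call(self, request):
        # Failover cluster strategy: try providers until one succeeds.
        for provider in random.sample(self.providers, len(self.providers)):
            try:
                return provider(request)
            except ConnectionError:
                continue   # fall through to the next live provider
        raise RuntimeError("no provider available")

def healthy(req):  return f"ok:{req}"
def dead(req):     raise ConnectionError

c = Consumer([dead, healthy])
print(c.call("ping"))   # succeeds after failing over from the dead node
```

Real RPC frameworks layer retries, load balancing, and timeouts on top of this, but the core idea is the same: the registry removes dead providers, and the consumer routes around failures in the meantime.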

Middleware

Zookeeper

Zookeeper provides HA via a leader‑follower model. The single leader handles transaction ordering, while followers replicate data. If the leader fails, followers hold an election (using the ZAB protocol) to select a new leader, eliminating the single point of failure.
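An ensemble stays writable only while a majority (quorum) of its nodes is alive, which is why odd-sized ensembles are the norm:

```python
# Quorum arithmetic for a ZooKeeper-style majority ensemble.

def quorum(ensemble_size: int) -> int:
    """Nodes that must agree for the ensemble to make progress."""
    return ensemble_size // 2 + 1

def tolerated_failures(ensemble_size: int) -> int:
    """Node losses the ensemble can survive."""
    return ensemble_size - quorum(ensemble_size)

for n in (3, 4, 5):
    print(n, quorum(n), tolerated_failures(n))
# 3-node and 4-node ensembles both tolerate only one failure,
# so the fourth node buys no extra availability.
```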

Redis

Redis HA can be deployed in master‑slave mode with Sentinel as the arbitrator. Sentinel clusters use gossip to detect master failures and Raft‑based elections to promote a slave to master. In cluster (sharding) mode, data is split into slots across multiple masters, each with its own replicas; Raft is used to elect a new master if one fails.
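A minimal Sentinel configuration looks like the sketch below; the master name, host, and thresholds are placeholders chosen for illustration:

```conf
# Hypothetical sentinel.conf sketch.
sentinel monitor mymaster 192.168.1.10 6379 2   # 2 sentinels must agree the master is down
sentinel down-after-milliseconds mymaster 5000  # mark master down after 5s of silence
sentinel failover-timeout mymaster 60000        # abort a stuck failover after 60s
sentinel parallel-syncs mymaster 1              # re-sync one replica at a time
```

The quorum value (here 2) controls only failure detection; the actual failover still requires a majority of the Sentinel processes to elect a leader Sentinel that performs the promotion.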

Elasticsearch

ES stores data in primary and replica shards spread across multiple nodes. A dedicated master node manages cluster state and shard allocation. If the master fails, the remaining master‑eligible nodes elect a new one (versions before 7.0 used Zen discovery's Bully‑style election; Elasticsearch 7.0+ uses a Raft‑like quorum protocol). Any node can serve read/write requests, routing writes to the appropriate primary shard.

Message Queue (Kafka)

Kafka achieves HA by replicating each partition across multiple brokers. One replica acts as the leader and serves all client traffic; the followers continuously replicate from it but stay on standby. If the leader broker crashes, a new leader is elected from the in‑sync replica set (ISR), ensuring continuous message delivery.
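The per‑partition failover can be modeled in a few lines. This is a toy illustration of the ISR idea, not Kafka's actual implementation:

```python
class Partition:
    """Toy model of a replicated partition with ISR-based leader election."""

    def __init__(self, replicas):
        self.isr = list(replicas)      # in-sync replicas, current leader first
        self.leader = self.isr[0]

    def broker_failed(self, broker):
        self.isr = [b for b in self.isr if b != broker]
        if broker == self.leader:
            if not self.isr:
                raise RuntimeError("partition offline: ISR is empty")
            self.leader = self.isr[0]  # promote a surviving in-sync replica

p = Partition(["broker-1", "broker-2", "broker-3"])
p.broker_failed("broker-1")
print(p.leader)   # a follower takes over; producers keep writing
```

The key property is that only replicas known to be fully caught up (the ISR) are eligible, so a failover does not lose acknowledged messages.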

Storage Layer (MySQL Example)

MySQL HA follows the same master‑slave pattern, often protected by keepalived and a VIP. For large data volumes, sharding (multiple masters) is used, each with its own slaves, and the same HA mechanisms apply.
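Replication on a new slave is wired up with a few statements; the host, user, and password below are placeholders:

```sql
-- Hypothetical setup on the replica (MySQL 8.0.22+ prefers
-- CHANGE REPLICATION SOURCE TO / START REPLICA instead).
CHANGE MASTER TO
    MASTER_HOST = '192.168.1.10',
    MASTER_USER = 'repl',
    MASTER_PASSWORD = '***',
    MASTER_AUTO_POSITION = 1;   -- GTID-based positioning
START SLAVE;
SHOW SLAVE STATUS\G             -- check Slave_IO_Running / Slave_SQL_Running
```

On failover, keepalived moves the VIP to the promoted slave so applications keep connecting to the same address.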

Beyond HA – Operational Practices

Even with HA at the component level, systems must handle traffic spikes, DDoS attacks, code bugs, deployment issues, third‑party failures, and natural disasters. Practices such as isolation, rate limiting, circuit breaking, risk control, graceful degradation, comprehensive monitoring, automated alerts, unit testing, full‑link stress testing, and rapid rollback are essential.
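Rate limiting, for example, is often a token bucket at the gateway. A minimal sketch (production systems use purpose‑built libraries, not hand‑rolled limiters):

```python
import time

class TokenBucket:
    """Token-bucket rate limiter: steady rate plus a bounded burst."""

    def __init__(self, rate: float, capacity: float):
        self.rate = rate            # tokens refilled per second
        self.capacity = capacity    # maximum burst size
        self.tokens = capacity
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False   # caller should reject or degrade the request

bucket = TokenBucket(rate=1, capacity=5)
results = [bucket.allow() for _ in range(8)]
print(results)   # the first 5 pass (the burst); the rest are throttled
```

Circuit breaking and graceful degradation follow the same philosophy: fail fast and shed load deliberately rather than let a spike cascade through every layer.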

Conclusion

The core ideas of high availability are redundancy and automatic failover. Most components adopt a single‑master plus multiple slaves design because maintaining consistency across multiple masters is complex. Combining HA with robust operational safeguards yields truly reliable internet services.
