Mastering High Availability: Redundancy & Automatic Failover in Modern Internet Architecture
This article explains how to achieve high availability in internet systems by designing redundant components and automatic failover at every layer: load balancers, reverse proxies, microservices, middleware, databases, and messaging. It covers clustering, leader election, and practical tools such as keepalived, Zookeeper, Redis Sentinel, and Kafka.
Introduction
High availability (HA) aims to ensure continuous business service from the user's perspective. It requires designing each architectural layer—application, middleware, storage—with redundancy and automatic failover, because any single component lacking HA introduces risk.
Availability Levels
Availability is often expressed in nines, from 99% (about 3.65 days of downtime per year, or roughly 7.3 hours per month) up to 99.999% (about 0.86 seconds of downtime per day, or roughly 5.3 minutes per year). Enterprises typically target four or five nines for critical services.
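The downtime budget behind each level is plain arithmetic: the unavailable fraction multiplied by the length of the period. The sketch below is only an illustration; the function name and the 365-day year are assumptions made here, not something the article specifies.

```python
SECONDS_PER_DAY = 24 * 60 * 60
SECONDS_PER_YEAR = 365 * SECONDS_PER_DAY  # assumes a 365-day year


def allowed_downtime_seconds(availability_percent: float, period_seconds: int) -> float:
    """Return the downtime budget, in seconds, for one period at the given availability."""
    unavailable_fraction = 1 - availability_percent / 100
    return unavailable_fraction * period_seconds


# Print the yearly and daily budget for two to five nines.
for target in (99.0, 99.9, 99.99, 99.999):
    hours_per_year = allowed_downtime_seconds(target, SECONDS_PER_YEAR) / 3600
    seconds_per_day = allowed_downtime_seconds(target, SECONDS_PER_DAY)
    print(f"{target}%: {hours_per_year:.2f} h/year, {seconds_per_day:.2f} s/day")
```

Each extra nine divides the budget by ten, which is why moving from four to five nines usually requires automatic failover rather than manual intervention.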
Internet Architecture Overview
Most modern internet systems adopt a micro‑service architecture divided into layers: access layer, reverse‑proxy layer, gateway, site layer, basic services, storage layer, and middleware.
Access & Reverse‑Proxy Layer
High availability here relies on keepalived and LVS clusters. Two LVS instances run in active‑standby mode behind a shared virtual IP (VIP). keepalived, which implements VRRP, monitors health via heartbeats and moves the VIP to the standby when the master fails. The same mechanism can monitor Nginx instances and notify developers by email.
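The active‑standby behavior can be modeled as a small simulation. This is a toy sketch, not keepalived itself: the `Node` and `VipFailover` names, the priorities, and the "three missed heartbeats" threshold are all assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    priority: int       # the live node with the highest priority holds the VIP
    alive: bool = True


class VipFailover:
    """Toy active-standby pair: the VIP follows the highest-priority healthy node."""

    def __init__(self, nodes, max_missed=3):
        self.nodes = nodes
        self.max_missed = max_missed  # hypothetical threshold before a node is declared down
        self.missed = {n.name: 0 for n in nodes}
        self.holder = max(nodes, key=lambda n: n.priority).name

    def tick(self):
        """Simulate one heartbeat interval and return the current VIP holder."""
        for n in self.nodes:
            self.missed[n.name] = 0 if n.alive else self.missed[n.name] + 1
        healthy = [n for n in self.nodes if self.missed[n.name] < self.max_missed]
        if healthy:
            self.holder = max(healthy, key=lambda n: n.priority).name
        return self.holder


lvs1, lvs2 = Node("lvs-1", priority=100), Node("lvs-2", priority=90)
pair = VipFailover([lvs1, lvs2])
lvs1.alive = False
for _ in range(3):
    holder = pair.tick()
print(holder)  # the standby holds the VIP after three missed heartbeats
```

In this sketch a recovered master takes the VIP back on its next heartbeat, which corresponds to preemptive behavior; a non-preemptive setup would leave the VIP on the standby.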
Micro‑service Layer
Services register with a registry (e.g., Zookeeper or Nacos). Consumers fetch the provider list and load‑balance across it. If a provider becomes unavailable, the registry removes it from the list, so traffic fails over automatically, much as keepalived does at the access layer.
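A minimal in-memory sketch of this pattern follows. It does not use the Zookeeper or Nacos APIs: `Registry`, `RoundRobinClient`, the TTL value, and the addresses are hypothetical, and real registries detect dead providers through sessions or heartbeats with their own semantics.

```python
import itertools


class Registry:
    """Toy registry: a provider stays listed only while its heartbeats are fresh."""

    def __init__(self, ttl_seconds=10.0):
        self.ttl = ttl_seconds
        self.last_seen = {}  # service name -> {address: time of last heartbeat}

    def heartbeat(self, service, address, now):
        self.last_seen.setdefault(service, {})[address] = now

    def providers(self, service, now):
        """Return the addresses whose last heartbeat is within the TTL."""
        entries = self.last_seen.get(service, {})
        return sorted(a for a, t in entries.items() if now - t <= self.ttl)


class RoundRobinClient:
    """Consumer side: fetch the live list on each call and rotate through it."""

    def __init__(self, registry, service):
        self.registry = registry
        self.service = service
        self.counter = itertools.count()

    def pick(self, now):
        live = self.registry.providers(self.service, now)
        if not live:
            raise RuntimeError(f"no live providers for {self.service}")
        return live[next(self.counter) % len(live)]


registry = Registry(ttl_seconds=10.0)
registry.heartbeat("orders", "10.0.0.1:8080", now=0.0)
registry.heartbeat("orders", "10.0.0.2:8080", now=0.0)
client = RoundRobinClient(registry, "orders")
print(client.pick(now=1.0), client.pick(now=1.0))
```

Time is passed in explicitly so the expiry logic is easy to follow; a provider that stops sending heartbeats simply drops out of `providers()` once the TTL elapses.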
Middleware
Zookeeper
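Zookeeper avoids a single point of failure through leader election and its atomic broadcast protocol, ZAB (Zookeeper Atomic Broadcast). When followers stop receiving the leader's heartbeat, they trigger an election, and a new leader takes over once a majority of the ensemble agrees on it.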
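The election outcome can be sketched as "the most up-to-date live server wins, provided a majority is reachable". The code below is a simplified illustration and not Zookeeper's implementation: it skips the vote-exchange rounds entirely, and the `Vote` type and its field ordering (epoch, then transaction id, then server id) are assumptions of this sketch.

```python
from typing import NamedTuple, Optional, Sequence


class Vote(NamedTuple):
    epoch: int      # epoch of the server's last accepted state
    zxid: int       # id of the last transaction the server has logged
    server_id: int  # used only as a tie-breaker


def elect_leader(live_servers: Sequence[Vote], ensemble_size: int) -> Optional[Vote]:
    """Pick the most up-to-date live server, or None if no majority is reachable."""
    quorum = ensemble_size // 2 + 1
    if len(live_servers) < quorum:
        return None  # without a majority the ensemble cannot elect anyone
    # Tuples compare field by field: epoch first, then zxid, then server_id.
    return max(live_servers)


survivors = [Vote(1, 100, 1), Vote(1, 120, 2), Vote(1, 120, 3)]
print(elect_leader(survivors, ensemble_size=5))
```

Preferring the highest transaction id keeps the new leader from missing writes that other servers have already logged, and the majority requirement prevents two partitions from each electing their own leader.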
Redis
Redis can be deployed in master‑slave mode with a Sentinel cluster for automatic failover, or as a sharded cluster in which each shard has its own master and replicas. In cluster mode, when a master fails, the remaining masters vote in a Raft‑like election to promote one of its replicas.
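The Sentinel failover decision can be outlined in three steps: enough Sentinels agree the master is down, one Sentinel gets a majority's authorization to act, and a replica is chosen for promotion. The sketch below follows the selection order described in the Redis Sentinel documentation (replica priority, then replication offset, then run ID), but the function and class names are hypothetical and this is not Sentinel's code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Replica:
    run_id: str
    priority: int      # lower is preferred; 0 means the replica is never promoted
    repl_offset: int   # higher means more of the master's data has been replicated


def is_objectively_down(down_votes: int, quorum: int) -> bool:
    """The master counts as down once at least `quorum` Sentinels report it unreachable."""
    return down_votes >= quorum


def failover_authorized(leader_votes: int, total_sentinels: int) -> bool:
    """A Sentinel may run the failover only with votes from a majority of Sentinels."""
    return leader_votes > total_sentinels // 2


def choose_replica(replicas):
    """Prefer lowest priority, then highest replication offset, then smallest run ID."""
    candidates = [r for r in replicas if r.priority > 0]
    if not candidates:
        return None
    return min(candidates, key=lambda r: (r.priority, -r.repl_offset, r.run_id))


replicas = [Replica("b", 100, 500), Replica("a", 100, 900), Replica("c", 0, 999)]
if is_objectively_down(down_votes=2, quorum=2) and failover_authorized(2, 3):
    print(choose_replica(replicas))
```

Separating the quorum check from the majority authorization matters: the quorum only decides that a failure is real, while the majority vote ensures a single Sentinel drives the failover.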
Elasticsearch
Elasticsearch stores data in primary and replica shards spread across multiple nodes. The master node (elected via a Bully‑type algorithm in versions before 7.0) manages cluster state, while any node can accept read and write requests and act as coordinator. A write is routed to the document's primary shard and then replicated to that shard's replicas.
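Routing a write to its primary shard follows, in simplified form, the rule shard = hash(routing key) % number of primary shards. The sketch below illustrates that rule only; `zlib.crc32` is a stand-in hash chosen here for a dependency-free example (Elasticsearch uses its own hash function), and the function names are hypothetical.

```python
import zlib


def primary_shard_for(routing_key: str, num_primary_shards: int) -> int:
    """Map a routing key (by default the document id) to a primary shard number."""
    # crc32 is a stand-in hash for illustration, not the one Elasticsearch uses.
    return zlib.crc32(routing_key.encode("utf-8")) % num_primary_shards


def write_targets(routing_key: str, num_primary_shards: int, replicas_per_shard: int):
    """List the shard copies a write touches: the primary first, then each replica."""
    shard = primary_shard_for(routing_key, num_primary_shards)
    copies = [(shard, "primary")]
    copies += [(shard, f"replica-{i}") for i in range(replicas_per_shard)]
    return copies


print(write_targets("order-42", num_primary_shards=5, replicas_per_shard=1))
```

Because the shard number depends on the primary shard count, that count cannot be changed freely on an existing index without moving documents, whereas the number of replicas can be adjusted at any time.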
Kafka
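Kafka partitions are replicated across brokers. Each partition has one leader replica that serves client requests, while follower replicas continuously fetch the leader's log to stay in sync. When the broker hosting the leader fails, one of the in‑sync followers is elected as the new leader and continues serving requests.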
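The leader choice can be sketched as "the first assigned replica that is both alive and in the in-sync replica set (ISR)". This is a simplified illustration rather than Kafka's controller code; the function name and broker ids are hypothetical.

```python
def elect_partition_leader(assigned_replicas, isr, live_brokers):
    """Return the first assigned replica that is alive and in sync, or None.

    assigned_replicas: broker ids in preference order for this partition
    isr: set of broker ids currently in sync with the previous leader
    live_brokers: set of broker ids that are currently reachable
    """
    for broker in assigned_replicas:
        if broker in isr and broker in live_brokers:
            return broker
    # No in-sync replica is alive: the partition stays offline in this sketch,
    # which corresponds to running with unclean leader election disabled.
    return None


# Broker 1 led the partition and has just failed.
print(elect_partition_leader([1, 2, 3], isr={1, 2, 3}, live_brokers={2, 3}))
```

Restricting candidates to the ISR trades availability for durability: a lagging replica is never promoted, so acknowledged writes are not lost, but the partition can become unavailable if every in-sync replica is down.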
Storage Layer
MySQL high availability mirrors the patterns above: master‑slave replication with keepalived for VIP failover, and sharding with multiple masters each protected by their own replicas.
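Combining sharding with per-shard replication can be sketched as a small router: writes go to the shard's master, reads go to a replica, and a replica is promoted when the master fails. Everything here is hypothetical (the `Shard` and `ShardRouter` names, the modulo sharding key, the host names); in a real deployment the promotion would be driven by tooling such as keepalived moving a VIP rather than by application code.

```python
from dataclasses import dataclass, field


@dataclass
class Shard:
    master: str
    replicas: list = field(default_factory=list)

    def promote_replica(self):
        """Replace a failed master with the first available replica."""
        if not self.replicas:
            raise RuntimeError("no replica available to promote")
        self.master = self.replicas.pop(0)


class ShardRouter:
    """Route by user id: writes to the shard's master, reads to one of its replicas."""

    def __init__(self, shards):
        self.shards = shards

    def shard_for(self, user_id: int) -> Shard:
        return self.shards[user_id % len(self.shards)]

    def endpoint(self, user_id: int, write: bool) -> str:
        shard = self.shard_for(user_id)
        if write or not shard.replicas:
            return shard.master  # fall back to the master when no replica is left
        return shard.replicas[user_id % len(shard.replicas)]


router = ShardRouter([Shard("m0", ["m0-r1", "m0-r2"]), Shard("m1", ["m1-r1"])])
print(router.endpoint(7, write=True), router.endpoint(7, write=False))
router.shard_for(7).promote_replica()  # simulate failure of master "m1"
print(router.endpoint(7, write=True))
```

Each shard fails over independently, so losing one master affects only the users mapped to that shard.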
Conclusion
Redundancy and automatic failover are the core principles of high availability. Most components adopt a single‑master plus multiple slaves design to balance consistency and availability. Beyond architectural HA, operational measures such as traffic isolation, rate limiting, circuit breaking, disaster recovery, and robust monitoring are essential to maintain overall system reliability.
Su San Talks Tech
Su San, former staff at several leading tech companies, is a top creator on Juejin and a premium creator on CSDN, and runs the free coding practice site www.susan.net.cn.
