Mastering High Availability: Redundancy & Automatic Failover in Modern Internet Architecture

This article explains how to achieve high availability in internet systems by designing redundant components and automatic failover mechanisms across layers such as load balancers, reverse proxies, microservices, middleware, databases, and messaging, illustrating concepts with diagrams of architectures, clustering, leader election, and practical tools like keepalived, Zookeeper, Redis Sentinel, and Kafka.

Su San Talks Tech

Jul 7, 2025

Introduction

High availability (HA) aims to keep business services continuously available from the user's perspective. That requires designing every architectural layer (application, middleware, storage) with redundancy and automatic failover, because any component without HA becomes a single point of failure that puts the whole system at risk.

Availability Levels

Availability is often expressed in nines, from 99% (≈3.65 days of downtime per year, or ≈7.3 hours per month) up to 99.999% (≈5.26 minutes per year, or ≈0.86 seconds per day). Enterprises typically target four or five nines for critical services.
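The downtime budgets follow directly from the percentage: allowed downtime = (1 - availability) × length of the period. The short Python sketch below computes them; it assumes a 365‑day year and an average month of 365/12 days, so leap years and real calendar months shift the figures slightly.

```python
SECONDS_PER_DAY = 24 * 60 * 60
DAYS_PER_YEAR = 365                  # assumption: ignores leap years
DAYS_PER_MONTH = DAYS_PER_YEAR / 12  # assumption: average month of ~30.4 days


def allowed_downtime_seconds(availability: float, days: float) -> float:
    """Seconds of downtime permitted over `days` at the given availability (0 to 1)."""
    if not 0.0 <= availability <= 1.0:
        raise ValueError("availability must be between 0 and 1")
    return (1.0 - availability) * days * SECONDS_PER_DAY


def describe(availability: float) -> str:
    """Summarize the downtime budget per year, per month, and per day."""
    per_year = allowed_downtime_seconds(availability, DAYS_PER_YEAR)
    per_month = allowed_downtime_seconds(availability, DAYS_PER_MONTH)
    per_day = allowed_downtime_seconds(availability, 1)
    return (f"{availability:.3%}: {per_year / 3600:.2f} h/year, "
            f"{per_month / 3600:.2f} h/month, {per_day:.2f} s/day")


if __name__ == "__main__":
    for target in (0.99, 0.999, 0.9999, 0.99999):
        print(describe(target))
```

For 99% this gives 87.60 hours per year (about 3.65 days) and 7.30 hours per month; for 99.999% it gives about 0.86 seconds per day, or roughly 5.3 minutes per year.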

Internet Architecture Overview

Most modern internet systems adopt a micro‑service architecture divided into layers: access layer, reverse‑proxy layer, gateway, site layer, basic services, storage layer, and middleware.

Access & Reverse‑Proxy Layer

High availability here relies on LVS clusters managed by keepalived. Two LVS instances run in active‑standby mode: the master holds a virtual IP (VIP) and sends periodic VRRP heartbeats, and when the standby stops receiving them, keepalived moves the VIP to the standby so that it takes over traffic. The same health‑check mechanism can monitor Nginx instances and notify developers by email.
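The failover decision can be pictured with a small simulation. This is a simplified model, not keepalived's code: each LVS node has a VRRP‑style priority, a node counts as dead after missing roughly three advertisement intervals, and the live node with the highest priority owns the VIP. The names `LvsNode` and `vip_owner` are hypothetical, and the one‑second interval is an assumed value.

```python
from __future__ import annotations

from dataclasses import dataclass

ADVERT_INTERVAL = 1.0             # seconds between heartbeats (assumed value)
DEAD_AFTER = 3 * ADVERT_INTERVAL  # simplified "master down" threshold


@dataclass
class LvsNode:
    name: str
    priority: int          # higher wins, as with VRRP priority
    last_heartbeat: float  # time of the node's most recent advertisement


def is_alive(node: LvsNode, now: float) -> bool:
    """A node is considered alive if it advertised within DEAD_AFTER seconds."""
    return now - node.last_heartbeat <= DEAD_AFTER


def vip_owner(nodes: list[LvsNode], now: float) -> str | None:
    """Return the name of the node that should hold the virtual IP, or None."""
    candidates = [n for n in nodes if is_alive(n, now)]
    if not candidates:
        return None
    return max(candidates, key=lambda n: n.priority).name


if __name__ == "__main__":
    master = LvsNode("lvs-master", priority=100, last_heartbeat=0.0)
    backup = LvsNode("lvs-backup", priority=90, last_heartbeat=0.0)
    nodes = [master, backup]
    print(vip_owner(nodes, now=1.0))   # lvs-master
    backup.last_heartbeat = 10.0       # the backup keeps advertising; the master is silent
    print(vip_owner(nodes, now=10.0))  # lvs-backup
```

When the master's advertisements resume, this model hands the VIP back to it because it has the higher priority. keepalived behaves that way only when preemption is enabled, which is its default unless `nopreempt` is configured.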

Micro‑service Layer

Services register with a registry (e.g., Zookeeper or Nacos), and consumers fetch the provider list and load‑balance requests across it. If a provider becomes unavailable, the registry removes it from the list, so consumers stop routing traffic to it. This is automatic failover, playing the same role that keepalived plays for LVS.
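A minimal sketch of that loop, assuming a heartbeat‑with‑TTL model: a provider stays listed only while it keeps heartbeating, and the consumer round‑robins over whatever is currently live. Real registries differ in the details (Zookeeper, for example, ties a registration to the client session through an ephemeral znode), and `Registry`, `RoundRobinConsumer`, and `order-service` are hypothetical names.

```python
from __future__ import annotations

import itertools


class Registry:
    """Toy registry: a provider stays listed only while it heartbeats within `ttl` seconds."""

    def __init__(self, ttl: float = 5.0) -> None:
        self.ttl = ttl
        # service name -> {provider address: time of last heartbeat}
        self._last_seen: dict[str, dict[str, float]] = {}

    def heartbeat(self, service: str, address: str, now: float) -> None:
        """Register a provider or refresh its lease."""
        self._last_seen.setdefault(service, {})[address] = now

    def providers(self, service: str, now: float) -> list[str]:
        """Evict providers whose lease expired and return the live ones in sorted order."""
        entries = self._last_seen.get(service, {})
        for address in [a for a, ts in entries.items() if now - ts > self.ttl]:
            del entries[address]
        return sorted(entries)


class RoundRobinConsumer:
    """Fetches the live provider list on each call and rotates through it."""

    def __init__(self, registry: Registry, service: str) -> None:
        self.registry = registry
        self.service = service
        self._counter = itertools.count()

    def pick(self, now: float) -> str:
        live = self.registry.providers(self.service, now)
        if not live:
            raise RuntimeError(f"no live providers for {self.service}")
        return live[next(self._counter) % len(live)]


if __name__ == "__main__":
    reg = Registry(ttl=5.0)
    reg.heartbeat("order-service", "10.0.0.1:8080", now=0.0)
    reg.heartbeat("order-service", "10.0.0.2:8080", now=0.0)
    consumer = RoundRobinConsumer(reg, "order-service")
    print(consumer.pick(now=1.0))  # 10.0.0.1:8080
    print(consumer.pick(now=1.0))  # 10.0.0.2:8080
    reg.heartbeat("order-service", "10.0.0.2:8080", now=6.0)  # .1 has stopped heartbeating
    print(consumer.pick(now=7.0))  # 10.0.0.2:8080
```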

Middleware

Zookeeper

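Zookeeper runs as an ensemble and uses the ZAB (Zookeeper Atomic Broadcast) protocol, together with leader election, to avoid a single point of failure. When followers stop receiving the leader's heartbeats, they trigger an election, and a new leader takes over once a majority of the ensemble agrees on it.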
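In Zookeeper's election, servers exchange votes and prefer the candidate with the highest epoch, then the highest zxid (the most recent transaction it has logged), then the highest server id (`myid`); a leader is established only once a majority of the ensemble agrees. The sketch below compresses that into one function under a strong simplifying assumption: every live server sees every vote and they converge in a single round. `Server` and `elect_leader` are hypothetical names.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Server:
    myid: int          # server id from the ensemble configuration
    epoch: int         # last epoch this server accepted
    zxid: int          # id of the last transaction in its log
    alive: bool = True


def elect_leader(ensemble: list[Server]) -> int | None:
    """Return the myid of the new leader, or None if no majority is reachable.

    Simplification: all live servers see all votes and agree in one round.
    """
    live = [s for s in ensemble if s.alive]
    if len(live) <= len(ensemble) // 2:  # a strict majority of the full ensemble is required
        return None
    # Vote ordering: higher epoch, then higher zxid (more up to date), then higher myid.
    best = max(live, key=lambda s: (s.epoch, s.zxid, s.myid))
    return best.myid


if __name__ == "__main__":
    ensemble = [
        Server(myid=1, epoch=3, zxid=105, alive=False),  # the old leader has failed
        Server(myid=2, epoch=3, zxid=104),
        Server(myid=3, epoch=3, zxid=101),
    ]
    print(elect_leader(ensemble))  # 2
```

With only one of the three servers alive the function returns None, which mirrors why an ensemble of 2n+1 servers tolerates at most n failures.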

Redis

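Redis can be deployed in master‑slave mode with a Sentinel cluster that detects a failed master and automatically promotes a replica, or as a sharded Redis Cluster in which each shard has its own master and replicas. When a shard's master fails, its replicas request votes from the remaining masters in a Raft‑like, epoch‑based election, and the winner is promoted to master.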
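Two of Sentinel's failover decisions can be made concrete in a sketch. A master is marked objectively down (ODOWN) once at least `quorum` Sentinels consider it subjectively down, and the replica to promote is chosen by lowest `replica-priority` (0 means never promote), then the largest replication offset, then the lexicographically smallest run ID. The sketch models only those two rules; the real process also elects a leader Sentinel, which needs votes from a majority of Sentinels, before promoting anything. `Replica`, `objectively_down`, and `choose_replica` are hypothetical names.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Replica:
    run_id: str
    priority: int        # replica-priority: lower is preferred, 0 = never promote
    repl_offset: int     # how much of the old master's stream this replica has processed
    healthy: bool = True


def objectively_down(sdown_votes: int, quorum: int) -> bool:
    """ODOWN once at least `quorum` Sentinels report the master subjectively down."""
    return sdown_votes >= quorum


def choose_replica(replicas: list[Replica]) -> Replica | None:
    """Pick the replica to promote, or None if no replica is eligible."""
    eligible = [r for r in replicas if r.healthy and r.priority > 0]
    if not eligible:
        return None
    # Lower priority first, then larger offset (more data), then smaller run ID.
    return min(eligible, key=lambda r: (r.priority, -r.repl_offset, r.run_id))


if __name__ == "__main__":
    print(objectively_down(sdown_votes=2, quorum=2))  # True
    candidates = [
        Replica("run-b", priority=100, repl_offset=5000),
        Replica("run-a", priority=100, repl_offset=5000),
        Replica("run-c", priority=0, repl_offset=9000),    # never promoted
        Replica("run-d", priority=100, repl_offset=4000),  # further behind
    ]
    print(choose_replica(candidates).run_id)  # run-a
```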

Elasticsearch

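Elasticsearch stores data in primary and replica shards spread across multiple nodes. A master node manages cluster state; older versions (Zen discovery) elected it with a Bully‑type algorithm, while 7.0 and later use a quorum‑based election among master‑eligible nodes. Any node can receive a read or write request and coordinate it: writes are routed to the primary shard and then replicated to its replica shards, while reads can be served by either the primary or a replica.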
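Document routing and primary‑then‑replica writes can be shown with a toy in‑memory index. Elasticsearch's rule is, in simplified form, `shard = hash(_routing) % number_of_primary_shards`, where `_routing` defaults to the document id. This sketch uses `zlib.crc32` as a stand‑in for Elasticsearch's murmur3 hash and omits versioning, acknowledgements, and failure handling; `Index` and its methods are hypothetical.

```python
from __future__ import annotations

import zlib


class Index:
    """Toy index: documents are routed to a primary shard, then copied to its replicas."""

    def __init__(self, num_primaries: int, num_replicas: int) -> None:
        self.num_primaries = num_primaries
        # primaries[shard] and replicas[shard][i] map doc_id -> document
        self.primaries: list[dict[str, dict]] = [{} for _ in range(num_primaries)]
        self.replicas: list[list[dict[str, dict]]] = [
            [{} for _ in range(num_replicas)] for _ in range(num_primaries)
        ]

    def shard_for(self, routing: str) -> int:
        # Stand-in deterministic hash; Elasticsearch itself uses murmur3.
        return zlib.crc32(routing.encode()) % self.num_primaries

    def index(self, doc_id: str, doc: dict) -> int:
        """Write to the primary first, then to every replica; return the shard number."""
        shard = self.shard_for(doc_id)
        self.primaries[shard][doc_id] = doc
        for replica in self.replicas[shard]:
            replica[doc_id] = doc
        return shard

    def get(self, doc_id: str, copy: int = 0) -> dict | None:
        """Read from any copy of the shard: 0 = primary, 1..n = replicas."""
        shard = self.shard_for(doc_id)
        copies = [self.primaries[shard], *self.replicas[shard]]
        return copies[copy].get(doc_id)


if __name__ == "__main__":
    idx = Index(num_primaries=3, num_replicas=1)
    shard = idx.index("user-1", {"name": "Alice"})
    print(shard == idx.shard_for("user-1"))  # True
    print(idx.get("user-1", copy=1))         # {'name': 'Alice'}
```

Because the shard is computed modulo the number of primaries, changing that number would send existing documents to different shards, which is why Elasticsearch fixes the primary count when an index is created.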

Kafka

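Kafka replicates each partition across several brokers. One replica is the leader and serves reads and writes; the followers continuously copy the leader's log but, by default, do not serve client requests, so they act as standbys. When the broker hosting a partition's leader fails, a follower from the in‑sync replica set (ISR) is elected as the new leader and continues serving requests.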
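The leader choice follows a simple rule: take the first replica, in the partition's assignment order, that is both alive and in the ISR. Only if `unclean.leader.election.enable` is true may Kafka fall back to an out‑of‑sync replica, at the risk of losing acknowledged messages. The function below is a simplified Python sketch of that rule; its name and parameters are hypothetical.

```python
from __future__ import annotations


def elect_partition_leader(
    assigned_replicas: list[int],
    isr: set[int],
    live_brokers: set[int],
    unclean_allowed: bool = False,
) -> int | None:
    """Return the broker id of the new partition leader, or None if none is eligible."""
    # Clean election: first replica in assignment order that is alive and in sync.
    for broker in assigned_replicas:
        if broker in live_brokers and broker in isr:
            return broker
    if unclean_allowed:
        # Unclean election: any live replica, even if it is missing committed messages.
        for broker in assigned_replicas:
            if broker in live_brokers:
                return broker
    return None


if __name__ == "__main__":
    # Replicas on brokers 1, 2, 3; broker 1 (the old leader) has failed.
    print(elect_partition_leader([1, 2, 3], isr={1, 2, 3}, live_brokers={2, 3}))  # 2
    # Only broker 1 was in sync, so a clean election is impossible.
    print(elect_partition_leader([1, 2, 3], isr={1}, live_brokers={2, 3}))        # None
```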

Storage Layer

MySQL high availability mirrors the patterns above: master‑slave replication with keepalived for VIP failover, and sharding with multiple masters each protected by their own replicas.
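For the sharded setup, a client or proxy has to know which master owns a key and where to send writes after a failover. The sketch below assumes simple modulo sharding on a numeric `user_id` and promotes the first healthy replica when a master is down. `Shard`, `ShardRouter`, and the host names are hypothetical, and a real failover must also verify that the promoted replica has applied all of the old master's binlog before it accepts writes.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Shard:
    master: str
    replicas: list[str] = field(default_factory=list)


class ShardRouter:
    """Routes a key to its shard and fails over to a replica when the master is down."""

    def __init__(self, shards: list[Shard]) -> None:
        self.shards = shards

    def shard_index(self, user_id: int) -> int:
        return user_id % len(self.shards)  # assumption: simple modulo sharding

    def write_target(self, user_id: int, down: set[str]) -> str:
        """Return the node that should accept writes for `user_id`, failing over if needed."""
        shard = self.shards[self.shard_index(user_id)]
        if shard.master not in down:
            return shard.master
        healthy = [r for r in shard.replicas if r not in down]
        if not healthy:
            raise RuntimeError("no healthy node left in this shard")
        # Promote the first healthy replica; the failed master is dropped from the shard.
        new_master = healthy[0]
        shard.replicas.remove(new_master)
        shard.master = new_master
        return new_master


if __name__ == "__main__":
    router = ShardRouter([
        Shard("db0-master", ["db0-replica1"]),
        Shard("db1-master", ["db1-replica1", "db1-replica2"]),
    ])
    print(router.write_target(4, down=set()))           # db0-master
    print(router.write_target(5, down={"db1-master"}))  # db1-replica1
```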

Conclusion

Redundancy and automatic failover are the core principles of high availability. Most components adopt a single‑master plus multiple slaves design to balance consistency and availability. Beyond architectural HA, operational measures such as traffic isolation, rate limiting, circuit breaking, disaster recovery, and robust monitoring are essential to maintain overall system reliability.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.

Written by

Su San Talks Tech

Su San, former staff at several leading tech companies, is a top creator on Juejin and a premium creator on CSDN, and runs the free coding practice site www.susan.net.cn.
