Designing High‑Availability Internet Architecture: Redundancy and Automatic Failover

This article explains how to achieve high availability in internet systems by layering architecture, using redundancy and automatic failover across access, proxy, microservice, middleware, and storage components, and discusses practical techniques, common pitfalls, and operational safeguards for resilient services.

dbaplus Community

Oct 8, 2022

1. Introduction

High availability (HA) aims to keep business services continuously operational from the user’s perspective. Achieving HA requires designing each architectural layer—application, middleware, and data storage—with redundancy and automatic failover so that no single component can become a point of failure.

2. Architecture Layers

Typical internet systems adopt a micro‑service architecture that can be visualized in several logical layers:

Access layer: traffic enters via hardware load balancers (e.g., F5) or software load balancers (e.g., LVS).

Reverse‑proxy layer: Nginx performs URL‑based routing, rate limiting, etc.

Gateway: handles flow control, risk control, and protocol conversion.

Site layer: assembles JSON responses by calling business services such as the member and promotion services.

Base services: infrastructure‑level micro‑services invoked by business services.

Storage layer: databases such as MySQL or Oracle.

Middleware: Zookeeper, Redis, Elasticsearch, MQ, etc., provide coordination, caching, search, and messaging.

3. Access & Reverse‑Proxy Layer

Both layers achieve HA through keepalived and LVS clustering. Two LVS instances run in active‑standby mode, and the master holds the virtual IP (VIP). The keepalived daemons on the two nodes exchange VRRP heartbeat advertisements; if the master fails, the backup detects the missing heartbeats and promotes itself, so the VIP “floats” to the new master. keepalived also health‑checks the Nginx nodes behind LVS and removes unhealthy ones from the LVS pool.
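The VIP hand-over described above can be sketched as a small simulation. This is only an illustration of heartbeat-based failover, not keepalived's actual implementation: the `Node` class, the `vip_owner` function, the node names, and the 3-second timeout are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    priority: int           # higher priority wins, as in VRRP
    last_heartbeat: float   # time of the last advertisement seen from this node


def vip_owner(nodes, now, dead_after=3.0):
    """Return the node that should hold the VIP at time `now`.

    A node counts as alive if it advertised within `dead_after` seconds.
    Among alive nodes the highest priority wins; None means total outage.
    """
    alive = [n for n in nodes if now - n.last_heartbeat <= dead_after]
    if not alive:
        return None
    return max(alive, key=lambda n: n.priority)


master = Node("lvs-1", priority=100, last_heartbeat=0.0)
backup = Node("lvs-2", priority=90, last_heartbeat=0.0)

# Both nodes advertise at t=1, so the higher-priority master holds the VIP.
master.last_heartbeat = backup.last_heartbeat = 1.0
owner_before = vip_owner([master, backup], now=2.0)

# The master goes silent; only the backup keeps advertising.
backup.last_heartbeat = 6.0
owner_after = vip_owner([master, backup], now=6.5)
```

Passing `now` explicitly instead of reading the system clock keeps the sketch deterministic; a real deployment relies on keepalived's own timers.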

4. Micro‑service Layer (Dubbo)

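Dubbo relies on a registry (Zookeeper or Nacos) for service discovery. Providers register themselves; consumers pull the provider list and apply load balancing across it. When a provider becomes unavailable, the registry notices the missed heartbeat, removes the provider from the list, and pushes the updated list to consumers, which stop routing requests to it. The effect is automatic failover similar to what keepalived provides at the access layer.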
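A minimal sketch of this register, heartbeat, and discover cycle follows. It does not use the Dubbo, Zookeeper, or Nacos APIs; the `Registry` and `Consumer` classes, the TTL value, and the addresses are hypothetical stand-ins chosen for illustration.

```python
import itertools


class Registry:
    """Toy service registry: providers stay listed only while heartbeats arrive."""

    def __init__(self, ttl=10.0):
        self.ttl = ttl
        self._last_seen = {}   # (service, address) -> time of last heartbeat

    def heartbeat(self, service, address, now):
        self._last_seen[(service, address)] = now

    def providers(self, service, now):
        # Entries whose last heartbeat is older than the TTL are treated as gone.
        return sorted(
            addr for (svc, addr), seen in self._last_seen.items()
            if svc == service and now - seen <= self.ttl
        )


class Consumer:
    """Round-robin over whatever providers the registry currently reports."""

    def __init__(self, registry, service):
        self.registry = registry
        self.service = service
        self._counter = itertools.count()

    def pick(self, now):
        live = self.registry.providers(self.service, now)
        if not live:
            raise RuntimeError(f"no provider available for {self.service}")
        return live[next(self._counter) % len(live)]


registry = Registry(ttl=10.0)
registry.heartbeat("member-service", "10.0.0.1:20880", now=0.0)
registry.heartbeat("member-service", "10.0.0.2:20880", now=0.0)
consumer = Consumer(registry, "member-service")

first_two = [consumer.pick(now=1.0), consumer.pick(now=1.0)]

# Only 10.0.0.2 keeps sending heartbeats; 10.0.0.1 expires after the TTL.
registry.heartbeat("member-service", "10.0.0.2:20880", now=12.0)
after_failure = [consumer.pick(now=15.0) for _ in range(3)]
```

Here the consumer polls the registry on every call for simplicity; real clients typically cache the provider list and receive change notifications instead.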

5. Middleware

5.1 Zookeeper

Zookeeper provides HA through a leader‑follower model. The leader handles all write requests and replicates them to followers as log entries; a write is committed once a quorum (a majority of the ensemble, counting the leader itself) has acknowledged it. If the leader fails, followers detect the missing heartbeat and trigger a leader election under the ZAB protocol.
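The majority rule is simple arithmetic, sketched below. The function names are hypothetical and the code models only the counting, not ZAB itself.

```python
def quorum_size(ensemble_size):
    """Smallest number of servers that forms a strict majority."""
    return ensemble_size // 2 + 1


def can_commit(ensemble_size, follower_acks):
    """True if the leader plus the acknowledging followers reach a quorum.

    The leader's own copy counts as one vote, so only
    quorum_size - 1 follower acknowledgements are needed.
    """
    return 1 + follower_acks >= quorum_size(ensemble_size)


def tolerated_failures(ensemble_size):
    """How many servers may fail while a quorum can still be formed."""
    return ensemble_size - quorum_size(ensemble_size)
```

A 3-node and a 4-node ensemble both tolerate only one failure, which is why ensembles are usually sized with an odd number of servers.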

5.2 Redis

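Redis HA can be deployed in master‑slave mode or as a sharded cluster. In master‑slave mode, Sentinel processes monitor the master with periodic health checks. Once a configured quorum of Sentinels agrees that the master is down, the Sentinels elect one of themselves as leader through a Raft‑like vote, and that leader promotes a replica to master. In a sharded cluster, the key space is divided into 16384 hash slots and each slot is assigned to a master. Each master has replicas, the nodes exchange state over a gossip protocol, and when a master dies the remaining masters vote, again in a Raft‑like election, to promote one of its replicas.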
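To make the slot assignment concrete, the sketch below computes the slot for a key the way the Redis Cluster specification describes it: CRC16 of the key modulo 16384, hashing only the content of the first non-empty `{...}` hash tag if one is present. The function name `key_slot` is hypothetical; the CRC comes from Python's standard `binascii.crc_hqx`, which with an initial value of 0 is the CRC16-CCITT (XMODEM) variant.

```python
import binascii

SLOT_COUNT = 16384


def key_slot(key: str) -> int:
    """Map a key to a hash slot: CRC16(key) mod 16384."""
    data = key.encode("utf-8")
    # Hash tags: if the key has a non-empty {...} section, hash only that part,
    # so related keys can be forced onto the same slot.
    start = data.find(b"{")
    if start != -1:
        end = data.find(b"}", start + 1)
        if end != -1 and end != start + 1:
            data = data[start + 1:end]
    return binascii.crc_hqx(data, 0) % SLOT_COUNT
```

For example, `key_slot("{user1000}.following")` and `key_slot("{user1000}.followers")` return the same slot, so both keys live on the same master and can be used together in multi-key operations.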

5.3 Elasticsearch

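Elasticsearch achieves HA by splitting each index into primary shards and replica shards spread across multiple nodes, with a replica never placed on the same node as its primary. An elected master node (chosen through a Bully‑style election in older versions) manages cluster state, while any node can accept a read or write request and forward it to the shards that hold the data. Writes go to the primary shard, which then replicates them to its replicas; if the node holding a primary is lost, one of its replicas is promoted to primary.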
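The value of keeping a shard's copies on different nodes can be shown with a toy allocator. This is a simple round-robin placement written for illustration, not Elasticsearch's real allocation algorithm; `allocate`, `surviving_shards`, and the node names are hypothetical.

```python
def allocate(num_primaries, num_replicas, nodes):
    """Assign each shard copy to a node so no two copies of a shard share a node."""
    if num_replicas + 1 > len(nodes):
        raise ValueError("need at least one node per shard copy")
    layout = {}
    for shard in range(num_primaries):
        # Copy 0 is the primary; copies 1..num_replicas are its replicas.
        layout[shard] = [nodes[(shard + copy) % len(nodes)]
                         for copy in range(num_replicas + 1)]
    return layout


def surviving_shards(layout, failed_node):
    """Shards that still have at least one copy after `failed_node` is lost."""
    return {shard for shard, copies in layout.items()
            if any(node != failed_node for node in copies)}


nodes = ["es-1", "es-2", "es-3"]
layout = allocate(num_primaries=3, num_replicas=1, nodes=nodes)
```

With one replica per shard, losing any single node leaves every shard with a surviving copy, which is the property the replica mechanism is meant to guarantee.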

5.4 Kafka (MQ)

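Kafka uses topic partitions replicated across brokers. Each partition has one leader and one or more followers. By default, producers and consumers talk to the leader, while followers continuously fetch from it to stay in sync, so they are warm standbys rather than cold ones. When a leader fails, a follower from the in‑sync replica (ISR) set is elected as the new leader, ensuring continuous message delivery.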
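The leader hand-over can be sketched as follows. The code assumes the commonly described rule of choosing the first replica in assignment order that is still in sync; `elect_leader` is a hypothetical helper, not a Kafka API.

```python
def elect_leader(replicas, isr, failed_broker):
    """Pick the new partition leader after `failed_broker` goes down.

    replicas: broker ids in assignment order.
    isr: set of broker ids that were in sync with the old leader.
    Returns (new_leader, new_isr); new_leader is None if no in-sync
    replica survives.
    """
    new_isr = set(isr) - {failed_broker}
    for broker in replicas:
        if broker in new_isr:
            return broker, new_isr
    return None, new_isr
```

With replicas `[1, 2, 3]` and ISR `{1, 3}`, losing broker 1 makes broker 3 the leader: broker 2 is skipped because it had fallen behind. If the ISR contained only the failed leader, the sketch returns `None`, meaning the partition stays unavailable until an in-sync replica comes back.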

6. Storage Layer (MySQL)

MySQL HA mirrors the LVS pattern: master‑slave replication with keepalived provides VIP failover. For large data volumes, sharding (multiple masters) is used, each with its own slaves and keepalived‑driven failover. This ensures both read/write scalability and resilience.
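From the application's point of view, this layout reduces to a routing decision, sketched below. The sketch assumes modulo sharding on a numeric user ID, writes sent to each shard's VIP, and reads rotated across that shard's slaves; the `Shard` class, `route` function, and all addresses are hypothetical.

```python
import itertools


class Shard:
    """One master-slave group: a VIP for the master plus read-only slaves."""

    def __init__(self, vip, slaves):
        self.vip = vip            # always points at the current master
        self.slaves = slaves      # slave addresses used for reads
        self._rr = itertools.count()

    def endpoint(self, is_write):
        # Writes go through the VIP; reads rotate across the slaves.
        if is_write or not self.slaves:
            return self.vip
        return self.slaves[next(self._rr) % len(self.slaves)]


def route(shards, user_id, is_write):
    """Pick the shard by user_id modulo shard count, then the endpoint."""
    return shards[user_id % len(shards)].endpoint(is_write)


shards = [
    Shard("10.0.1.100", ["10.0.1.11", "10.0.1.12"]),
    Shard("10.0.2.100", ["10.0.2.11"]),
]
```

Because the application only ever addresses the VIP for writes, a keepalived failover inside a shard needs no change to this routing table.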

7. Summary & Operational Practices

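The core HA principles are redundancy and automatic failover. Most components adopt a single‑master, multiple‑slave model because routing all writes through one node simplifies consistency. The trade‑off is that the master is briefly a single point of failure for writes, which is mitigated by detecting the failure quickly and electing or promoting a new master automatically.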

Beyond architectural HA, production systems must handle traffic spikes, security attacks, code bugs, deployment errors, third‑party failures, and natural disasters. Recommended safeguards include traffic isolation, rate limiting, circuit breaking, risk control, graceful shutdown, blue‑green or canary releases, comprehensive monitoring with alert thresholds, unit testing, full‑chain load testing, and meticulous release tracking for quick rollback.
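Of the safeguards listed above, circuit breaking is easy to illustrate in a few lines. The sketch below is a minimal breaker under stated assumptions: it opens after a fixed number of consecutive failures and allows one trial call after a cool-down. The `CircuitBreaker` class and its parameters are hypothetical and do not correspond to any specific library.

```python
class CircuitBreaker:
    """Open after `max_failures` consecutive failures; retry after `reset_after` seconds."""

    def __init__(self, max_failures=3, reset_after=30.0):
        self.max_failures = max_failures
        self.reset_after = reset_after
        self.failures = 0
        self.opened_at = None   # None means the circuit is closed

    def call(self, func, now):
        if self.opened_at is not None:
            if now - self.opened_at < self.reset_after:
                raise RuntimeError("circuit open: call rejected")
            # Cool-down elapsed: half-open, let this one trial call through.
        try:
            result = func()
        except Exception:
            self.failures += 1
            if self.opened_at is not None or self.failures >= self.max_failures:
                self.opened_at = now   # open (or re-open) the circuit
            raise
        # Success closes the circuit and resets the failure count.
        self.failures = 0
        self.opened_at = None
        return result
```

While the circuit is open, callers fail fast instead of piling up on a dependency that is already struggling, which gives a failing third-party or downstream service room to recover.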
