Designing High‑Availability Internet Architecture: Redundancy and Automatic Failover
This article explains how internet systems achieve high availability layer by layer, applying redundancy and automatic failover to the access, proxy, micro‑service, middleware, and storage components, and discusses practical techniques, common pitfalls, and operational safeguards for resilient services.
1. Introduction
High availability (HA) aims to keep business services continuously operational from the user’s perspective. Achieving HA requires designing each architectural layer—application, middleware, and data storage—with redundancy and automatic failover so that no single component can become a point of failure.
2. Architecture Layers
Typical internet systems adopt a micro‑service architecture that can be visualized in several logical layers:
Access layer: traffic enters through hardware load balancers (e.g., F5) or software load balancers (e.g., LVS).
Reverse‑proxy layer: Nginx performs URL‑based routing, rate limiting, and similar tasks.
Gateway: handles flow control, risk control, and protocol conversion.
Site layer: assembles JSON responses by calling services such as the member and promotion services.
Base services: infrastructure‑level micro‑services invoked by business services.
Storage layer: databases such as MySQL or Oracle.
Middleware: Zookeeper, Redis, Elasticsearch, MQ, etc., provide coordination, caching, search, and messaging.
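Because a request crosses these layers in series, every layer needs redundancy. A quick availability estimate shows why. The sketch below assumes independent node failures and instantaneous failover; the 99% per-node figure and the six-layer path are hypothetical.

```python
from math import prod


def redundant(node_availability: float, replicas: int) -> float:
    """Availability of a layer where any one of `replicas` nodes suffices.

    Assumes independent failures and instantaneous failover.
    """
    return 1 - (1 - node_availability) ** replicas


def serial(*layer_availabilities: float) -> float:
    """A request succeeds only if every layer on its path is up."""
    return prod(layer_availabilities)


# Hypothetical path: six layers, each node 99% available.
single_node = serial(*[0.99] * 6)
two_replicas = serial(*[redundant(0.99, 2)] * 6)
print(f"one node per layer:  {single_node:.2%}")
print(f"two nodes per layer: {two_replicas:.2%}")
```

This prints about 94.15% versus 99.94%. Real failures are often correlated (shared racks, bad deployments), so actual gains are smaller than this idealized calculation.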
3. Access & Reverse‑Proxy Layer
Both layers achieve HA by pairing LVS with keepalived. Two LVS instances run in active‑standby mode, and the master holds the virtual IP (VIP). keepalived implements VRRP: the master periodically sends advertisements, and if the backup stops receiving them for a timeout period, it promotes itself and takes over the VIP. The VIP thus “floats” to the new master while clients keep connecting to the same address. keepalived also health‑checks the Nginx nodes behind LVS and removes unhealthy ones from the LVS pool until they recover.
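keepalived's failover timing follows VRRP. The sketch below models only the timer arithmetic from RFC 3768 (VRRPv2, the version keepalived uses unless configured otherwise); the node names are hypothetical, and real keepalived is configured in keepalived.conf rather than in code. The skew term makes a higher-priority backup time out slightly earlier, so it wins when several backups exist.

```python
from dataclasses import dataclass


@dataclass
class VrrpNode:
    name: str
    priority: int                 # 1-254 for backup routers; higher wins
    advert_interval: float = 1.0  # seconds between the master's advertisements

    def master_down_interval(self) -> float:
        # RFC 3768 (VRRPv2): Skew_Time = (256 - Priority) / 256 seconds, and a
        # backup declares the master dead after 3 * Advertisement_Interval + Skew_Time.
        skew_time = (256 - self.priority) / 256
        return 3 * self.advert_interval + skew_time


def takeover(backups: list[VrrpNode], last_advert: float) -> tuple[str, float]:
    """Return which backup claims the VIP, and when, after the master's last advert.

    The backup whose timer expires first becomes master and starts advertising,
    which resets the timers of the remaining backups.
    """
    winner = min(backups, key=lambda n: n.master_down_interval())
    return winner.name, last_advert + winner.master_down_interval()


backups = [VrrpNode("lvs-b", priority=100), VrrpNode("lvs-c", priority=90)]
print(takeover(backups, last_advert=12.0))  # ('lvs-b', 15.609375)
```

With the default one-second interval, a dead master is therefore replaced in roughly three and a half seconds; shorter advertisement intervals trade faster failover for more sensitivity to transient packet loss.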
4. Micro‑service Layer (Dubbo)
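Dubbo relies on a registry (Zookeeper or Nacos) for service discovery. Providers register themselves (with Zookeeper, as ephemeral nodes tied to the provider's session); consumers subscribe to the provider list, cache it locally, and load‑balance calls across it. When a provider stops sending heartbeats, its session expires, the registry removes its entry, and subscribed consumers are notified, which gives automatic failover much like keepalived does for LVS. On the consumer side, Dubbo's default failover cluster strategy also retries a failed call on another provider.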
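The registry behaviour can be illustrated with a toy model. Everything below (Registry, invoke, the addresses, the timeout) is hypothetical and is not Dubbo's, Zookeeper's, or Nacos's API: providers heartbeat, entries expire after a session timeout the way Zookeeper ephemeral nodes do, and the consumer retries another live provider on failure, similar in spirit to Dubbo's failover cluster.

```python
import random
from typing import Callable, TypeVar

T = TypeVar("T")


class Registry:
    """Toy registry: an entry lives only while its provider keeps heartbeating."""

    def __init__(self, session_timeout: float):
        self.session_timeout = session_timeout
        self._last_heartbeat: dict[str, float] = {}

    def heartbeat(self, provider: str, now: float) -> None:
        self._last_heartbeat[provider] = now

    def providers(self, now: float) -> list[str]:
        # Providers whose session has expired are treated as deregistered.
        return sorted(p for p, seen in self._last_heartbeat.items()
                      if now - seen < self.session_timeout)


def invoke(registry: Registry, now: float, call: Callable[[str], T], retries: int = 2) -> T:
    """Random load balancing with failover: try up to retries + 1 distinct providers."""
    candidates = registry.providers(now)
    random.shuffle(candidates)
    last_error = None
    for provider in candidates[: retries + 1]:
        try:
            return call(provider)
        except ConnectionError as exc:
            last_error = exc
    raise RuntimeError("no provider could serve the call") from last_error


reg = Registry(session_timeout=6.0)
reg.heartbeat("10.0.0.1:20880", now=0.0)
reg.heartbeat("10.0.0.2:20880", now=0.0)
reg.heartbeat("10.0.0.1:20880", now=10.0)  # 10.0.0.2 has stopped heartbeating
print(reg.providers(now=12.0))             # ['10.0.0.1:20880']
```

Retries only make sense for idempotent calls; a non-idempotent write retried on a second provider may be applied twice.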
5. Middleware
5.1 Zookeeper
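Zookeeper provides HA through a leader‑follower model. The leader handles all write requests (followers forward any writes they receive) and replicates each proposal to the followers; once a quorum, meaning a majority of the whole ensemble including the leader, has acknowledged the proposal, the leader commits it. Reads can be served by any server. An ensemble of 2f+1 servers therefore tolerates f failures (e.g., 5 servers tolerate 2). If the leader fails, followers detect the missing heartbeat and, as part of the ZAB protocol, elect a new leader and synchronize with it before writes resume.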
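The quorum rule and the vote ordering can be sketched as follows. This is a simplification that assumes every surviving server eventually sees every other server's vote; it mirrors the comparison ZooKeeper's fast leader election applies (epoch, then zxid, then server id) but none of its messaging, and the names are illustrative.

```python
from typing import NamedTuple, Optional


class Vote(NamedTuple):
    epoch: int  # epoch of the proposed leader
    zxid: int   # last transaction id the proposed leader has logged
    sid: int    # server id (myid) of the proposed leader


def quorum(ensemble_size: int) -> int:
    return ensemble_size // 2 + 1


def is_committed(acks: set[int], ensemble_size: int) -> bool:
    """A proposal commits once a majority of the ensemble, leader included, has logged it."""
    return len(acks) >= quorum(ensemble_size)


def elect(votes: list[Vote], ensemble_size: int) -> Optional[Vote]:
    """Outcome of an election among the surviving servers.

    NamedTuples compare field by field, so max() prefers the highest epoch,
    then the highest zxid, then the highest server id: the most up-to-date
    server wins. Without a quorum of survivors no leader is elected.
    """
    if len(votes) < quorum(ensemble_size):
        return None
    return max(votes)


print(elect([Vote(3, 105, 1), Vote(3, 107, 2), Vote(3, 107, 4)], ensemble_size=5))
# Vote(epoch=3, zxid=107, sid=4)
```

Note that quorum(4) is 3, so a four-server ensemble tolerates only one failure, the same as three servers; this is why ensembles are usually sized at odd numbers.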
5.2 Redis
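Redis HA can be deployed in master‑slave mode with Sentinel or as a sharded Redis Cluster. In master‑slave mode, Sentinel processes ping the master; when enough Sentinels (the configured quorum) agree that it is unreachable, they elect a leader Sentinel through a Raft‑like vote, and that leader promotes the most suitable slave to master and reconfigures the others. In Redis Cluster, the key space is split into 16384 hash slots, each owned by one master that has its own replicas; nodes exchange state over a gossip protocol, and when a master fails, its replicas request votes from the other masters in a Raft‑like election that decides which replica is promoted.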
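The Sentinel decision steps can be sketched as three checks. The replica ordering (lower replica-priority first, priority 0 never promoted, then the larger replication offset, then the lexicographically smaller run ID) follows the Redis Sentinel documentation; the function and field names are hypothetical, and the sketch omits details such as skipping replicas that have been disconnected from the master for too long.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Replica:
    name: str
    priority: int  # replica-priority: lower is preferred, 0 means never promote
    offset: int    # replication offset processed from the failed master
    run_id: str


def objectively_down(sdown_reports: int, quorum: int) -> bool:
    # Enough Sentinels independently consider the master unreachable.
    return sdown_reports >= quorum


def authorized(votes: int, total_sentinels: int, quorum: int) -> bool:
    # The Sentinel running the failover needs votes from a majority of Sentinels,
    # or from `quorum` Sentinels if the quorum is set higher than the majority.
    return votes >= max(total_sentinels // 2 + 1, quorum)


def pick_replica(replicas: list[Replica]) -> Optional[Replica]:
    eligible = [r for r in replicas if r.priority > 0]
    if not eligible:
        return None
    return min(eligible, key=lambda r: (r.priority, -r.offset, r.run_id))


replicas = [
    Replica("redis-b", priority=100, offset=5_000, run_id="b1"),
    Replica("redis-c", priority=100, offset=9_000, run_id="c1"),
    Replica("redis-d", priority=0, offset=9_500, run_id="d1"),
]
if objectively_down(sdown_reports=2, quorum=2) and authorized(votes=2, total_sentinels=3, quorum=2):
    print(pick_replica(replicas).name)  # redis-c
```

Preferring the largest offset minimizes the writes lost to asynchronous replication, but it cannot eliminate them.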
5.3 Elasticsearch
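Elasticsearch achieves HA by splitting each index into primary shards and replica shards and never placing a replica on the same node as its primary. An elected master node manages the cluster state (versions before 7.0 used a Bully‑style election in Zen discovery; 7.0 and later use a quorum‑based voting configuration). Any node can coordinate a request and route it to the right shards: writes go to the primary shard, which forwards them to its replicas, while reads can be served by either copy. If a node holding a primary fails, the master promotes a replica of that shard to primary.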
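Replica placement and promotion can be sketched with a toy allocation table. This is not how Elasticsearch's allocator works internally; it only shows the rules stated above (copies of a shard on different nodes, a surviving replica taking over a lost primary) plus the usual green/yellow/red health meaning. Node names are hypothetical, and unlike Elasticsearch the sketch does not rebuild lost replicas on the remaining nodes.

```python
def allocate(primaries: int, replicas: int, nodes: list[str]) -> dict[int, list[str]]:
    """Map shard id -> [primary node, replica nodes...], all copies on distinct nodes."""
    if replicas + 1 > len(nodes):
        raise ValueError("need at least replicas + 1 nodes to keep copies apart")
    return {s: [nodes[(s + i) % len(nodes)] for i in range(replicas + 1)]
            for s in range(primaries)}


def fail_node(table: dict[int, list[str]], dead: str) -> dict[int, list[str]]:
    # Removing the dead node shifts the next surviving copy into position 0,
    # i.e. a replica is promoted to primary.
    return {s: [n for n in copies if n != dead] for s, copies in table.items()}


def health(table: dict[int, list[str]], replicas: int) -> str:
    if any(not copies for copies in table.values()):
        return "red"     # some shard has no copy left, so its primary is missing
    if any(len(copies) < replicas + 1 for copies in table.values()):
        return "yellow"  # every primary exists, some replicas are missing
    return "green"


table = allocate(primaries=3, replicas=1, nodes=["es-1", "es-2", "es-3"])
after = fail_node(table, "es-1")
print(after[0], health(after, replicas=1))  # ['es-2'] yellow
```

Losing a second node before replicas are rebuilt turns the cluster red, which is why replica count and node count are usually chosen together.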
5.4 Kafka (MQ)
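Kafka replicates each topic partition across brokers. Each partition has one leader and one or more followers; producers and consumers talk to the leader, while followers continuously fetch its log and, by default, act as standbys rather than serving client traffic. Followers that are fully caught up form the in‑sync replica set (ISR). When a leader fails, the controller elects a new leader from the ISR, so message delivery continues without losing committed messages.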
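The ISR-based election can be sketched as follows. The rule used here (pick the first replica in the partition's assignment order that is alive and in the ISR) and the high-watermark definition (the smallest log end offset across the ISR) reflect Kafka's documented behaviour, but the class is a hypothetical model, not Kafka code. Unclean election, which may pick an out-of-sync replica and lose committed messages, is governed by the broker setting `unclean.leader.election.enable`, disabled by default since Kafka 0.11.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Partition:
    assignment: list[int]  # replica broker ids, preferred leader first
    isr: set[int]          # in-sync replicas
    leo: dict[int, int] = field(default_factory=dict)  # log end offset per broker
    leader: Optional[int] = None

    def high_watermark(self) -> int:
        # Messages below this offset are on every in-sync replica, so consumers may read them.
        return min(self.leo[b] for b in self.isr)

    def elect_leader(self, live: set[int], unclean: bool = False) -> Optional[int]:
        self.isr &= live  # dead brokers drop out of the ISR
        pool = self.isr if self.isr else (live if unclean else set())
        self.leader = next((b for b in self.assignment if b in pool), None)
        if self.leader is not None and not self.isr:
            self.isr = {self.leader}  # unclean: committed history may be lost
        return self.leader


p = Partition(assignment=[1, 2, 3], isr={1, 2, 3}, leo={1: 120, 2: 118, 3: 100}, leader=1)
print(p.high_watermark())           # 100
print(p.elect_leader(live={2, 3}))  # 2
```

If no ISR member survives and unclean election is disabled, the partition stays offline rather than risk serving a truncated log: availability is traded for consistency.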
6. Storage Layer (MySQL)
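MySQL HA mirrors the LVS pattern: master‑slave replication plus keepalived, which moves a VIP from a failed master to a slave. Applications connect to the VIP, so a failover requires no configuration change on their side. For large data volumes, the data is sharded across several master‑slave groups, each with its own slaves and keepalived‑driven failover. Sharding spreads writes across multiple masters while slaves absorb reads, providing both scalability and resilience.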
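Application-side routing for this layout can be sketched as below. The modulo sharding rule, the addresses, and the Shard type are hypothetical; the point is that writes always target a shard's VIP, so a keepalived failover is invisible to the application, while reads are spread over the slaves.

```python
from dataclasses import dataclass


@dataclass
class Shard:
    write_vip: str         # VIP that keepalived keeps on the shard's current master
    read_hosts: list[str]  # slaves serving reads


SHARDS = [
    Shard("10.0.1.100:3306", ["10.0.1.11:3306", "10.0.1.12:3306"]),
    Shard("10.0.2.100:3306", ["10.0.2.11:3306", "10.0.2.12:3306"]),
]


def route(user_id: int, write: bool) -> str:
    """Pick the endpoint for a query about `user_id`."""
    shard = SHARDS[user_id % len(SHARDS)]
    if write or not shard.read_hosts:
        return shard.write_vip  # failover moves the VIP; this address never changes
    # Divide by the shard count first so users of one shard spread across all its slaves.
    return shard.read_hosts[(user_id // len(SHARDS)) % len(shard.read_hosts)]


print(route(42, write=True))  # 10.0.1.100:3306
print(route(7, write=False))  # 10.0.2.12:3306
```

Because MySQL replication is usually asynchronous, a slave can lag its master; reads that must see a just-committed write are typically sent to the VIP instead.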
7. Summary & Operational Practices
The core HA principles are redundancy and automatic failover. Most components adopt a single‑master, multiple‑slave model to simplify data consistency, accepting that the master is briefly a single point of failure; fast failure detection and leader election keep that window short.
Beyond architectural HA, production systems must handle traffic spikes, security attacks, code bugs, deployment errors, third‑party failures, and natural disasters. Recommended safeguards include traffic isolation, rate limiting, circuit breaking, risk control, graceful shutdown, blue‑green or canary releases, comprehensive monitoring with alert thresholds, unit testing, full‑chain load testing, and meticulous release tracking for quick rollback.
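Two of these safeguards, rate limiting and circuit breaking, are small enough to sketch. The token bucket and the closed/open/half-open breaker below are common textbook versions chosen for illustration, not the mechanisms of any particular framework; they are single-threaded, and the thresholds are placeholders.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class TokenBucket:
    """Admit `rate` requests per second on average, with bursts up to `capacity`."""

    def __init__(self, rate: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic):
        self.rate, self.capacity, self.clock = rate, capacity, clock
        self.tokens = capacity
        self.updated = clock()

    def allow(self) -> bool:
        now = self.clock()
        # Refill in proportion to elapsed time, never beyond capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


class CircuitBreaker:
    """Open after `threshold` consecutive failures; after `cooldown` seconds let a
    trial call through (half-open), closing on success and reopening on failure."""

    def __init__(self, threshold: int, cooldown: float,
                 clock: Callable[[], float] = time.monotonic):
        self.threshold, self.cooldown, self.clock = threshold, cooldown, clock
        self.failures = 0
        self.opened_at: Optional[float] = None

    def call(self, fn: Callable[[], T], fallback: Callable[[], T]) -> T:
        if self.opened_at is not None and self.clock() - self.opened_at < self.cooldown:
            return fallback()  # open: fail fast without touching the dependency
        try:
            result = fn()
        except Exception:
            self.failures += 1
            if self.opened_at is not None or self.failures >= self.threshold:
                self.opened_at = self.clock()  # trip, or re-trip after a failed trial
            return fallback()
        self.failures = 0
        self.opened_at = None  # success closes the breaker
        return result
```

The injectable clock keeps both classes deterministic under test; in production the default monotonic clock is used.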
dbaplus Community
