Why Multi‑Datacenter Architecture Is Essential for High‑Availability Services
The article explains how multi‑datacenter architectures prevent total service loss, improve latency by placing services near users, and balance the CAP trade‑offs through models like AC, CP, and AP, while outlining practical design, sharding, monitoring, and failover strategies for large‑scale backend systems.
Reasons for Multi‑Datacenter Architecture
A single datacenter is a single point of failure: a crash, power loss, or maintenance window can make every service unavailable and, in the worst case, cause irreversible data loss. Multiple datacenters also keep services close to users: LiZhi FM connects southern users to southern datacenters, northern users to northern datacenters, and overseas users to overseas datacenters. The datacenters must be linked, however; without cross‑datacenter connectivity, network isolation degrades data transmission and real‑time performance.
CAP Theory Overview
CAP states that a distributed system can guarantee at most two of three properties at the same time: Consistency, Availability, and Partition Tolerance. Each model gives up one property:
AC model (more commonly written CA): High availability and strong consistency, sacrificing partition tolerance (e.g., MySQL Cluster with two‑phase commit).
CP model: Strong consistency and partition tolerance, sacrificing availability (e.g., Redis clusters where a node failure makes its data inaccessible).
AP model: High availability and partition tolerance, sacrificing strong consistency (e.g., Cassandra, where data remains accessible despite node failures).
Internet services often prefer AP or eventual consistency because strict consistency is not critical for user‑generated content.
BASE Model
Derived from the CAP discussion, BASE stands for Basically Available, Soft state, and Eventual consistency. It accepts temporary inconsistency, allowing the system to remain operational during failures and to converge to a consistent state later.
System Business Research
LiZhi FM’s architecture consists of a client‑side proxy, application servers, a data center, and storage layers (Redis, MySQL, Memcached). Cross‑datacenter synchronization is required for large media assets and user‑generated content; any lag leads to visible errors for users.
Architecture Design
The service runs two datacenters (IDCs) connected by two links: a high‑speed dedicated line (green in the original diagram) and a cost‑effective public network link (red). Smart DNS directs users to the nearest datacenter. Each region has a master‑slave setup: reads are served locally, while writes go to the master and are asynchronously replicated to the other datacenter. A data‑access API hides the synchronization from application code, and failover logic switches traffic to the standby datacenter when the master becomes unresponsive.
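The read, write, and failover flow described above can be sketched in a few lines of Python. This is a minimal in-memory illustration, not LiZhi FM's implementation: the `Datacenter` and `DataAccessAPI` classes, their method names, and the explicit `replicate()` call (standing in for a background replication process) are all assumptions made for the example.

```python
from collections import deque


class Datacenter:
    """One datacenter holding a copy of the data (hypothetical model)."""

    def __init__(self, name):
        self.name = name
        self.store = {}
        self.healthy = True


class DataAccessAPI:
    """Hides replication and failover from application code (assumed design)."""

    def __init__(self, master, standby):
        self.master = master
        self.standby = standby
        self.replication_queue = deque()  # writes not yet copied to the standby

    def read(self, key, local):
        # Reads are served by whichever datacenter is local to the caller.
        return local.store.get(key)

    def write(self, key, value):
        # Writes always go to the current master.
        if not self.master.healthy:
            self.failover()
        self.master.store[key] = value
        self.replication_queue.append((key, value))

    def replicate(self):
        # Drain queued writes to the standby. A real system would do this
        # continuously in the background over the inter-datacenter link.
        while self.replication_queue:
            key, value = self.replication_queue.popleft()
            self.standby.store[key] = value

    def failover(self):
        # Promote the standby. Writes still queued on the failed master are
        # dropped here; a real system would have to reconcile them later.
        self.replication_queue.clear()
        self.master, self.standby = self.standby, self.master


south = Datacenter("south")
north = Datacenter("north")
api = DataAccessAPI(master=south, standby=north)

api.write("track:1", "v1")
stale = api.read("track:1", local=north)   # not replicated yet
api.replicate()
fresh = api.read("track:1", local=north)   # now visible in the north

south.healthy = False
api.write("track:2", "v2")                 # triggers failover to the north
```

The gap between `stale` and `fresh` is the replication lag that asynchronous replication accepts in exchange for fast local writes, and the cleared queue in `failover()` shows why unreplicated writes are at risk when the master fails.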
Best Practices
Data sharding: start with vertical sharding (by business domain) and move to horizontal sharding (hash‑based ID partitioning) as volume grows.
Asynchronous interfaces improve responsiveness but increase programming complexity; LiZhi FM provides simple async APIs.
Implement test‑driven development and continuous monitoring of logs, CPU, memory, disk, network, and I/O to detect bottlenecks early.
Use idempotent operations and handle three possible states of distributed calls: success, failure, timeout.
Design for rapid scaling: add nodes with minimal configuration changes and provide one‑click recovery procedures.
Monitoring includes real‑time alerts via email, IM, or SMS, and regular reports (daily, weekly, monthly) to guide capacity planning.
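The horizontal sharding and idempotency points above can be illustrated together with a small sketch. It assumes numeric user IDs, uses plain modulo as the simplest possible hash, and tracks request IDs to make writes idempotent; `shard_for`, `Shard`, `Outcome`, and `write_with_retry` are hypothetical names, not APIs from the article.

```python
from enum import Enum


class Outcome(Enum):
    SUCCESS = "success"
    FAILURE = "failure"
    TIMEOUT = "timeout"  # unknown: the write may or may not have landed


def shard_for(user_id, shard_count):
    """Horizontal sharding: the same ID always maps to the same shard."""
    return user_id % shard_count


class Shard:
    """One storage shard that applies each request ID at most once."""

    def __init__(self):
        self.rows = {}
        self.seen = set()

    def apply(self, request_id, key, value):
        if request_id not in self.seen:  # a replayed request changes nothing
            self.seen.add(request_id)
            self.rows.setdefault(key, []).append(value)
        return Outcome.SUCCESS


def write_with_retry(send, max_attempts=3):
    """Retry only on TIMEOUT; this is safe because the write is idempotent."""
    outcome = Outcome.TIMEOUT
    for _ in range(max_attempts):
        outcome = send()
        if outcome is not Outcome.TIMEOUT:
            break
    return outcome


shards = [Shard() for _ in range(4)]
shard = shards[shard_for(1001, len(shards))]

attempts = []


def flaky_send():
    # Simulated network: the first write lands, but its reply is lost.
    result = shard.apply("req-42", "comments:1001", "hello")
    attempts.append(result)
    return Outcome.TIMEOUT if len(attempts) == 1 else result


final = write_with_retry(flaky_send)
```

Although the write is sent twice, the shard stores the comment once, which is the property that makes retrying after a timeout safe. A `FAILURE` result is returned to the caller without a retry, since the sketch treats it as a definite rejection.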
Source: geek.csdn (original article by Liu Yaohua, LiZhi FM architect)