Why Multi‑Datacenter Architecture Is Essential for High‑Availability Services

The article explains how multi‑datacenter architectures prevent total service loss, improve latency by placing services near users, and balance the CAP trade‑offs through models like AC, CP, and AP, while outlining practical design, sharding, monitoring, and failover strategies for large‑scale backend systems.

21CTO

Reasons for Multi‑Datacenter Architecture

With a single datacenter, a crash, power loss, or maintenance window can make every service unavailable and, in the worst case, cause irreversible data loss. Multiple datacenters also keep services close to users: LiZhi FM connects southern users to southern datacenters, northern users to northern datacenters, and overseas users to overseas datacenters. The datacenters must still be connected to each other; if they are isolated at the network level, data transfer between them and real‑time behavior both suffer.

CAP Theory Overview

The CAP theorem states that a distributed system can guarantee at most two of three properties at the same time: Consistency, Availability, and Partition Tolerance. Each of the following models gives up one property:

AC model (more commonly written CA): High availability and strong consistency, sacrificing partition tolerance (e.g., MySQL Cluster with two‑phase commit).

CP model: Strong consistency and partition tolerance, sacrificing availability (e.g., Redis clusters, where a node failure makes that node's data inaccessible).

AP model: High availability and partition tolerance, sacrificing strong consistency (e.g., Cassandra, where data remains accessible despite node failures).

Internet services often prefer AP or eventual consistency because strict consistency is not critical for user‑generated content.

BASE Model

Derived from the CAP discussion, BASE stands for Basically Available, Soft state, and Eventual consistency. It accepts temporary inconsistency, allowing the system to remain operational during failures and to converge to a consistent state later.
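
As an illustration of the "soft state, then convergence" idea, here is a minimal sketch of two replicas that accept writes independently and later reconcile. It assumes a last‑write‑wins rule keyed on a timestamp, which is only one of several possible conflict‑resolution strategies and is not stated in the original article; the `Replica` class and its methods are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """A toy key-value replica that resolves conflicts by last-write-wins."""
    name: str
    data: dict = field(default_factory=dict)  # key -> (timestamp, value)

    def write(self, key, value, timestamp):
        current = self.data.get(key)
        # Keep the newer write; ties fall back to comparing the values so that
        # every replica makes the same choice.
        if current is None or (timestamp, value) > current:
            self.data[key] = (timestamp, value)

    def read(self, key):
        entry = self.data.get(key)
        return None if entry is None else entry[1]

    def sync_from(self, other):
        """Anti-entropy step: pull every entry the other replica holds."""
        for key, (timestamp, value) in other.data.items():
            self.write(key, value, timestamp)


south = Replica("south")
north = Replica("north")

# Both replicas accept writes while they cannot reach each other (soft state).
south.write("title", "Morning show", timestamp=1)
north.write("title", "Evening show", timestamp=2)
diverged = south.read("title") != north.read("title")

# Once the link is back, a two-way sync makes them converge.
south.sync_from(north)
north.sync_from(south)
converged = south.read("title") == north.read("title") == "Evening show"
```

Both replicas stay writable during the outage, which is the "basically available" part; the price is that a reader may briefly see the older value.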

Business and System Analysis

LiZhi FM’s architecture consists of a client‑side proxy, application servers, a data center, and storage layers (Redis, MySQL, Memcached). Cross‑datacenter synchronization is required for large media assets and user‑generated content; any lag leads to visible errors for users.

Architecture Design

The service operates two IDC datacenters connected by two links: a high‑speed dedicated line and a cheaper public network (drawn in green and red, respectively, in the original diagram). Smart DNS directs each user to the nearest datacenter. Each region has a master‑slave setup: reads are served locally, while writes go to the master and are asynchronously replicated to the other datacenter. A data‑access API hides the synchronization from application code, and failover logic switches traffic to the standby datacenter when the master becomes unresponsive.
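
The read and write paths described above can be sketched roughly as follows. This is an in‑memory illustration, not LiZhi FM's implementation: the `Datacenter` and `DataAccess` classes, the replication queue, and the "retry once on the standby" failover rule are all assumptions made for the example.

```python
import queue


class Datacenter:
    """In-memory stand-in for one datacenter's storage."""

    def __init__(self, name):
        self.name = name
        self.store = {}
        self.healthy = True

    def put(self, key, value):
        if not self.healthy:
            raise ConnectionError(f"{self.name} is unreachable")
        self.store[key] = value

    def get(self, key):
        if not self.healthy:
            raise ConnectionError(f"{self.name} is unreachable")
        return self.store.get(key)


class DataAccess:
    """Hypothetical facade: local reads, master writes, asynchronous replication."""

    def __init__(self, local, master, standby):
        self.local = local
        self.master = master
        self.standby = standby
        self.pending = queue.Queue()  # writes waiting to be replicated

    def read(self, key):
        return self.local.get(key)

    def write(self, key, value):
        try:
            self.master.put(key, value)
        except ConnectionError:
            # Failover: promote the standby and retry the write once.
            self.master, self.standby = self.standby, self.master
            self.master.put(key, value)
        self.pending.put((key, value))

    def replicate(self):
        """Drain queued writes to the other datacenter; return how many were applied."""
        applied = 0
        while not self.pending.empty() and self.standby.healthy:
            key, value = self.pending.get()
            self.standby.put(key, value)
            applied += 1
        return applied


south = Datacenter("south")
north = Datacenter("north")
api = DataAccess(local=north, master=south, standby=north)

api.write("user:1", "alice")
before_sync = api.read("user:1")  # the local copy has not been replicated yet
api.replicate()
after_sync = api.read("user:1")

south.healthy = False
api.write("user:2", "bob")        # master is down, so traffic switches to north
```

The gap between `before_sync` and `after_sync` is the replication lag that asynchronous replication introduces. A real failover also has to deal with writes that were acknowledged by the old master but never replicated, which this sketch ignores.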

Best Practices

Data sharding: start with vertical sharding (by business domain) and move to horizontal sharding (hash‑based ID partitioning) as volume grows.
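
A minimal sketch of the two sharding steps combined: a domain‑to‑database map for the vertical split and a hash of the entity ID for the horizontal split. The database names, the shard count, and the `shard_for` helper are hypothetical.

```python
import hashlib

# Vertical sharding: each business domain gets its own database (names are made up).
DOMAIN_DATABASES = {
    "user": "user_db",
    "audio": "audio_db",
    "comment": "comment_db",
}


def shard_for(domain: str, entity_id: int, shards_per_domain: int = 4) -> str:
    """Return the shard name that holds entity_id within the given domain."""
    database = DOMAIN_DATABASES[domain]
    # Use a stable hash: Python's built-in hash() is randomized per process for strings.
    digest = hashlib.sha256(str(entity_id).encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % shards_per_domain
    return f"{database}_{index}"


user_shard = shard_for("user", 42)
audio_shard = shard_for("audio", 42)
```

Plain modulo placement is easy to reason about, but changing `shards_per_domain` moves most keys to a different shard, so the shard count is worth choosing with growth in mind.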

Asynchronous interfaces improve responsiveness but make the calling code more complex; LiZhi FM therefore exposes them through simple async APIs.
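
To show where the responsiveness gain comes from, here is a small sketch using Python's standard `asyncio` module. It does not reflect LiZhi FM's actual API; `fetch_profile`, `fetch_playlist`, and `load_home_page` are hypothetical functions, and the sleeps stand in for network calls.

```python
import asyncio


async def fetch_profile(user_id: int) -> dict:
    await asyncio.sleep(0.05)  # stands in for a network call
    return {"id": user_id, "name": f"user-{user_id}"}


async def fetch_playlist(user_id: int) -> list:
    await asyncio.sleep(0.05)  # stands in for a second, independent call
    return [f"track-{user_id}-{n}" for n in range(3)]


async def load_home_page(user_id: int) -> dict:
    # Both calls run concurrently, so the total wait is roughly one call, not two.
    profile, playlist = await asyncio.gather(
        fetch_profile(user_id), fetch_playlist(user_id)
    )
    return {"profile": profile, "playlist": playlist}


page = asyncio.run(load_home_page(7))
```

The caller sees a single awaitable function, while the concurrency and the ordering of results are handled inside it.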

Implement test‑driven development and continuous monitoring of logs, CPU, memory, disk, network, and I/O to detect bottlenecks early.

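Make operations idempotent and handle all three possible outcomes of a distributed call: success, failure, and timeout. A timeout means the outcome is unknown, so the caller must be able to retry without applying the operation twice.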
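
One common way to make a retry safe is to attach a request ID that the server uses to deduplicate. The sketch below assumes that approach; the `CreditService` class, its scripted outcomes, and `add_credit_with_retry` are hypothetical and exist only to exercise the three outcomes.

```python
import uuid


class CreditService:
    """Toy remote service that deduplicates requests by an idempotency key."""

    def __init__(self, outcomes):
        self.outcomes = list(outcomes)  # scripted results: "ok", "fail" or "timeout"
        self.processed = {}             # request_id -> stored result
        self.balance = 0

    def add_credit(self, request_id, amount):
        if request_id in self.processed:
            return self.processed[request_id]  # replay: no second side effect
        outcome = self.outcomes.pop(0)
        if outcome == "fail":
            raise ValueError("rejected")
        self.balance += amount
        self.processed[request_id] = self.balance
        if outcome == "timeout":
            # The work was done, but the reply was lost on the way back.
            raise TimeoutError("no response")
        return self.balance


def add_credit_with_retry(service, amount, max_attempts=3):
    request_id = str(uuid.uuid4())  # the same key is reused for every attempt
    for _ in range(max_attempts):
        try:
            return "success", service.add_credit(request_id, amount)
        except TimeoutError:
            continue  # outcome unknown: retrying is safe because the call is idempotent
        except ValueError:
            return "failure", None  # definite failure: do not retry
    return "timeout", None


service = CreditService(["timeout", "ok"])
status, balance = add_credit_with_retry(service, 10)
```

Here the first attempt times out after the credit was applied; the retry returns the stored result, so the balance is increased only once.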

Design for rapid scaling: add nodes with minimal configuration changes and provide one‑click recovery procedures.

Monitoring includes real‑time alerts via email, IM, or SMS, and regular reports (daily, weekly, monthly) to guide capacity planning.
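
A threshold check is the simplest form such alerting can take. The sketch below is illustrative only: the metric names, the limits, and the `check_metrics` helper are assumptions, and delivery by email, IM, or SMS is left out.

```python
# Assumed limits; real values depend on the workload and hardware.
THRESHOLDS = {"cpu_percent": 85.0, "memory_percent": 90.0, "disk_percent": 80.0}


def check_metrics(host, metrics, thresholds=THRESHOLDS):
    """Return one alert message per metric that exceeds its threshold."""
    alerts = []
    for name, limit in thresholds.items():
        value = metrics.get(name)
        if value is not None and value > limit:
            alerts.append(f"{host}: {name}={value} exceeds {limit}")
    return alerts


alerts = check_metrics("app-01", {"cpu_percent": 92.5, "memory_percent": 40.0})
```

Metrics that are missing from a sample are skipped rather than treated as zero, so a host that stops reporting needs a separate liveness check.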

Source: geek.csdn (original article by Liu Yaohua, LiZhi FM architect)