
Practices and Exploration of Disaster Recovery in Cloud‑Native Database TDSQL‑C (formerly CynosDB)

This article examines the architectural differences between traditional MySQL and the cloud‑native TDSQL‑C database, outlines MySQL disaster‑recovery deployment models, and details TDSQL‑C’s multi‑dimensional disaster‑recovery system, including its Agent‑Scheduler design, cross‑AZ switching challenges, and mitigation strategies.

Tencent Database Technology

TDSQL‑C, a core cloud‑native database product of Tencent Cloud, provides high performance, low cost, large storage capacity, low latency, rapid scaling, fast backup/restore, and Serverless capabilities for enterprise (B2B) customers.

1. Cloud‑Native vs. Traditional Database Architecture

Traditional MySQL relies on Binlog replication (asynchronous, semi‑synchronous, or strongly synchronous) and stores data on local disks, which leads to heavy I/O, unbounded master‑slave lag, unpredictable recovery time, and poor scalability.

TDSQL‑C adopts a "log‑as‑database" concept with redo‑log based recovery, separates compute and storage, eliminates Binlog replication in favor of physical replication, and achieves stateless compute nodes.

Elasticity: Adding a read‑only node takes only 20 seconds.

Serverless: Compute nodes can be paused when idle, with recovery time under 2 seconds.

Low‑latency replication: Master‑slave lag stays within 20 ms, enabling global consistency.

Recovery within seconds: Snapshots at the storage layer allow GB‑scale parallel restore.

2. MySQL Disaster‑Recovery Deployment Models

Two common patterns are used:

Cross‑AZ deployment: either two AZs with three replicas (AZ1 holds two, AZ2 holds one) or three AZs with one replica per AZ.

Cross‑Region deployment: a disaster‑recovery instance in another region, typically read‑only, requiring manual failover.

Characteristics include multi‑AZ/Region placement, logical or physical log synchronization, async or semi‑sync primary‑replica links, and consistency ensured by external systems or built‑in protocols.
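
The two cross‑AZ layouts above behave differently when a whole AZ is lost. The following is a minimal sketch that checks which single‑AZ failures leave a strict majority of replicas alive. The majority rule is an assumption made for illustration; the article does not state which consistency protocol a given deployment uses, and the function and variable names are hypothetical.

```python
def survives_az_loss(placement: dict[str, int], failed_az: str) -> bool:
    """Return True if a strict majority of replicas remains after failed_az is lost.

    `placement` maps an AZ name to the number of replicas it holds.
    The majority requirement is an illustrative assumption.
    """
    total = sum(placement.values())
    remaining = total - placement.get(failed_az, 0)
    return remaining > total // 2


def report(placement: dict[str, int]) -> dict[str, bool]:
    """Evaluate every single-AZ failure for one placement."""
    return {az: survives_az_loss(placement, az) for az in placement}


# Two AZs, three replicas: AZ1 holds two, AZ2 holds one.
two_az = {"AZ1": 2, "AZ2": 1}
# Three AZs, one replica each.
three_az = {"AZ1": 1, "AZ2": 1, "AZ3": 1}

if __name__ == "__main__":
    print(report(two_az))    # losing AZ1 removes two of three replicas
    print(report(three_az))  # any single AZ can fail
```

Under this assumption, the two‑AZ layout tolerates the loss of AZ2 but not of AZ1, while the three‑AZ layout tolerates the loss of any one AZ.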

3. TDSQL‑C Multi‑Dimensional Disaster‑Recovery System

Components:

An Agent co‑located with each instance collects its status (replication health, process health, hardware health) and reports it to the Scheduler.

The Scheduler receives the Agents' heartbeats, decides whether to trigger a switch (in‑AZ or cross‑AZ), and relies on ZooKeeper (ZK) for leader election and its own high availability.
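
To make the Agent‑to‑Scheduler reporting concrete, here is a small sketch of what a status report and the Scheduler's bookkeeping could look like. All class names, field names, and the timeout parameter are hypothetical; the article does not describe the actual message format or API.

```python
import time
from dataclasses import dataclass, field


@dataclass
class AgentReport:
    """One heartbeat from an Agent (hypothetical shape)."""
    instance_id: str
    az: str
    replication_ok: bool
    process_ok: bool
    hardware_ok: bool
    sent_at: float = field(default_factory=time.monotonic)

    def healthy(self) -> bool:
        # An instance counts as healthy only if all three checks pass.
        return self.replication_ok and self.process_ok and self.hardware_ok


class Scheduler:
    """Keeps the latest report per instance and flags silent or unhealthy ones."""

    def __init__(self, heartbeat_timeout: float) -> None:
        self.heartbeat_timeout = heartbeat_timeout
        self.last_report: dict[str, AgentReport] = {}

    def receive(self, report: AgentReport) -> None:
        self.last_report[report.instance_id] = report

    def missing(self, now: float) -> list[str]:
        """Instances whose last heartbeat is older than the timeout."""
        return sorted(
            iid for iid, r in self.last_report.items()
            if now - r.sent_at > self.heartbeat_timeout
        )

    def unhealthy(self) -> list[str]:
        """Instances whose latest report failed at least one check."""
        return sorted(
            iid for iid, r in self.last_report.items() if not r.healthy()
        )
```

A silent instance and an unhealthy instance are tracked separately here because they call for different follow‑up: silence may mean a network partition rather than a dead database.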

Failover steps:

When the primary AZ fails, ZK goes through a leader change and a new primary Scheduler is elected.

Scheduler detects missing heartbeats from the failed AZ’s Agent.

Scheduler double‑checks the Agent’s lease information.

After lease timeout, Scheduler initiates the failover.
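
The last three steps can be sketched as a small decision function: missing heartbeats alone do not trigger a switch; the Scheduler also waits for the lease to expire. The function name, parameters, and the three‑state result are assumptions made for illustration.

```python
from enum import Enum


class Action(Enum):
    NONE = "none"                    # heartbeats are arriving, nothing to do
    WAIT_FOR_LEASE = "wait_for_lease"  # heartbeats missing, lease still valid
    FAILOVER = "failover"            # heartbeats missing and lease expired


def decide(now: float, last_heartbeat: float,
           lease_expires_at: float, heartbeat_timeout: float) -> Action:
    """Decide what the Scheduler should do for one primary instance."""
    # Step 1: are heartbeats from the Agent still arriving?
    if now - last_heartbeat <= heartbeat_timeout:
        return Action.NONE
    # Step 2: heartbeats are missing, so double-check the lease.
    if now < lease_expires_at:
        return Action.WAIT_FOR_LEASE
    # Step 3: the lease has timed out, so failover may start.
    return Action.FAILOVER
```

Waiting for the lease means the old primary has had time to stop accepting writes before a new one is promoted.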

Challenges and Mitigation Strategies

Two main risks are double‑write and mistaken switch:

Prevent double‑write: When ZK degrades to read‑only, Agents can no longer renew their leases and the Scheduler exits. Once a lease expires, the database is forced into read‑only mode, so an isolated old primary cannot keep accepting writes.
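
The Agent side of this lease mechanism can be sketched as follows. This is an illustrative model only: the class, its methods, and the rule that a fenced instance stays read‑only until something else re‑enables it are assumptions, not the documented TDSQL‑C behavior.

```python
class LeaseGuard:
    """Agent-side guard: force read-only once the lease can no longer be renewed."""

    def __init__(self, lease_ttl: float, now: float) -> None:
        self.lease_ttl = lease_ttl
        self.expires_at = now + lease_ttl
        self.read_only = False

    def tick(self, now: float, renew_ok: bool) -> bool:
        """Run one renewal attempt and return whether the database is read-only.

        `renew_ok` is the outcome of the renewal call against the lease
        store (for example ZK); here it is passed in to keep the sketch
        self-contained.
        """
        if self.read_only:
            # Assumption: once fenced, stay fenced until re-promoted elsewhere.
            return True
        if renew_ok:
            self.expires_at = now + self.lease_ttl
        elif now >= self.expires_at:
            # Renewal failed and the lease ran out: stop accepting writes.
            self.read_only = True
        return self.read_only
```

For this to prevent double‑write, the Agent must consider its lease expired no later than the Scheduler does, so that the old primary is already read‑only when a new primary is promoted.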

Prevent mistaken switch: Introduce a third‑party lease system for multi‑level safety, and an external probing system (both long‑connection internal probes and short‑connection external probes) to verify node health before switching.

The Scheduler combines lease status, probe results, and instance health to make informed failover decisions, minimizing the chance of erroneous switches.
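
One way to picture this combined decision is a rule that switches only when every signal agrees that the primary is gone. The sketch below is an assumption about how such signals might be combined; the article does not give the actual rule, and all names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Evidence:
    lease_expired: bool          # the lease system says the lease has run out
    internal_probe_ok: bool      # long-connection probe from inside the system
    external_probe_ok: bool      # short-connection probe from outside
    agent_reports_healthy: bool  # a fresh Agent report says the instance is healthy


def should_switch(e: Evidence) -> bool:
    """Switch only if the lease expired and no signal shows the node alive."""
    reachable = e.internal_probe_ok or e.external_probe_ok
    return e.lease_expired and not reachable and not e.agent_reports_healthy


if __name__ == "__main__":
    # Heartbeats lost but an external probe still reaches the node:
    # likely a partial network problem, so do not switch.
    partial = Evidence(True, False, True, False)
    print(should_switch(partial))
    # Every signal agrees the node is gone.
    down = Evidence(True, False, False, False)
    print(should_switch(down))
```

Requiring agreement trades a slightly slower switch for fewer mistaken ones, which matches the goal stated above.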

In summary, TDSQL‑C’s architecture and its sophisticated disaster‑recovery framework provide rapid, reliable, and low‑impact failover capabilities, positioning it as a next‑generation cloud‑native database solution.

Written by

Tencent Database Technology

Tencent's Database R&D team supports internal services such as WeChat Pay, WeChat Red Packets, Tencent Advertising, and Tencent Music, and provides external support on Tencent Cloud for TencentDB products like CynosDB, CDB, and TDSQL. This public account aims to promote and share professional database knowledge, growing together with database enthusiasts.
