
CynosDB Compute‑Intelligent Storage Architecture and High‑Availability Overview

The talk detailed CynosDB’s compute‑intelligent storage and multi‑read architecture, explaining TXSQL, Space Manager, DBStore, and Atlas’s two‑layer distributed storage with three‑replica nodes, high‑availability recovery, snapshot and migration features, and advanced data routing and I/O protocols for robust, fault‑tolerant database services.

Tencent Cloud Developer

On March 16, Tencent Cloud + Community hosted a CynosDB technical exchange in Beijing, where senior database engineer Fan Wei presented a detailed walkthrough of CynosDB’s compute‑intelligent storage, multi‑read architecture, high‑availability design, fast recovery, and distributed storage.

The session began with an overview of the overall software architecture: TXSQL at the top, a Space Manager handling space allocation, a DBStore module processing redo logs and page I/O, and the Atlas node – Tencent Cloud’s backend storage platform that powers block storage, databases, and file services.

The client read and write flows were then illustrated, with a focus on redo processing. A log stream (e.g., IDs 100‑104) is mapped to distributed storage units (T1‑T4). Each successful persistence advances the VDL pointer, which marks the durable point of the database.
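The way a durable pointer can advance is easier to see in code. The following is a minimal sketch, not CynosDB's implementation: it assumes log IDs are consecutive integers and that the pointer only moves across a gap-free run of persisted records. The class name `DurablePointTracker` and the placement of IDs on T1‑T4 are hypothetical.

```python
class DurablePointTracker:
    """Tracks the highest LSN up to which every redo record is persisted."""

    def __init__(self, start_lsn):
        self.vdl = start_lsn - 1   # nothing is durable yet
        self._acked = set()        # persisted LSNs that are still ahead of the VDL

    def ack(self, lsn):
        """Record that `lsn` is persisted; return the (possibly advanced) VDL."""
        if lsn > self.vdl:
            self._acked.add(lsn)
        # Advance only across a gap-free run of persisted records.
        while self.vdl + 1 in self._acked:
            self.vdl += 1
            self._acked.remove(self.vdl)
        return self.vdl


# Hypothetical placement of log IDs 100-104 on storage units T1-T4.
placement = {100: "T1", 101: "T2", 102: "T3", 103: "T4", 104: "T1"}

tracker = DurablePointTracker(start_lsn=100)
history = []
for lsn in (100, 102, 101, 104, 103):   # acknowledgements arrive out of order
    vdl = tracker.ack(lsn)
    history.append((placement[lsn], lsn, vdl))
    print(f"{placement[lsn]} persisted {lsn}, VDL = {vdl}")
```

In this run the pointer stays at 100 while record 101 is missing, then jumps to 102 once the gap is filled.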

Database recovery was described next: the TXStore client retrieves the table‑space mappings, queries the storage units for their LSNs, sorts them, and determines the recovery point. Optimizations reduce the number of LSNs returned.
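The recovery steps can be sketched as follows. This is an illustration under stated assumptions rather than the actual TXStore logic: LSNs are treated as consecutive integers, each storage unit reports the LSNs it holds, and the recovery point is taken to be the last LSN before the first gap. The function name `recovery_point` and the sample reports are hypothetical.

```python
def recovery_point(reported, start_lsn):
    """Return the last LSN of the gap-free prefix that begins at `start_lsn`.

    `reported` maps a storage unit name to the LSNs it has persisted.
    If `start_lsn` itself is missing, the result is `start_lsn - 1`.
    """
    # Merge and sort the LSNs returned by all storage units.
    all_lsns = sorted(set().union(*reported.values()))
    point = start_lsn - 1
    for lsn in all_lsns:
        if lsn < start_lsn:
            continue            # already covered by an earlier recovery point
        if lsn != point + 1:
            break               # first gap: later records are not recoverable
        point = lsn
    return point


# Hypothetical reports: record 103 never reached its storage unit.
reports = {"T1": [100, 104], "T2": [101], "T3": [102], "T4": []}
print(recovery_point(reports, start_lsn=100))
```

Record 104 is persisted on T1 but cannot be used, because 103 is missing and the log must be replayed in order.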

The DBStore module’s responsibilities were detailed: it receives redo logs, orders them, checks continuity with the persist queue, persists them, replicates to replicas, and completes the redo cycle. Page reads follow a similar path.
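The redo path described above (receive, order, check continuity, persist, replicate) might look roughly like this. It is a simplified sketch: the class `RedoReceiver`, the in-memory lists standing in for the persist queue and the replicas, and the consecutive-integer LSNs are all assumptions made for illustration.

```python
import heapq


class RedoReceiver:
    """Buffers out-of-order redo records and persists them in LSN order."""

    def __init__(self, next_lsn, replicas):
        self.next_lsn = next_lsn   # next LSN the persist queue expects
        self.pending = []          # min-heap of (lsn, payload) awaiting continuity
        self.persisted = []        # stand-in for the durable log
        self.replicas = replicas   # stand-ins for replica logs (plain lists)

    def receive(self, lsn, payload):
        """Accept one record; return the LSNs persisted as a result."""
        heapq.heappush(self.pending, (lsn, payload))
        flushed = []
        while self.pending and self.pending[0][0] <= self.next_lsn:
            head_lsn, head_payload = heapq.heappop(self.pending)
            if head_lsn < self.next_lsn:
                continue                      # duplicate of a persisted record
            record = (head_lsn, head_payload)
            self.persisted.append(record)     # persist locally first
            for replica in self.replicas:     # then ship to every replica
                replica.append(record)
            flushed.append(head_lsn)
            self.next_lsn += 1
        return flushed


replica_a, replica_b = [], []
receiver = RedoReceiver(next_lsn=100, replicas=[replica_a, replica_b])
results = [
    receiver.receive(101, b"update-2"),   # arrives early, must wait for 100
    receiver.receive(100, b"update-1"),   # fills the gap, both are persisted
    receiver.receive(100, b"update-1"),   # retransmission, ignored
]
```

A heap keeps the earliest pending record at the front, so the continuity check is a single comparison against the next expected LSN.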

Atlas, the distributed storage platform, supports both internal Tencent services and external customers (e.g., Pinduoduo, Mobike). Its cloud services layer offers block, file, and DB services, while the platform provides capabilities such as scaling, flow control, and snapshots.

Atlas’s two‑layer storage architecture was explained. The client layer offers rich interfaces and advanced features (snapshots, volume migration); the storage node layer keeps three replicas and delivers high‑performance engines, strong consistency, and F+1 fault tolerance. Alongside these two layers, a control node manages cluster state, fault recovery, and load balancing.

Data routing strategies were covered. For block storage, a simple hash maps to a virtual node and then to a storage node. For DB workloads, a more sophisticated MDS‑based allocation maps a DB’s address space directly onto a specific replica to avoid excessive redo fragmentation.
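The two routing styles can be contrasted in a short sketch. Everything named here is hypothetical: the node names, the number of virtual nodes, the use of SHA-256 as the hash, and the `SegmentDirectory` class that stands in for the MDS allocation table. The talk does not specify these details.

```python
import hashlib

NUM_VNODES = 8
STORAGE_NODES = ["node-0", "node-1", "node-2", "node-3"]

# Each virtual node is served by three distinct storage nodes (three replicas).
VNODE_TO_NODES = {
    v: [STORAGE_NODES[(v + i) % len(STORAGE_NODES)] for i in range(3)]
    for v in range(NUM_VNODES)
}


def route_block(volume_id, block_index):
    """Block storage: hash the key to a virtual node, then look up its nodes."""
    key = f"{volume_id}:{block_index}".encode()
    digest = hashlib.sha256(key).digest()
    vnode = int.from_bytes(digest[:8], "big") % NUM_VNODES
    return vnode, VNODE_TO_NODES[vnode]


class SegmentDirectory:
    """DB workloads: an explicit table maps address ranges to a replica group."""

    def __init__(self, segment_size):
        self.segment_size = segment_size
        self._groups = {}   # segment number -> replica group name

    def assign(self, segment_no, replica_group):
        self._groups[segment_no] = replica_group

    def route(self, offset):
        # Every offset inside one segment lands on the same replica group.
        return self._groups[offset // self.segment_size]


directory = SegmentDirectory(segment_size=1 << 30)   # assumed 1 GiB segments
directory.assign(0, "group-A")
directory.assign(1, "group-B")
```

Hashing scatters neighbouring blocks across nodes, while the directory keeps a contiguous range of a database's address space together, which is the property the talk ties to avoiding redo fragmentation.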

The IO protocol stack of Atlas was outlined: a two‑layer path without a central node, supporting protocols such as iSCSI, file, and DB; network modules for Ethernet and RoCE; a replication module for parallel, high‑efficiency copying; an append‑only metadata engine and bare‑metal space manager; a cache layer that can use slower media as primary storage and faster media as cache; and a disk‑management module that merges and batches IO submissions.
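Merging and batching IO submissions, the last item above, can be illustrated with a small sketch. It assumes read requests expressed as `(offset, length)` pairs and simply coalesces ranges that touch or overlap; the function name `merge_requests` is hypothetical, and Atlas's actual merging rules are not described in the talk.

```python
def merge_requests(requests):
    """Coalesce adjacent or overlapping (offset, length) read requests."""
    merged = []
    for offset, length in sorted(requests):
        if merged and offset <= merged[-1][0] + merged[-1][1]:
            # Touches or overlaps the previous range: extend it.
            last_offset, last_length = merged[-1]
            new_end = max(last_offset + last_length, offset + length)
            merged[-1] = (last_offset, new_end - last_offset)
        else:
            merged.append((offset, length))
    return merged


# Four 4 KiB requests submitted out of order.
batch = [(4096, 4096), (0, 4096), (16384, 4096), (8192, 4096)]
print(merge_requests(batch))
```

Three of the four requests cover one contiguous 12 KiB range and become a single submission; the fourth stays separate.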

High‑availability mechanisms were discussed, distinguishing temporary and permanent node failures. Temporary failures trigger incremental data recovery from the primary node, while permanent failures require full synchronization among remaining nodes. Fault detection relies on IO‑path probing rather than simple heartbeats, allowing faster isolation of degraded nodes.
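The distinction between the two failure types, and the idea of judging a node by probe IOs instead of heartbeats, can be sketched like this. The thresholds, function names, and the list-of-LSNs model of the primary's data are all assumptions for illustration, not values from the talk.

```python
PERMANENT_AFTER_SECONDS = 300.0   # hypothetical cutoff between the two cases


def classify_failure(down_seconds):
    """Treat short outages as temporary and long ones as permanent."""
    return "temporary" if down_seconds < PERMANENT_AFTER_SECONDS else "permanent"


def plan_recovery(kind, primary_lsns, replica_last_lsn):
    """Return the LSNs that must be copied to the recovering node."""
    if kind == "temporary":
        # Incremental: only what the node missed while it was away.
        return [lsn for lsn in primary_lsns if lsn > replica_last_lsn]
    # Permanent: a replacement node starts empty and needs everything.
    return list(primary_lsns)


def is_degraded(probe_latencies_ms, limit_ms=50.0, max_slow=3):
    """True if the last `max_slow` probe IOs all failed (None) or were slow."""
    recent = probe_latencies_ms[-max_slow:]
    return len(recent) == max_slow and all(
        latency is None or latency > limit_ms for latency in recent
    )


primary = [100, 101, 102, 103, 104]
incremental = plan_recovery(classify_failure(20.0), primary, replica_last_lsn=102)
full = plan_recovery(classify_failure(900.0), primary, replica_last_lsn=102)
```

A node that still answers heartbeats but serves probe IOs slowly would be flagged by `is_degraded`, which is the advantage the talk attributes to IO‑path probing.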

The talk also highlighted storage platform features: snapshots (for fast physical backups), volume migration (online movement across storage tiers), distributed prefetch, support for multiple media types (SSD/HDD hybrid, flow control, shared disks), and ongoing feature iteration.

A Q&A session addressed cache mechanisms, prefetch logic, data placement strategies (two‑level LRU for SSD/HDD tiering), and scheduling algorithms implemented at the storage layer.
