Why Cloud‑Native Databases Are Redefining Elasticity and Resilience
Cloud‑native databases meet the elasticity, resilience, and high‑availability demands of modern cloud computing by separating compute from storage and by relying on log‑based persistence and multi‑replica consensus. Systems such as Spanner, Aurora, and TiDB use these techniques to deliver higher performance, lower cost, and better resource utilization.
Background
With the rapid growth of cloud computing, IT applications are moving to the cloud, and cloud services exhibit several characteristics:
Services are provided on demand.
Users prefer paying operating expenses rather than capital expenses.
Service providers run increasingly large clusters, often at cloud scale.
These traits require cloud products to be elastic and to possess self‑healing (resilience) capabilities.
Challenges of Traditional RDS
Initially, databases were simply moved onto IaaS and offered as relational database services (RDS). This gives some elasticity and resilience, but it suffers from low resource utilization, high maintenance cost, and limited availability, which is what motivates cloud‑native databases.
MySQL Replication Example
High‑availability or read/write‑split clusters for MySQL require a binlog replication setup.
The diagram shows write‑ahead logging, redo log, binlog, and relay log writes.
Introduction to Cloud‑Native Databases
To solve these problems, new‑generation cloud databases are designed around decoupled components, minimal state on each node, and lightweight node expansion.
1. Spanner‑Based Solutions
Google Spanner pioneered cloud‑native databases, inspiring CockroachDB, TiDB, YugabyteDB, etc.
1.1 Architecture
Using TiDB as an example:
These products wrap a distributed SQL execution engine over a key‑value store, employing 2PC or its variants for transaction processing. The compute nodes act as stateless SQL engines, forming a fully distributed database.
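The two‑phase commit mentioned above can be sketched as follows. This is a minimal illustration, not the protocol of TiDB or any other product: the `Shard` class, its lock table, and the `two_phase_commit` coordinator are hypothetical names introduced here, and the sketch leaves out timeouts, durable logging, and coordinator failure.

```python
from dataclasses import dataclass, field


@dataclass
class Shard:
    """One key-value shard acting as a 2PC participant (hypothetical)."""
    data: dict = field(default_factory=dict)
    staged: dict = field(default_factory=dict)  # txn_id -> pending writes
    locks: dict = field(default_factory=dict)   # key -> txn_id holding the lock

    def prepare(self, txn_id, writes):
        # Phase 1: vote "no" if any key is locked by another transaction.
        if any(self.locks.get(k, txn_id) != txn_id for k in writes):
            return False
        for k in writes:
            self.locks[k] = txn_id
        self.staged[txn_id] = dict(writes)
        return True

    def commit(self, txn_id):
        # Phase 2 (success): make staged writes visible and release locks.
        for k, v in self.staged.pop(txn_id, {}).items():
            self.data[k] = v
            self.locks.pop(k, None)

    def abort(self, txn_id):
        # Phase 2 (failure): discard staged writes and release locks.
        for k in self.staged.pop(txn_id, {}):
            self.locks.pop(k, None)


def two_phase_commit(txn_id, writes_by_shard):
    """Stateless coordinator; writes_by_shard is a list of (shard, writes)."""
    prepared = []
    for shard, writes in writes_by_shard:
        if shard.prepare(txn_id, writes):
            prepared.append(shard)
        else:
            for p in prepared:
                p.abort(txn_id)
            return False
    for shard in prepared:
        shard.commit(txn_id)
    return True
```

The coordinator keeps no state of its own between calls, which mirrors the idea that the SQL layer can be scaled out independently of the key‑value shards that hold the data.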
1.2 Storage High Availability
Spanner splits tables into tablets and uses multi‑replica Paxos; TiDB uses multi‑replica Multi‑Raft per region; CockroachDB uses Raft per range.
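As a rough sketch of the two ideas in this section, the code below routes a key to its range and commits a write once a majority of that range's replicas have it. `RangeRouter` and `ReplicaGroup` are hypothetical names for illustration. Real Paxos and Raft implementations also handle leader election, terms, and log repair, all of which are omitted here.

```python
import bisect


class RangeRouter:
    """Maps a key to the range (tablet/region) whose interval contains it."""

    def __init__(self, split_keys):
        # split_keys ["g", "p"] yield three ranges:
        # range 0: keys < "g", range 1: "g" <= key < "p", range 2: keys >= "p"
        self.split_keys = sorted(split_keys)

    def range_for(self, key):
        return bisect.bisect_right(self.split_keys, key)


class ReplicaGroup:
    """One range replicated N ways; a write commits on a majority of acks."""

    def __init__(self, replica_count=3):
        self.logs = [[] for _ in range(replica_count)]
        self.up = [True] * replica_count

    def replicate(self, entry):
        acks = 0
        for log, alive in zip(self.logs, self.up):
            if alive:
                log.append(entry)  # may stay uncommitted if no majority forms
                acks += 1
        return acks > len(self.logs) // 2
```

Because each range has its own replica group, a failed node only affects the ranges it hosts, and adding capacity amounts to splitting ranges and moving some of them to new nodes.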
1.3 Pros and Cons
SQL compatibility can be incomplete compared with a single‑node MySQL or PostgreSQL, and the supported feature set varies by product and version.
2. Aurora‑Based Solutions
Aurora, from Amazon, separates compute and storage for MySQL/PostgreSQL, but remains a monolithic read/write‑split cluster.
It adopts Spanner’s log‑persistence idea, treating logs as the database and pushing them to storage.
2.1 Architecture
The green part shows log flow.
In Aurora, the primary instance sends only redo log records to the storage layer. Storage nodes apply those records to materialize data pages, which removes page flushing and checkpointing from the compute node.
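The "log is the database" idea can be illustrated with a small sketch. This is not Aurora's storage code: `RedoRecord` and `StorageNode` are hypothetical names, pages are modeled as plain dictionaries, and records are assumed to arrive in increasing LSN order.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RedoRecord:
    lsn: int       # log sequence number, assumed strictly increasing
    page_id: int
    key: str
    value: str


class StorageNode:
    """Persists redo records and materializes pages from them on demand."""

    def __init__(self):
        self.log = []         # redo records in LSN order
        self.pages = {}       # page_id -> {key: value}, built from the log
        self.applied_lsn = 0  # highest LSN already folded into pages

    def append(self, record):
        # The only write the compute node sends: a log record, never a page.
        self.log.append(record)

    def read_page(self, page_id):
        # Apply any records not yet applied, then serve the requested page.
        for rec in self.log:
            if rec.lsn > self.applied_lsn:
                self.pages.setdefault(rec.page_id, {})[rec.key] = rec.value
                self.applied_lsn = rec.lsn
        return dict(self.pages.get(page_id, {}))
```

In this model the compute node never ships a dirty page over the network; the storage side can rebuild any page from the records it already holds.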
2.2 High Availability
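Aurora keeps six copies of each piece of data, two in each of three availability zones, and uses a quorum protocol that needs 4 of 6 votes to write and 3 of 6 to read [1]. Writes therefore continue after the loss of an entire zone, and reads continue after the loss of a zone plus one more copy.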
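The quorum arithmetic can be checked with a short calculation. The figures used here (six copies, two per zone, a write quorum of 4 and a read quorum of 3) come from the Aurora paper [1]; the function names and the `layout` dictionary are assumptions made for this sketch.

```python
def quorum_is_consistent(v, v_write, v_read):
    """Reads must overlap writes, and any two writes must overlap."""
    return v_read + v_write > v and 2 * v_write > v


def availability(copies_per_az, failed_azs, extra_failed_copies,
                 v_write, v_read):
    """Return (can_write, can_read) after the given failures."""
    alive = sum(n for az, n in copies_per_az.items() if az not in failed_azs)
    alive -= extra_failed_copies
    return alive >= v_write, alive >= v_read


layout = {"az1": 2, "az2": 2, "az3": 2}  # six copies, two per zone
V, V_WRITE, V_READ = 6, 4, 3

one_zone_down = availability(layout, {"az1"}, 0, V_WRITE, V_READ)
zone_plus_one = availability(layout, {"az1"}, 1, V_WRITE, V_READ)
```

With one zone down, four copies remain, so both quorums are still reachable. Losing one more copy leaves three, which is enough to read but not to write.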
3. CynosDB
CynosDB largely mirrors Aurora’s design, with its own features such as Raft‑based multi‑replica storage and log‑driven buffer cache synchronization.
4. PolarDB
PolarDB also separates compute and storage but keeps redo log handling on the compute side, using existing distributed file systems.
It focuses on storage‑layer optimizations (PolarFS) and query acceleration (FPGA).
5. Socrates
Socrates, Microsoft’s database‑as‑a‑service (DBaaS) architecture, reuses SQL Server components, separates log storage from page storage, and introduces XLogService for log handling. In addition, it:
Leverages SQL Server’s page version store for snapshot isolation.
Uses SSD‑based resilient cache for fast crash recovery.
Implements RBIO protocol for remote page reads.
6. TaurusDB
TaurusDB inherits Aurora’s approach of pushing logs down to storage and Socrates’ log‑page separation, adding a storage abstraction layer (SAL) and using quorum algorithms for high availability.
Core Functions of Cloud‑Native Databases
Compute‑storage separation with stateless or minimal‑state compute nodes.
Log‑based persistence.
Storage sharding for easy scaling.
Multi‑replica storage with consensus algorithms.
Backup, restore, and snapshot capabilities delegated to the storage layer.
Non‑Core Features of Popular Solutions
Global deployment considerations include multi‑region availability, distributed transactions, and GDPR compliance.
Core Value of Cloud‑Native Databases
Higher performance due to lightweight log‑based replication.
Better elasticity with stateless compute nodes.
Improved availability via fine‑grained replication and consensus.
Higher resource utilization through on‑demand scaling.
Reduced cost from lower resource waste and maintenance overhead.
References
[1] "Amazon Aurora: Design Considerations for High Throughput Cloud‑Native Relational Databases"
[2] "Spanner: Google’s Globally‑Distributed Database"
[3] "TiDB: A Raft‑based HTAP Database"
[4] PolarDB redo replication
[5] PolarDB Architecture
[6] GDPR
[7] "Socrates: The New SQL Server in the Cloud"
[8] "Taurus Database: How to be Fast, Available, and Frugal in the Cloud"
[9] "CynosDB, Tencent Cloud’s New‑Generation Self‑Developed Database, Explained in Detail: Architecture Design"
Qingyun Technology Community
