How GaiaDB Redefines Cloud‑Native Databases with Fusion Architecture
GaiaDB, Baidu’s cloud‑native database, combines compute‑storage separation with a fused, log‑service architecture to boost performance, simplify consistency, and deliver multi‑level high availability across zones and regions, while supporting new features such as parallel query, HTAP replicas, and serverless scaling.
Cloud‑Native Databases and GaiaDB
Cloud‑native databases are now deployed at massive scale, and they aim to combine the simplicity of a single node with the scalability of a distributed system. Two main evolution paths exist: the compute‑storage‑separation route favored by major public clouds (AWS Aurora, Alibaba PolarDB, Tencent TDSQL‑C, Baidu GaiaDB), and the distributed‑framework route exemplified by OceanBase and TiDB.
The separation route rebuilds a traditional database engine on a compute‑storage‑separated architecture, keeping full compatibility with existing syntax, usage habits, and ecosystem while providing low‑latency transaction processing and greater read and storage scalability.
The distributed‑framework route builds a distributed framework first and then embeds the database logic in it, splitting the transaction and lock subsystems into independent modules. This scales better for write‑heavy workloads but introduces higher transaction latency because of the extra cross‑network interactions.
Both routes are converging: the separation approach is adding multi‑write capabilities, while the distributed‑framework approach is exploring single‑node deployments for small data scales.
GaiaDB Evolution
GaiaDB grew out of Baidu Intelligent Cloud’s years of database R&D. Its first release in 2020 introduced large‑capacity storage and elastic scaling via compute‑storage separation. In 2021, region‑level hot‑active support was added, achieving near‑primary latency across regions without the drawbacks of logical synchronization.
Version 2.0 was adopted by core Baidu products such as the Baidu App, Tieba, and Wenku, addressing their cross‑region latency and performance challenges. Version 3.0 added cross‑AZ hot‑active support, enabling each availability zone to serve traffic without extra storage cost. Version 4.0 further enhanced performance and feature completeness.
High‑Performance & Multi‑Level High‑Availability Design
GaiaDB’s performance core lies in fusing database and distributed storage, converting the full data path to database semantics. Traditional designs require separate consensus, WAL, and snapshot mechanisms, leading to multi‑layer latency and write amplification.
GaiaDB merges the master‑slave sync logic, log logic, snapshot, and persistence into a unified pipeline. The distributed protocol’s master‑slave sync is integrated into the compute node, reducing a two‑hop network to a single hop and cutting latency.
All incremental logs are unified as physical redo logs managed by a LogService, simplifying reliability handling. Persistence, snapshot, and replay are also merged into storage nodes, enabling page‑level MVCC and a streamlined data flow.
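To make page‑level MVCC concrete, here is a minimal sketch, assuming each storage node keeps several LSN‑stamped versions of a page and serves each read at the reader's view LSN; the types and names (`VersionedPage`, `ReadAt`) are illustrative, not GaiaDB's actual interfaces.

```go
package main

import (
	"fmt"
	"sort"
)

// PageVersion is one materialized version of a data page, stamped with the
// LSN of the last redo record applied to it. (Hypothetical type for illustration.)
type PageVersion struct {
	LSN  uint64
	Data []byte
}

// VersionedPage keeps recent versions of a single page so that readers with
// older read views can still be served while replay continues in the background.
type VersionedPage struct {
	versions []PageVersion // kept sorted by ascending LSN
}

// Apply appends a new version produced by replaying redo up to lsn.
func (p *VersionedPage) Apply(lsn uint64, data []byte) {
	p.versions = append(p.versions, PageVersion{LSN: lsn, Data: data})
}

// ReadAt returns the newest page version whose LSN does not exceed readLSN,
// i.e. the page image a transaction with that read view is allowed to see.
func (p *VersionedPage) ReadAt(readLSN uint64) (PageVersion, bool) {
	// Find the first version with LSN > readLSN, then step back one.
	i := sort.Search(len(p.versions), func(i int) bool {
		return p.versions[i].LSN > readLSN
	})
	if i == 0 {
		return PageVersion{}, false // nothing old enough has been replayed yet
	}
	return p.versions[i-1], true
}

func main() {
	var page VersionedPage
	page.Apply(100, []byte("row image @100"))
	page.Apply(180, []byte("row image @180"))

	// A reader whose view was established at LSN 150 sees the version at 100,
	// even though replay has already advanced the page to 180.
	if v, ok := page.ReadAt(150); ok {
		fmt.Printf("read LSN 150 -> page version %d: %s\n", v.LSN, v.Data)
	}
}
```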
Consensus improvements replace the two‑hop Raft flow with a compute‑node‑direct‑to‑multiple‑LogService approach, eliminating special hardware requirements and reducing latency. The unified log service also enables asynchronous write‑back while preserving transaction consistency.
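The one‑hop quorum write can be pictured roughly as follows: the compute node fans a redo batch out to all LogService replicas in parallel and treats the batch as durable once a majority acknowledges. This is a sketch under those assumptions; `LogReplica` and `quorumAppend` are invented names, not GaiaDB's API.

```go
package main

import (
	"errors"
	"fmt"
	"math/rand"
	"time"
)

// LogReplica persists a redo batch; Append returns nil once the batch is durable.
// (Illustrative interface; the real LogService API is not public.)
type LogReplica interface {
	Append(batch []byte) error
}

type fakeReplica struct{ name string }

func (r fakeReplica) Append(batch []byte) error {
	time.Sleep(time.Duration(rand.Intn(5)) * time.Millisecond) // simulated disk + network
	return nil
}

// quorumAppend sends one redo batch to all replicas in parallel and returns as
// soon as a majority have acknowledged, so commit latency is one network hop
// plus the median replica's persistence time.
func quorumAppend(replicas []LogReplica, batch []byte) error {
	acks := make(chan error, len(replicas))
	for _, r := range replicas {
		go func(r LogReplica) { acks <- r.Append(batch) }(r)
	}

	need := len(replicas)/2 + 1
	var ok, failed int
	for range replicas {
		if err := <-acks; err == nil {
			ok++
			if ok >= need {
				return nil // majority durable: the transaction can commit
			}
		} else {
			failed++
			if failed > len(replicas)-need {
				return errors.New("quorum unreachable")
			}
		}
	}
	return errors.New("quorum unreachable")
}

func main() {
	replicas := []LogReplica{fakeReplica{"a"}, fakeReplica{"b"}, fakeReplica{"c"}}
	fmt.Println("commit:", quorumAppend(replicas, []byte("redo batch")))
}
```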
I/O thread isolation separates I/O handling from transaction processing, maintaining low I/O latency under heavy load. MVCC‑enabled storage nodes provide strong consistency even with asynchronous replay, and LogService’s version‑aware writes allow batch flushing, cutting disk I/O by about 50%.
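The disk I/O saving comes from coalescing writes: when several queued versions of the same page are pending, only the newest one needs to be flushed. Below is a simplified sketch of that idea, with `PageWrite` as an assumed stand‑in for a dirty page version awaiting write‑back.

```go
package main

import "fmt"

// PageWrite is a dirty page version queued for write-back, stamped with the
// LSN up to which redo has been applied to it. (Illustrative type.)
type PageWrite struct {
	PageID uint64
	LSN    uint64
	Image  []byte
}

// coalesce keeps only the newest queued version of each page, so one physical
// write covers every older version queued behind it.
func coalesce(queue []PageWrite) map[uint64]PageWrite {
	pending := make(map[uint64]PageWrite)
	for _, w := range queue {
		if cur, ok := pending[w.PageID]; !ok || w.LSN > cur.LSN {
			pending[w.PageID] = w
		}
	}
	return pending
}

func main() {
	queue := []PageWrite{
		{PageID: 7, LSN: 101}, {PageID: 9, LSN: 102},
		{PageID: 7, LSN: 103}, {PageID: 7, LSN: 105},
	}
	pending := coalesce(queue)
	fmt.Printf("%d queued versions -> %d page flushes\n", len(queue), len(pending))
	for id, w := range pending {
		fmt.Printf("flush page %d at LSN %d\n", id, w.LSN)
	}
}
```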
Overall, these optimizations yield a throughput increase of up to 40% and a significant reduction in tail latency.
High‑Availability Architecture
GaiaDB adopts a three‑layer design: stateless compute layer (transaction handling), log layer (incremental log persistence with majority‑based HA), and storage layer (data page persistence). The compute layer’s statelessness enables rapid scaling and sub‑second failover.
The log layer provides majority‑based HA without a leader election bottleneck, as consensus responsibilities are shifted upward. The storage layer tolerates n‑1 replica failures; as long as one replica survives, the incremental logs allow full data reconstruction.
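One way to picture the n‑1 tolerance claim: any surviving replica supplies a base copy of the pages, and the redo retained in the log layer rolls that copy forward to the commit point. The sketch below is a toy model under those assumptions, not GaiaDB's recovery code.

```go
package main

import "fmt"

// Redo is one retained incremental log record in the log layer.
type Redo struct {
	LSN   uint64
	Page  uint64
	Image string // simplified: full page image after this record
}

// Replica models one storage replica as a set of pages plus the LSN up to
// which those pages are consistent. (Toy types for illustration.)
type Replica struct {
	Pages map[uint64]string
	LSN   uint64
}

// Rebuild creates a fresh replica from any surviving replica's pages, then
// replays the retained redo beyond the survivor's LSN to reach the target.
func Rebuild(survivor *Replica, logLayer []Redo, target uint64) *Replica {
	fresh := &Replica{Pages: map[uint64]string{}, LSN: survivor.LSN}
	for id, img := range survivor.Pages {
		fresh.Pages[id] = img // bulk copy of base data
	}
	for _, r := range logLayer {
		if r.LSN > fresh.LSN && r.LSN <= target {
			fresh.Pages[r.Page] = r.Image // roll the copy forward
			fresh.LSN = r.LSN
		}
	}
	return fresh
}

func main() {
	survivor := &Replica{Pages: map[uint64]string{7: "p7@v10"}, LSN: 10}
	logLayer := []Redo{{LSN: 11, Page: 7, Image: "p7@v11"}, {LSN: 12, Page: 9, Image: "p9@v12"}}

	fresh := Rebuild(survivor, logLayer, 12)
	fmt.Printf("rebuilt replica at LSN %d with %d pages\n", fresh.LSN, len(fresh.Pages))
}
```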
Cross‑AZ high availability leverages symmetric deployment, allowing each zone to host a hot‑active instance without extra storage cost. Data flows use the shortest one‑hop network, and a chain‑style self‑check mechanism ensures data integrity across complex networks.
Cross‑region HA employs asynchronous parallel writes, achieving tens of milliseconds of sync latency nationwide while maintaining near‑primary throughput. Automatic fast‑switch and locality‑aware reads further improve user experience.
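Locality‑aware reads can be sketched as a routing decision: serve a request from the reader's own region when that region's replica lags less than the staleness the request tolerates, and otherwise fall back to the primary region. The thresholds and names below are assumptions for illustration, not GaiaDB's implementation.

```go
package main

import (
	"fmt"
	"time"
)

// RegionReplica tracks how far a region's replica lags behind the primary.
type RegionReplica struct {
	Region string
	Lag    time.Duration
}

// pickRegion prefers the client's own region when its replication lag is within
// the staleness the request tolerates; otherwise it reads from the primary.
func pickRegion(client, primary string, replicas []RegionReplica, maxStale time.Duration) string {
	for _, r := range replicas {
		if r.Region == client && r.Lag <= maxStale {
			return r.Region // local read: lowest latency, bounded staleness
		}
	}
	return primary // fall back to the primary region for fresh data
}

func main() {
	replicas := []RegionReplica{
		{Region: "north", Lag: 30 * time.Millisecond},
		{Region: "south", Lag: 400 * time.Millisecond},
	}
	fmt.Println(pickRegion("north", "east", replicas, 100*time.Millisecond)) // north (local)
	fmt.Println(pickRegion("south", "east", replicas, 100*time.Millisecond)) // east (too stale locally)
}
```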
Upcoming Features (Gray‑Release)
Parallel Query – accelerates queries that touch large numbers of rows on massive datasets by fully utilizing CPU, memory, and distributed I/O parallelism (a sketch of the idea follows this list).
HTAP Analytic Replica – provides columnar indexing and supports hundred‑TB analytics while remaining compatible with transactional workloads.
Serverless – dynamically reallocates resources between transactional peaks and offline workloads, reducing operational costs and improving utilization.
These features are slated for near‑term gray‑scale trials.
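As a rough illustration of the parallel query idea referenced above (GaiaDB's executor is not public, so this is only a sketch): split a large scan into ranges, scan them on several workers concurrently, and merge the partial aggregates.

```go
package main

import (
	"fmt"
	"sync"
)

// scanRange stands in for scanning one slice of a large table and returning a
// partial aggregate (here: a sum over that range).
func scanRange(rows []int) int {
	sum := 0
	for _, v := range rows {
		sum += v
	}
	return sum
}

// parallelSum splits the rows into `workers` ranges, scans them concurrently,
// and merges the partial results, mirroring a parallel aggregate plan.
func parallelSum(rows []int, workers int) int {
	partials := make([]int, workers)
	var wg sync.WaitGroup
	chunk := (len(rows) + workers - 1) / workers

	for w := 0; w < workers; w++ {
		lo, hi := w*chunk, (w+1)*chunk
		if lo > len(rows) {
			lo = len(rows)
		}
		if hi > len(rows) {
			hi = len(rows)
		}
		wg.Add(1)
		go func(w, lo, hi int) {
			defer wg.Done()
			partials[w] = scanRange(rows[lo:hi])
		}(w, lo, hi)
	}
	wg.Wait()

	total := 0
	for _, p := range partials {
		total += p
	}
	return total
}

func main() {
	rows := make([]int, 1_000_000)
	for i := range rows {
		rows[i] = i % 10
	}
	fmt.Println("parallel sum:", parallelSum(rows, 8))
}
```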