Why Distributed Database Architectures Matter: From Shared‑Nothing to Shared‑Storage
This article introduces the fundamentals of distributed database architectures and compares Shared‑Nothing and Shared‑Storage designs. It then walks through the Shared‑Storage design in detail: its three‑layer structure, core engine components, SQL execution flow, and performance and cost optimizations. It closes with a real‑world high‑traffic deployment during Douyin’s Spring Festival event.
Distributed Database Architecture Overview
Modern relational databases such as MySQL and PostgreSQL dominate the market, so any new database product must be compatible with their ecosystems. However, traditional monolithic architectures cannot meet the scalability and performance demands of large‑scale applications, prompting the move to distributed databases.
Problems with Traditional Three‑Layer Architecture
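The middleware layer imposes usage restrictions: users must specify sharding keys, and queries that cannot be routed by a sharding key have to be sent to every shard, which degrades performance.
Local disk capacity limits how much data a single node can store, even with multiple disks.
Cross‑datacenter deployments force a trade‑off between recovery point objective (RPO) and performance: synchronous replication protects recent writes but adds latency to every commit, while asynchronous replication keeps commits fast but can lose recent writes on failover.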
These issues motivate the exploration of alternative architectures.
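The sharding‑key restriction can be illustrated with a minimal sketch. The shard names and the `route` function are hypothetical and not taken from any particular middleware; the point is only that a query carrying the sharding key touches one shard, while a query without it has to be scattered to all of them.

```python
import zlib

SHARDS = ["shard-0", "shard-1", "shard-2", "shard-3"]  # hypothetical shard names


def route(sharding_key=None):
    """Return the list of shards a query must visit.

    With a sharding key the middleware can pick exactly one shard.
    Without one it has to fan out to every shard and merge the results.
    """
    if sharding_key is None:
        return list(SHARDS)  # scatter-gather: work grows with the shard count
    # crc32 is used only because it is deterministic across runs
    index = zlib.crc32(str(sharding_key).encode("utf-8")) % len(SHARDS)
    return [SHARDS[index]]


print(route(sharding_key=42))  # a single shard
print(route())                 # all four shards
```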
Two Main Distributed Database Architectures
Shared‑Nothing: Often associated with MPP (massively parallel processing) databases; optimized for high‑throughput analytical workloads, typically using columnar storage.
Shared‑Storage: Targets low‑latency online transaction processing (OLTP), usually employing row‑based storage.
Both architectures have distinct strengths and should be chosen based on business needs.
Shared‑Storage Architecture Details
The system is divided into three layers:
Proxy layer
Compute layer
Distributed storage layer
Nodes in each layer interconnect via high‑speed networks, while the storage layer provides a shared storage pool using various media.
Key characteristics are flexibility, full MySQL and PostgreSQL compatibility, high availability through multi‑replica deployment, high performance in cluster mode, lower cost because compute and storage scale independently, and support for data volumes from terabytes to petabytes.
Compute Engine Components
The compute engine consists of:
Access layer
Query Engine
Buffer Pool
Log subsystem
Transaction subsystem
Lock subsystem
These modules together provide ACID guarantees.
SQL Lifecycle
When a SQL statement arrives, it is tokenized and parsed into an abstract syntax tree (AST). The AST is transformed into a logical plan, which is optimized and converted to a physical plan. The physical plan is then run by a Volcano‑style (iterator‑model) execution engine, which interacts with storage to fetch data and return results.
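To make the last step concrete, here is a minimal sketch of the Volcano iterator model, in which every physical operator exposes `open`, `next`, and `close`, and rows are pulled one at a time from the root of the plan. The operator classes and the sample table are illustrative assumptions, not the engine's actual code.

```python
class Scan:
    """Leaf operator: yields rows from an in-memory table."""

    def __init__(self, rows):
        self.rows = rows
        self.pos = 0

    def open(self):
        self.pos = 0

    def next(self):
        if self.pos >= len(self.rows):
            return None  # None signals end of stream
        row = self.rows[self.pos]
        self.pos += 1
        return row

    def close(self):
        pass


class Filter:
    """Pulls rows from its child and passes on those matching the predicate."""

    def __init__(self, child, predicate):
        self.child = child
        self.predicate = predicate

    def open(self):
        self.child.open()

    def next(self):
        while True:
            row = self.child.next()
            if row is None or self.predicate(row):
                return row

    def close(self):
        self.child.close()


class Project:
    """Keeps only the requested columns of each row."""

    def __init__(self, child, columns):
        self.child = child
        self.columns = columns

    def open(self):
        self.child.open()

    def next(self):
        row = self.child.next()
        if row is None:
            return None
        return {c: row[c] for c in self.columns}

    def close(self):
        self.child.close()


def execute(root):
    """Drive the plan by repeatedly pulling from the root operator."""
    root.open()
    out = []
    row = root.next()
    while row is not None:
        out.append(row)
        row = root.next()
    root.close()
    return out


# Roughly: SELECT id, name FROM users WHERE age >= 18
users = [
    {"id": 1, "name": "a", "age": 17},
    {"id": 2, "name": "b", "age": 30},
    {"id": 3, "name": "c", "age": 45},
]
plan = Project(Filter(Scan(users), lambda r: r["age"] >= 18), ["id", "name"])
print(execute(plan))  # [{'id': 2, 'name': 'b'}, {'id': 3, 'name': 'c'}]
```

The pull‑based design keeps operators composable: each one only knows about its child, so the optimizer can rearrange the tree without changing operator code.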
Engine Kernel Optimizations
Deep redesign of the log subsystem with an append‑only model and enriched redo‑log semantics.
Implementation of Extent Data Cache to retain hot data after crashes.
DDL optimizations leveraging distributed storage capabilities.
Operator push‑down to storage nodes for improved query performance.
Lock‑free ReadView to reduce global lock contention.
Fusion of redo log and binlog into a single stream for higher write throughput.
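Of the optimizations listed above, operator push‑down is the easiest to show in a few lines. The sketch below is an assumption‑laden toy: `StorageNode` and both query functions are hypothetical, and a real system would ship a serialized expression to the storage node rather than a Python callable. It only illustrates why filtering at the storage node reduces the number of rows sent to the compute layer.

```python
class StorageNode:
    """Hypothetical storage node holding a slice of a table."""

    def __init__(self, rows):
        self.rows = rows

    def scan(self, predicate=None):
        """Return rows; if a predicate was pushed down, filter before shipping."""
        if predicate is None:
            return list(self.rows)
        return [r for r in self.rows if predicate(r)]


def query_without_pushdown(nodes, predicate):
    """Ship every row to the compute layer and filter there."""
    shipped, result = 0, []
    for node in nodes:
        rows = node.scan()
        shipped += len(rows)
        result.extend(r for r in rows if predicate(r))
    return result, shipped


def query_with_pushdown(nodes, predicate):
    """Let each storage node filter first and ship only matching rows."""
    shipped, result = 0, []
    for node in nodes:
        rows = node.scan(predicate)
        shipped += len(rows)
        result.extend(rows)
    return result, shipped


table = [{"id": i, "amount": i} for i in range(100)]
nodes = [StorageNode(table[:50]), StorageNode(table[50:])]


def is_large(row):
    return row["amount"] >= 90


plain_rows, plain_shipped = query_without_pushdown(nodes, is_large)
pushed_rows, pushed_shipped = query_with_pushdown(nodes, is_large)
print(plain_shipped, pushed_shipped)  # 100 10
```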
Distributed Storage System
Pages from the compute layer are mapped to storage segments using hash or round‑robin schemes, replicated across multiple nodes for high availability. The storage layer focuses on two core data types: redo logs and pages.
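The two mapping schemes can be sketched as follows. The function names, the `(space_id, page_no)` page identifier, and the replica placement rule (consecutive nodes starting at the segment's slot) are assumptions made for illustration; the source does not describe the actual placement algorithm.

```python
import zlib


def segment_by_hash(page_id, num_segments):
    """Hash mapping: spreads pages evenly regardless of allocation order."""
    key = str(page_id).encode("utf-8")
    return zlib.crc32(key) % num_segments


def segment_by_round_robin(page_no, num_segments):
    """Round-robin mapping: consecutive pages land on consecutive segments."""
    return page_no % num_segments


def replica_nodes(segment, nodes, replicas=3):
    """Pick `replicas` distinct nodes for a segment (hypothetical placement rule)."""
    if replicas > len(nodes):
        raise ValueError("not enough nodes for the requested replica count")
    start = segment % len(nodes)
    return [nodes[(start + i) % len(nodes)] for i in range(replicas)]


nodes = ["n0", "n1", "n2", "n3", "n4"]  # hypothetical storage nodes

print([segment_by_round_robin(p, 4) for p in range(8)])  # [0, 1, 2, 3, 0, 1, 2, 3]
print(replica_nodes(1, nodes))                           # ['n1', 'n2', 'n3']
print(segment_by_hash((1, 7), 4))                        # some segment in 0..3
```

Round‑robin keeps sequential pages on different segments, which helps parallel scans; hashing avoids hot spots when page numbers are allocated unevenly.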
Cost‑control techniques include tiered storage (persistent memory, SSD, HDD), efficient compression algorithms, and smart replica strategies such as erasure coding and lazy replication.
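The cost effect of the replica strategy is simple arithmetic. The sketch below compares full replication with erasure coding; the choice of three copies and a 4+2 coding scheme is an assumption for illustration, not the configuration of the system described here, and the 20 TB figure is just a round number.

```python
def replication_factor(copies):
    """Raw bytes stored per logical byte with full copies."""
    return float(copies)


def erasure_coding_factor(data_shards, parity_shards):
    """Raw bytes stored per logical byte with k data + m parity shards."""
    return (data_shards + parity_shards) / data_shards


def raw_capacity_tb(logical_tb, factor):
    return logical_tb * factor


logical_tb = 20  # illustrative logical data size in TB

three_copies = raw_capacity_tb(logical_tb, replication_factor(3))
ec_4_2 = raw_capacity_tb(logical_tb, erasure_coding_factor(4, 2))

print(three_copies)  # 60.0
print(ec_4_2)        # 30.0
```

Under these assumptions erasure coding halves the raw capacity needed, at the price of extra computation and more expensive reads when a shard has to be reconstructed, which is why it is usually applied to colder data.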
Real‑World Deployment: Douyin Spring Festival Event
The distributed database supported the Douyin Spring Festival activity, handling a peak read QPS of over 6 million, a peak write QPS of over 3.6 million, and more than 20 TB of data. Preparation involved query‑pattern analysis, transaction splitting, performance benchmarking, extensive stress testing, and fault‑injection drills to ensure reliability.
Future Outlook
Adoption of new hardware such as RDMA networking, persistent memory, and computational storage devices.
Development of multi‑model compute engines supporting relational, document, and graph workloads.
Continued deep kernel optimizations, including distributed B‑tree structures, optimistic transaction support, advanced lock scheduling, and AI‑driven database tuning.
Q&A
Q: Where is data classified as hot, warm, or cold? A: Classification occurs in the storage layer based on access frequency and I/O pressure.
Q: How does the system support both MySQL and PostgreSQL? A: By abstracting log‑to‑page conversion, the storage layer can accommodate any database that uses redo logs and pages.
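The log‑to‑page abstraction mentioned in the last answer can be sketched as a storage service that accepts redo records and materializes pages by replaying them. Everything here is a hypothetical simplification: the `RedoRecord` fields, the `PageStore` class, and the tiny page size are assumptions, and real redo records carry richer, engine‑specific semantics.

```python
from dataclasses import dataclass

PAGE_SIZE = 16  # tiny pages keep the example readable; real pages are far larger


@dataclass(frozen=True)
class RedoRecord:
    lsn: int       # log sequence number, increasing over time
    page_id: int   # page the change applies to
    offset: int    # byte offset inside the page
    data: bytes    # new bytes to write at that offset


class PageStore:
    """Hypothetical storage node: stores redo records and builds pages from them."""

    def __init__(self):
        self.log = []
        self.pages = {}     # page_id -> bytearray
        self.page_lsn = {}  # page_id -> LSN of the last applied record

    def append(self, record):
        if record.offset + len(record.data) > PAGE_SIZE:
            raise ValueError("record does not fit in the page")
        self.log.append(record)

    def apply_pending(self):
        for rec in sorted(self.log, key=lambda r: r.lsn):
            if rec.lsn <= self.page_lsn.get(rec.page_id, 0):
                continue  # already applied, so replay is idempotent
            page = self.pages.setdefault(rec.page_id, bytearray(PAGE_SIZE))
            page[rec.offset:rec.offset + len(rec.data)] = rec.data
            self.page_lsn[rec.page_id] = rec.lsn

    def read_page(self, page_id):
        """Return the page with all logged changes applied."""
        self.apply_pending()
        return bytes(self.pages.get(page_id, bytearray(PAGE_SIZE)))


store = PageStore()
store.append(RedoRecord(lsn=1, page_id=7, offset=0, data=b"hello"))
store.append(RedoRecord(lsn=2, page_id=7, offset=5, data=b"!"))
print(store.read_page(7)[:6])  # b'hello!'
```

Because the store only depends on the record format and the page layout, any engine that can express its changes as redo records against pages could, in principle, sit on top of it.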
Volcano Engine Developer Services
The Volcano Engine Developer Community, Volcano Engine's TOD community, connects the platform with developers, offering cutting-edge tech content and diverse events, nurturing a vibrant developer culture, and co-building an open-source ecosystem.
