Databases 11 min read

Uncovering the Flaws: Why No Distributed Database Architecture Is Perfect

This article examines the most common Chinese distributed database architectures, comparing integrated SHARDING designs like OceanBase with separated compute‑storage models such as TiDB and SQL‑proxy based systems, highlighting their inherent trade‑offs, performance challenges, and practical considerations for choosing the right solution.

ITPUB
ITPUB
ITPUB
Uncovering the Flaws: Why No Distributed Database Architecture Is Perfect

Introduction

This is the third post in the “No Perfect” series, following articles on imperfect databases and storage engines. It focuses not on distributed databases themselves but on their architectures, providing a rough analysis of the most common domestic distributed database designs.

SHARDING vs. Global Hash

Distributed data computation mainly takes two forms: partition SHARDING and global consistency HASH. These lead to two broad architectural families: compute‑storage integration (SHARDING) and compute‑storage separation, each with sub‑variants such as peer‑to‑peer, proxy, and external modes.

Integrated SHARDING Architecture – OceanBase

OceanBase exemplifies the integrated SHARDING model. Each OBSERVER is an independent service that combines SQL, transaction, and storage engines and holds a portion of the data shards. The system uses a peer‑to‑peer mode where any observer can serve full database functionality. A global RootService manages the cluster, with a single primary and multiple backups.

Data is replicated with high availability: a write goes to a leader replica and is automatically synchronized to N backup replicas using a Paxos‑based election algorithm. Backups can serve read‑only or weakly consistent reads, improving resource utilization. However, SHARDING databases suffer from read amplification when queries lack the SHARDING KEY, forcing operators to be dispatched to all observers holding the relevant shards.

OceanBase architecture diagram
OceanBase architecture diagram

Separated Compute‑Storage Architecture – TiDB

TiDB separates the compute engine (TiDB) from the storage engine (TiKV). This eliminates the read‑amplification issue of SHARDING databases because the compute layer does not store data locally. However, TiDB introduces a new problem: lack of a global DB cache. Each TiDB node only has a local buffer, and data must travel through the network to TiKV/TiFlash, reducing efficiency. To mitigate this, faster storage media and advanced operator push‑down techniques (e.g., Oracle‑style SMART SCAN) are required.

TiDB architecture diagram
TiDB architecture diagram

SQL‑Proxy Based Separation Architecture

The most widely used domestic architecture combines SHARDING storage with an SQL proxy layer. Examples include Tencent TDSQL, HotDB, ZTE GOLDENDB, Baidu GAIADB, JD STARDB, and openGauss. In TDSQL, core components are a scheduler, a gateway (SQL proxy), and agents, together with MySQL instances as storage nodes. The scheduler coordinates distributed computation, while the gateway parses SQL, generates distributed execution plans, and distributes tasks to storage nodes. Storage nodes run full MySQL engines, simplifying management but making the overall performance heavily dependent on the optimizer quality of the underlying engine.

TDSQL component diagram
TDSQL component diagram

SQL‑On‑Hadoop Architecture

Some databases, such as YiJingJie, adopt an SQL‑on‑Hadoop model. They use an optimized HBase‑compatible storage engine built on HDFS and HBase, offering strong support for large‑scale data and IoT scenarios. However, Hadoop is not optimized for low‑latency, high‑concurrency OLTP or HTAP workloads, making the storage engine a potential bottleneck. Improvements require rewriting the storage engine or employing sophisticated push‑down techniques.

SQL‑On‑Hadoop architecture diagram
SQL‑On‑Hadoop architecture diagram

Conclusion

No distributed database architecture is flawless. Each design—integrated SHARDING, compute‑storage separation, or SQL‑proxy based—has distinct advantages and trade‑offs such as read amplification, cache locality, optimizer dependence, and high‑availability challenges. Selecting a database should consider not only its architecture but also how well the vendor has optimized the product for specific application scenarios.

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Sign in to view source
Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactadmin@besthub.devand we will review it promptly.

architectureshardingTiDBOceanBaseSQL proxy
ITPUB
Written by

ITPUB

Official ITPUB account sharing technical insights, community news, and exciting events.

0 followers
Reader feedback

How this landed with the community

Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.