
TiDB Technical Deep Dive – Storage, Compute, and Scheduling Architecture

This article provides a comprehensive technical overview of TiDB, covering its HTAP design, TiKV storage engine with RocksDB and Raft replication, the mapping of relational tables to key‑value pairs, MVCC implementation, transaction handling, and the PD scheduler that balances replicas, leaders, and hot spots across a distributed cluster.

Big Data Technology & Architecture

TiDB Technical Deep Dive – Storage Chapter

TiDB is an open-source distributed HTAP database developed by PingCAP. It is compatible with MySQL and offers horizontal scalability, strong consistency, and high availability for both OLTP and OLAP workloads. Its main features are:

High MySQL compatibility: most applications can migrate with few or no code changes.

Horizontal elastic scaling by adding nodes.

Distributed transactions with full ACID guarantees.

Financial‑grade high availability using Raft consensus.

HTAP: TiSpark runs complex analytical queries through Spark directly on data stored in TiKV.

Cloud‑native design for public, private, and hybrid clouds.

TiKV Storage Engine

TiKV exposes a key-value model: one huge, ordered map in which both keys and values are raw byte arrays and pairs are sorted by the binary order of their keys. On each node the data is persisted in RocksDB.

Key1 -> Value
Key2 -> Value
...
KeyN -> Value
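To make "ordered map" concrete, here is a minimal sketch in which a sorted Python list stands in for RocksDB. The class and method names are hypothetical, not TiKV's API; the point is that ordering gives two cheap operations: seeking to the first key at or after a target, and scanning a contiguous key range.

```python
from __future__ import annotations

import bisect


class OrderedKV:
    """Toy ordered map over raw byte keys (a hypothetical stand-in for RocksDB)."""

    def __init__(self) -> None:
        self._keys: list[bytes] = []  # kept sorted in plain byte order
        self._vals: dict[bytes, bytes] = {}

    def put(self, key: bytes, value: bytes) -> None:
        if key not in self._vals:
            bisect.insort(self._keys, key)
        self._vals[key] = value

    def get(self, key: bytes) -> bytes | None:
        return self._vals.get(key)

    def seek(self, key: bytes) -> bytes | None:
        """Return the first stored key that is >= key, or None."""
        i = bisect.bisect_left(self._keys, key)
        return self._keys[i] if i < len(self._keys) else None

    def scan(self, start: bytes, end: bytes) -> list[tuple[bytes, bytes]]:
        """Return every pair with start <= key < end, in key order."""
        lo = bisect.bisect_left(self._keys, start)
        hi = bisect.bisect_left(self._keys, end)
        return [(k, self._vals[k]) for k in self._keys[lo:hi]]


kv = OrderedKV()
for k in (b"k3", b"k1", b"k2", b"z"):
    kv.put(k, b"v-" + k)
print(kv.seek(b"k15"))        # b'k2'
print(kv.scan(b"k1", b"k3"))  # [(b'k1', b'v-k1'), (b'k2', b'v-k2')]
```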

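To support MVCC, TiKV appends a version number to each key, so several versions of the same key can coexist. Versions of one key are sorted in descending order, newest first, so a read at version V can seek to Key-V and land on the newest version that is not greater than V: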

Key1-Version3 -> Value
Key1-Version2 -> Value
Key1-Version1 -> Value
...
KeyN-Version1 -> Value
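The descending version order is what turns a snapshot read into a single seek. The sketch below assumes the version is appended as 8 big-endian bytes of (2^64 - 1 - version), one way to make newer versions sort first under plain byte order. It is a simplified illustration rather than TiKV's exact key format, and it assumes user keys never interleave with each other's version suffixes (TiKV avoids that by memcomparable-encoding the user key first). All names are hypothetical.

```python
from __future__ import annotations

import bisect
import struct

MAX_U64 = (1 << 64) - 1


def mvcc_key(user_key: bytes, version: int) -> bytes:
    # Store MAX - version big-endian so that newer versions sort first.
    return user_key + struct.pack(">Q", MAX_U64 - version)


def split_mvcc_key(key: bytes) -> tuple[bytes, int]:
    user_key, suffix = key[:-8], key[-8:]
    return user_key, MAX_U64 - struct.unpack(">Q", suffix)[0]


class MVCCStore:
    def __init__(self) -> None:
        self._keys: list[bytes] = []  # encoded keys, sorted in byte order
        self._vals: dict[bytes, bytes] = {}

    def put(self, user_key: bytes, version: int, value: bytes) -> None:
        key = mvcc_key(user_key, version)
        if key not in self._vals:
            bisect.insort(self._keys, key)
        self._vals[key] = value

    def get(self, user_key: bytes, read_version: int) -> bytes | None:
        """Return the newest value of user_key whose version <= read_version."""
        # Seek to Key-ReadVersion; the first entry at or after it is the
        # answer if it still belongs to user_key.
        i = bisect.bisect_left(self._keys, mvcc_key(user_key, read_version))
        if i == len(self._keys):
            return None
        found_user_key, _version = split_mvcc_key(self._keys[i])
        return self._vals[self._keys[i]] if found_user_key == user_key else None


store = MVCCStore()
store.put(b"k1", 1, b"a")
store.put(b"k1", 3, b"b")
store.put(b"k1", 5, b"c")
store.put(b"k2", 2, b"x")
print(store.get(b"k1", 4))  # b'b' (version 3 is the newest <= 4)
print(store.get(b"k1", 0))  # None (no version of k1 existed yet)
```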

Region Partitioning

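Data is split into contiguous key ranges called Regions, each covering a left-closed, right-open interval [StartKey, EndKey). A Region is split once it grows past a configurable size threshold (64 MiB in early releases; the default has since been raised). Regions are spread across the TiKV nodes, each Region is stored as several replicas (three by default), and the replicas of one Region form a Raft group.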
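Because a Region is just a key range [StartKey, EndKey), finding the Region that owns a key is a binary search over start keys, and a split simply cuts one range in two. The sketch below assumes a simple in-memory routing table; `Region`, `RegionMap`, and their methods are hypothetical names (in a real cluster PD holds this metadata and clients cache it).

```python
from __future__ import annotations

import bisect
from dataclasses import dataclass


@dataclass
class Region:
    region_id: int
    start_key: bytes  # inclusive
    end_key: bytes    # exclusive; b"" means "to the end of the key space"


class RegionMap:
    """Hypothetical routing table; the first Region must start at b""."""

    def __init__(self, regions: list[Region]) -> None:
        self.regions = sorted(regions, key=lambda r: r.start_key)
        self._starts = [r.start_key for r in self.regions]

    def locate(self, key: bytes) -> Region:
        # The owning Region is the last one whose start_key <= key.
        i = bisect.bisect_right(self._starts, key) - 1
        region = self.regions[i]
        assert region.end_key == b"" or key < region.end_key
        return region

    def split(self, region_id: int, split_key: bytes, new_id: int) -> None:
        # [start, end) becomes [start, split_key) and [split_key, end).
        i = next(i for i, r in enumerate(self.regions) if r.region_id == region_id)
        old = self.regions[i]
        assert old.start_key < split_key
        assert old.end_key == b"" or split_key < old.end_key
        self.regions[i:i + 1] = [
            Region(old.region_id, old.start_key, split_key),
            Region(new_id, split_key, old.end_key),
        ]
        self._starts = [r.start_key for r in self.regions]


rmap = RegionMap([Region(1, b"", b"m"), Region(2, b"m", b"")])
print(rmap.locate(b"apple").region_id)  # 1
rmap.split(2, b"t", new_id=3)
print(rmap.locate(b"zebra").region_id)  # 3
```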

Raft Replication

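Raft provides leader election, membership changes, and log replication. Every write to a Region goes through the Region's Raft leader, which appends it to the Raft log and replicates it to the followers; the write is committed only once a majority of the replicas have persisted it. A Region with three replicas therefore keeps working as long as any two of them are available.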
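The majority rule can be made concrete. In Raft, the leader tracks the highest log index known to be stored on each replica (its match index) and may mark an entry committed once a majority store it. The sketch below computes that commit point and the resulting fault tolerance; the function names are hypothetical, and it models only this one rule, not elections or membership changes.

```python
from __future__ import annotations


def quorum(n: int) -> int:
    """Smallest number of replicas that forms a majority of n."""
    return n // 2 + 1


def tolerated_failures(n: int) -> int:
    """How many replicas can be lost while a majority remains."""
    return n - quorum(n)


def leader_commit_index(match_index: list[int], entry_term: dict[int, int],
                        current_term: int, old_commit: int) -> int:
    """Highest log index the leader may mark committed.

    match_index holds, for every replica including the leader itself, the
    highest log index known to be stored on that replica.
    """
    # After sorting in descending order, the entry at position quorum-1 is
    # stored on at least a majority of replicas.
    candidate = sorted(match_index, reverse=True)[quorum(len(match_index)) - 1]
    # Raft only commits by counting replicas for entries from the current term.
    if candidate > old_commit and entry_term.get(candidate) == current_term:
        return candidate
    return old_commit


print(tolerated_failures(3), tolerated_failures(5))  # 1 2
print(leader_commit_index([10, 8, 5], {8: 2}, current_term=2, old_commit=5))  # 8
```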

Compute Chapter

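The SQL layer maps relational tables onto TiKV key-value pairs. Each table is assigned a TableID that is unique in the cluster, each index an IndexID that is unique within its table, and each row a RowID that is unique within its table. If a table has an integer primary key, its value is used as the RowID.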

Row encoding example:

Key: tablePrefix{tableID}_recordPrefixSep{rowID}
Value: [col1, col2, col3, col4]
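A minimal sketch of the row-key layout, assuming tablePrefix is the byte string "t", recordPrefixSep is "_r", and integers use a memcomparable int64 encoding (sign bit flipped, then 8 bytes big-endian). These choices follow TiDB's tablecodec and codec packages but are reimplemented here for illustration; the function names are hypothetical and the row value is left out.

```python
from __future__ import annotations

import struct

TABLE_PREFIX = b"t"        # assumed value of tablePrefix
RECORD_PREFIX_SEP = b"_r"  # assumed value of recordPrefixSep
SIGN_MASK = 1 << 63


def encode_int(v: int) -> bytes:
    # Flip the sign bit and write 8 bytes big-endian, so negative numbers
    # sort before positive ones under byte comparison.
    return struct.pack(">Q", (v & 0xFFFFFFFFFFFFFFFF) ^ SIGN_MASK)


def decode_int(b: bytes) -> int:
    u = struct.unpack(">Q", b)[0] ^ SIGN_MASK
    return u - (1 << 64) if u >= SIGN_MASK else u


def record_prefix(table_id: int) -> bytes:
    # Every row of a table shares this prefix, so a table scan is one range scan.
    return TABLE_PREFIX + encode_int(table_id) + RECORD_PREFIX_SEP


def row_key(table_id: int, row_id: int) -> bytes:
    return record_prefix(table_id) + encode_int(row_id)


def decode_row_key(key: bytes) -> tuple[int, int]:
    # Layout: 1-byte prefix, 8-byte tableID, 2-byte separator, 8-byte rowID.
    assert key[:1] == TABLE_PREFIX and key[9:11] == RECORD_PREFIX_SEP
    return decode_int(key[1:9]), decode_int(key[11:19])


print(row_key(10, 1).hex())  # 74800000000000000a5f728000000000000001
```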

Index encoding example (unique index):

Key: tablePrefix{tableID}_indexPrefixSep{indexID}_indexedColumnsValue
Value: rowID

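A non-unique index can hold the same indexed values for many rows, so the RowID is appended to the key to keep every key distinct, and the value is left empty: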

Key: tablePrefix{tableID}_indexPrefixSep{indexID}_indexedColumnsValue_rowID
Value: null
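The two index layouts differ only in where the RowID goes. The sketch below assumes tablePrefix is "t" and indexPrefixSep is "_i", uses the same sign-flipped int64 encoding, and indexes a single integer column for brevity. It omits the per-column type flag byte that TiDB writes before each encoded value, and the function names are hypothetical.

```python
from __future__ import annotations

import struct

TABLE_PREFIX = b"t"       # assumed value of tablePrefix
INDEX_PREFIX_SEP = b"_i"  # assumed value of indexPrefixSep
SIGN_MASK = 1 << 63


def encode_int(v: int) -> bytes:
    # Memcomparable int64: flip the sign bit, write 8 bytes big-endian.
    return struct.pack(">Q", (v & 0xFFFFFFFFFFFFFFFF) ^ SIGN_MASK)


def index_prefix(table_id: int, index_id: int) -> bytes:
    return TABLE_PREFIX + encode_int(table_id) + INDEX_PREFIX_SEP + encode_int(index_id)


def unique_index_entry(table_id: int, index_id: int, values: list[int],
                       row_id: int) -> tuple[bytes, bytes]:
    # The indexed values alone identify the entry; the value stores the RowID.
    key = index_prefix(table_id, index_id) + b"".join(encode_int(v) for v in values)
    return key, encode_int(row_id)


def nonunique_index_entry(table_id: int, index_id: int, values: list[int],
                          row_id: int) -> tuple[bytes, bytes]:
    # Append the RowID so rows with equal indexed values get distinct keys;
    # the value carries nothing.
    key = (index_prefix(table_id, index_id)
           + b"".join(encode_int(v) for v in values)
           + encode_int(row_id))
    return key, b""


# Two rows (RowID 1 and 2) with the same indexed value 30:
u1, _ = unique_index_entry(10, 1, [30], row_id=1)
u2, _ = unique_index_entry(10, 1, [30], row_id=2)
n1, _ = nonunique_index_entry(10, 1, [30], row_id=1)
n2, _ = nonunique_index_entry(10, 1, [30], row_id=2)
print(u1 == u2)  # True: the keys collide, so the second row violates uniqueness
print(n1 == n2)  # False: both entries can coexist
```

With this layout, finding every row whose indexed value is 30 in the non-unique index is a prefix scan over `index_prefix(10, 1) + encode_int(30)`.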

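Here tablePrefix, recordPrefixSep, and indexPrefixSep are fixed string constants. All of these keys use a memcomparable encoding: comparing two encoded keys byte by byte gives the same order as comparing the original values. As a result, the rows of a table are stored contiguously in RowID order and the entries of an index contiguously in index-value order, which makes point lookups and range scans efficient.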
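Flipping the sign bit keeps fixed-width integers ordered, but variable-length strings need more care once several columns are concatenated into one key: with plain concatenation, a short first column ends up being compared against bytes of the second column. The sketch below reimplements a group-of-8 memcomparable byte encoding modeled on the one in TiDB's codec package (written from scratch here, not imported) and contrasts it with naive concatenation on a two-column key. Function names are hypothetical.

```python
from __future__ import annotations

GROUP_SIZE = 8
MARKER = 0xFF


def encode_bytes(data: bytes) -> bytes:
    """Memcomparable byte-string encoding in 8-byte groups.

    Each group is padded with 0x00 to 8 bytes and followed by a marker byte
    equal to 0xFF minus the number of pad bytes, so a full group (marker 0xFF)
    signals that more data follows.
    """
    out = bytearray()
    for idx in range(0, len(data) + 1, GROUP_SIZE):
        chunk = data[idx:idx + GROUP_SIZE]
        pad = GROUP_SIZE - len(chunk)
        out += chunk + b"\x00" * pad
        out.append(MARKER - pad)
    return bytes(out)


def naive_key(cols: tuple[bytes, ...]) -> bytes:
    return b"".join(cols)


def memcomparable_key(cols: tuple[bytes, ...]) -> bytes:
    return b"".join(encode_bytes(c) for c in cols)


rows = [(b"ab", b"a"), (b"a", b"z"), (b"a", b"")]
print(sorted(rows))                         # column-wise (correct) order
print(sorted(rows, key=naive_key))          # ("ab", "a") wrongly lands before ("a", "z")
print(sorted(rows, key=memcomparable_key))  # matches the column-wise order
```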

SQL Execution

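SQL statements are translated into KV operations. For example, SELECT COUNT(*) FROM user WHERE name = 'TiDB' is executed as follows: TiDB builds the key range that covers every possible RowID of the user table, scans the rows in that range, decodes each row and keeps those whose name column equals 'TiDB', and counts them. To avoid shipping every row to the SQL layer, the filter and a partial COUNT are pushed down to the TiKV nodes that hold each Region (TiKV's Coprocessor), and TiDB only sums the partial counts returned per Region.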
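The benefit of push-down shows up in how much data reaches the SQL layer. The toy model below treats each Region as a list of already-decoded rows; the `Row` fields and function names are hypothetical, and it ignores key encoding and networking entirely. It only contrasts the number of results sent upward with and without push-down.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Row:
    row_id: int
    name: str
    age: int


def region_partial_count(region_rows: list[Row], name: str) -> int:
    # Runs "inside" a storage node: filter on name, then a partial COUNT(*).
    return sum(1 for r in region_rows if r.name == name)


def count_where_name(regions: list[list[Row]], name: str,
                     push_down: bool = True) -> tuple[int, int]:
    """Return (count, results_sent_to_sql_layer) for COUNT(*) ... WHERE name = ?."""
    if push_down:
        # Each Region returns one partial count; the SQL layer only sums them.
        partials = [region_partial_count(rows, name) for rows in regions]
        return sum(partials), len(partials)
    # Without push-down every row is sent up and filtered in the SQL layer.
    all_rows = [r for rows in regions for r in rows]
    return sum(1 for r in all_rows if r.name == name), len(all_rows)


regions = [
    [Row(1, "TiDB", 3), Row(2, "MySQL", 20)],
    [Row(3, "TiDB", 5), Row(4, "TiKV", 4), Row(5, "PD", 4)],
]
print(count_where_name(regions, "TiDB"))                   # (2, 2)
print(count_where_name(regions, "TiDB", push_down=False))  # (2, 5)
```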

Scheduling Chapter

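PD (Placement Driver) is the cluster's central scheduler. It learns the cluster state from heartbeats: each TiKV node periodically reports its own status, and the leader of every Region reports that Region's status. Based on this information, PD is responsible for the following: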

Ensures each Region has the correct number of replicas.

Distributes replicas across distinct locations (nodes, racks, data centers) using label‑based placement.

Balances replica and leader distribution for even load.

Detects hot spots (Regions receiving a disproportionate share of reads or writes) and spreads them across nodes.

Controls migration speed to avoid impacting online services.

Supports manual node decommissioning.
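Two of these duties, keeping the replica count correct and balancing leaders, can be sketched as toy checkers that return the operation they would request. `RegionInfo`, the function names, and the `tolerance` parameter are hypothetical; real PD weighs many more signals (labels, Region size, free space, hot spots, rate limits) before choosing a store.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class RegionInfo:
    region_id: int
    leader_store: int
    peer_stores: list[int]


def check_replicas(region: RegionInfo, max_replicas: int, stores: list[int]) -> str | None:
    """Suggest adding or removing a replica when the count is off."""
    n = len(region.peer_stores)
    if n < max_replicas:
        # Naively pick the first store without a replica of this Region.
        candidates = [s for s in stores if s not in region.peer_stores]
        return f"AddReplica(region={region.region_id}, store={candidates[0]})" if candidates else None
    if n > max_replicas:
        # Never pick the leader's replica for removal.
        victim = next(s for s in region.peer_stores if s != region.leader_store)
        return f"RemoveReplica(region={region.region_id}, store={victim})"
    return None


def balance_leaders(regions: list[RegionInfo], stores: list[int], tolerance: int = 1) -> str | None:
    """Move one leader from the store with most leaders to the one with fewest."""
    counts = {s: 0 for s in stores}
    for r in regions:
        counts[r.leader_store] += 1
    source = max(stores, key=lambda s: counts[s])
    target = min(stores, key=lambda s: counts[s])
    if counts[source] - counts[target] <= tolerance:
        return None
    for r in regions:
        # Leadership can only move to a store that already holds a replica.
        if r.leader_store == source and target in r.peer_stores:
            return f"TransferLeader(region={r.region_id}, from={source}, to={target})"
    return None


stores = [1, 2, 3]
regions = [
    RegionInfo(1, leader_store=1, peer_stores=[1, 2, 3]),
    RegionInfo(2, leader_store=1, peer_stores=[1, 2, 3]),
    RegionInfo(3, leader_store=1, peer_stores=[1, 2, 3]),
    RegionInfo(4, leader_store=2, peer_stores=[1, 2]),
]
print(check_replicas(regions[3], max_replicas=3, stores=stores))  # AddReplica(region=4, store=3)
print(balance_leaders(regions, stores))  # TransferLeader(region=1, from=1, to=3)
```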

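PD does not move data itself. Every scheduling decision is expressed as a combination of three basic operations, AddReplica, RemoveReplica, and TransferLeader, which PD returns to a Region's leader in its reply to that Region's heartbeat; the Region's Raft group then carries them out.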
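Higher-level moves are compositions of these three operations. The sketch below, with hypothetical function names, plans moving one replica from one store to another: it adds the new replica first, moves leadership to an existing follower if the replica being removed is the leader, and only then removes the old replica. It models only the ordering, not how TiKV executes each step through Raft membership changes, and it is not PD's actual operator code.

```python
from __future__ import annotations


def plan_move_replica(peers: list[int], leader: int, from_store: int,
                      to_store: int) -> list[tuple[str, int]]:
    """Express 'move a replica from one store to another' as basic operations."""
    if from_store not in peers or to_store in peers:
        raise ValueError("source must hold a replica and target must not")
    # Add first, so the Region never has fewer replicas than it started with.
    steps = [("AddReplica", to_store)]
    if leader == from_store:
        # Hand leadership away before removing the leader's replica.
        followers = [p for p in peers if p != from_store]
        steps.append(("TransferLeader", followers[0] if followers else to_store))
    steps.append(("RemoveReplica", from_store))
    return steps


def apply_steps(peers: list[int], leader: int,
                steps: list[tuple[str, int]]) -> tuple[list[int], int]:
    """Replay the steps and check the safety properties the plan relies on."""
    peers, floor = list(peers), len(peers)
    for op, store in steps:
        if op == "AddReplica":
            peers.append(store)
        elif op == "TransferLeader":
            assert store in peers
            leader = store
        elif op == "RemoveReplica":
            assert store != leader, "never remove the current leader's replica"
            peers.remove(store)
        assert len(peers) >= floor
    return peers, leader


steps = plan_move_replica([1, 2, 3], leader=1, from_store=1, to_store=4)
print(steps)  # [('AddReplica', 4), ('TransferLeader', 2), ('RemoveReplica', 1)]
print(apply_steps([1, 2, 3], 1, steps))  # ([2, 3, 4], 2)
```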

Overall, the article explains how TiDB combines a MySQL‑compatible front‑end with a distributed KV store, leverages Raft for fault‑tolerant replication, and uses PD to continuously balance resources and maintain high availability.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.

Written by Big Data Technology & Architecture (Wang Zhiwu, a big data expert dedicated to sharing big data technology).
