TiDB Architecture Explained: TiKV, PD, and Raft in Distributed Databases
TiDB is a distributed, MySQL-compatible database built from three core components: TiDB Server for stateless SQL processing, PD for global scheduling and metadata management, and TiKV for high-performance key-value storage. TiKV replicates its data with the Raft consensus algorithm to provide strong consistency and fault tolerance.
Preface
After studying TiDB, the author summarizes its architecture.
Overall Framework
TiDB consists of three core components: TiDB Server, PD Server, and TiKV Server, plus TiSpark for complex OLAP needs. A single‑node deployment requires all three components; production clusters are deployed with Ansible.
A complete TiDB cluster diagram is shown below:
TiKV Server
TiKV Server stores data and must provide:
Cross‑data‑center disaster recovery
High write throughput
Fast reads
Support for data modification and concurrent updates
Atomicity for multi‑record modifications
TiKV uses a key‑value model with ordered traversal; keys are stored in binary order, enabling range scans via successive Next calls.
TiKV’s storage model is independent of SQL tables; it is a high‑performance, highly reliable distributed map.
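The ordered key-value model can be pictured with a small in-memory sketch. This is only an illustration of "seek, then call Next repeatedly"; the class name OrderedKV and its methods are hypothetical and are not TiKV's API, and real TiKV persists data on disk rather than in Python lists.

```python
import bisect


class OrderedKV:
    """Toy in-memory ordered map over byte-string keys (not TiKV's real API)."""

    def __init__(self):
        self._keys = []     # kept sorted in byte order
        self._values = {}

    def put(self, key: bytes, value: bytes) -> None:
        if key not in self._values:
            bisect.insort(self._keys, key)
        self._values[key] = value

    def get(self, key: bytes):
        return self._values.get(key)

    def scan(self, start: bytes, end: bytes):
        """Yield (key, value) pairs with start <= key < end, in key order."""
        # "Seek" to the first key >= start.
        i = bisect.bisect_left(self._keys, start)
        while i < len(self._keys) and self._keys[i] < end:
            key = self._keys[i]
            yield key, self._values[key]
            i += 1  # "Next": step to the following key in byte order


kv = OrderedKV()
for k in (b"user:3", b"user:1", b"order:9", b"user:2"):
    kv.put(k, b"row-" + k)

# b";" is the byte immediately after b":", so this range covers every "user:" key.
user_rows = list(kv.scan(b"user:", b"user;"))
```

Because keys are compared as raw bytes, all keys sharing a prefix sit next to each other, which is what makes a prefix or range scan a single seek followed by sequential steps.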
Data is persisted to disk through RocksDB, a high‑performance single‑node engine maintained by Facebook.
To avoid data loss and support cross-data-center disaster recovery, TiKV replicates data across multiple machines using the Raft consensus algorithm; PingCAP's implementation of Raft has been heavily optimized.
Raft provides leader election, membership changes, and log replication.
TiKV writes data via Raft; each change becomes a Raft log entry replicated to a majority of nodes in the Raft group, ensuring safety even if a node fails.
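The majority rule can be sketched in a few lines. This is a simplified illustration, not TiKV's Raft implementation: the function names are hypothetical, and real Raft additionally requires that the entry being committed belongs to the leader's current term.

```python
def majority(replica_count):
    """Smallest number of replicas that forms a majority."""
    return replica_count // 2 + 1


def committed_index(match_index):
    """Highest log index stored on a majority of replicas.

    match_index holds, for every replica including the leader, the index of
    the last log entry known to be stored on that replica.
    """
    ordered = sorted(match_index, reverse=True)
    return ordered[majority(len(match_index)) - 1]


# Three replicas: the leader and one follower hold entries up to index 7,
# while a slow or failed follower only holds entries up to index 4.
commit = committed_index([7, 7, 4])
```

With three replicas, two acknowledgements are enough, so entry 7 is committed even though the third replica is behind. That is why losing a single node does not lose acknowledged writes.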
Data storage flow is illustrated below:
TiKV stores data in Regions, each a contiguous key range (default ≤64 MB). Regions are balanced across nodes, and a metadata component tracks which Region resides on which node.
Each Region has multiple Replicas that together form a Raft group; one Replica acts as the Leader and the others as Followers. By default, all reads and writes go through the Leader, which replicates writes to the Followers.
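A rough sketch of how a key might be routed to its Region and then to that Region's Leader follows. The Region and RegionTable types, their field names, and the sample key ranges are all hypothetical and chosen only for illustration; they do not mirror TiKV's or PD's actual data structures.

```python
import bisect
from dataclasses import dataclass


@dataclass
class Region:
    region_id: int
    start_key: bytes        # inclusive
    end_key: bytes          # exclusive; b"" means "no upper bound"
    leader_store: int       # store holding the Leader replica
    follower_stores: tuple  # stores holding the Follower replicas


class RegionTable:
    """Hypothetical routing table: maps a key to the Region that covers it."""

    def __init__(self, regions):
        self._regions = sorted(regions, key=lambda r: r.start_key)
        self._starts = [r.start_key for r in self._regions]

    def locate(self, key: bytes) -> Region:
        # Index of the last Region whose start_key <= key.
        i = bisect.bisect_right(self._starts, key) - 1
        if i < 0:
            raise KeyError(key)
        region = self._regions[i]
        if region.end_key and key >= region.end_key:
            raise KeyError(key)
        return region


table = RegionTable([
    Region(1, b"", b"g", leader_store=1, follower_stores=(2, 3)),
    Region(2, b"g", b"p", leader_store=2, follower_stores=(1, 3)),
    Region(3, b"p", b"", leader_store=3, follower_stores=(1, 2)),
])

target = table.locate(b"kiwi")  # falls in the range [b"g", b"p")
```

A request for the key b"kiwi" would be sent to the store in target.leader_store. Because Regions are contiguous and non-overlapping, a binary search over the start keys is enough to find the owner.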
Summary: TiKV is a distributed key‑value storage system, a massive ordered map.
PD Server
Placement Driver (PD) is the global control node of TiDB, responsible for cluster scheduling.
PD collects information such as node status, Raft group metrics, and operation statistics to make scheduling decisions.
Information Collection
PD relies on two heartbeat sources:
1. TiKV node heartbeats
TiKV nodes (Stores) send periodic heartbeats containing total and available disk capacity, number of Regions, write speed, snapshot counts, overload status, and label information.
2. Raft group leader heartbeats
Leaders report leader and follower locations, number of offline Replicas, and read/write speeds.
PD uses this data to schedule replicas, balance load, and handle node failures.
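The two heartbeat payloads can be modeled as simple records. The sketch below only mirrors the fields listed above; the class names, field names, units, and the sample label are assumptions made for illustration and do not reflect PD's actual message definitions.

```python
from dataclasses import dataclass, field


@dataclass
class StoreHeartbeat:
    """Periodic report from one TiKV node (Store); field names are illustrative."""
    store_id: int
    capacity_bytes: int
    available_bytes: int
    region_count: int
    write_bytes_per_sec: int
    snapshot_count: int
    is_overloaded: bool
    labels: dict = field(default_factory=dict)

    def used_ratio(self) -> float:
        """Fraction of disk capacity currently in use."""
        return 1 - self.available_bytes / self.capacity_bytes


@dataclass
class RegionHeartbeat:
    """Periodic report from the Leader of one Raft group; field names are illustrative."""
    region_id: int
    leader_store: int
    follower_stores: list
    offline_replicas: int
    read_bytes_per_sec: int
    write_bytes_per_sec: int


store_hb = StoreHeartbeat(
    store_id=1,
    capacity_bytes=1000,
    available_bytes=250,
    region_count=12,
    write_bytes_per_sec=4096,
    snapshot_count=0,
    is_overloaded=False,
    labels={"zone": "z1"},
)
region_hb = RegionHeartbeat(
    region_id=7,
    leader_store=1,
    follower_stores=[2, 3],
    offline_replicas=1,
    read_bytes_per_sec=2048,
    write_bytes_per_sec=512,
)
```

From records like these, a scheduler can derive the signals it needs, for example disk usage per Store from used_ratio() or Regions that need repair from offline_replicas.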
Scheduling Strategies
PD applies several policies:
Ensure each Region has the correct number of Replicas.
Distribute Replicas of a Raft group across different locations.
Balance Replica count across Stores.
Evenly distribute Leaders among Stores.
Spread hot keys evenly.
Keep Store storage usage roughly equal.
Control scheduling speed to avoid impacting online services.
Support manual node decommissioning.
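As an illustration of one of these policies, balancing Replica count across Stores, a scheduler could repeatedly propose a single move from the fullest Store to the emptiest one. This is a minimal sketch under simplifying assumptions, not PD's scheduling algorithm: the function name, the tolerance parameter, and the data layout are hypothetical, and it ignores labels, hot spots, disk usage, and rate limiting.

```python
def pick_balance_move(placement, stores, tolerance=1):
    """Suggest one replica move from the fullest store to the emptiest one.

    placement maps region_id -> set of store ids holding a replica.
    Returns (region_id, source_store, target_store), or None when the
    replica counts already differ by at most `tolerance` or no legal
    move exists.
    """
    counts = {s: 0 for s in stores}
    for holders in placement.values():
        for s in holders:
            counts[s] += 1

    source = max(stores, key=lambda s: counts[s])
    target = min(stores, key=lambda s: counts[s])
    if counts[source] - counts[target] <= tolerance:
        return None

    for region_id in sorted(placement):
        holders = placement[region_id]
        # Never place two replicas of the same Region on one store.
        if source in holders and target not in holders:
            return region_id, source, target
    return None


# Store 4 has just joined the cluster and holds no replicas yet.
placement = {1: {1, 2, 3}, 2: {1, 2, 3}, 3: {1, 2, 3}}
move = pick_balance_move(placement, stores=[1, 2, 3, 4])
```

Proposing one move at a time is a simple way to respect the policy of controlling scheduling speed: the caller decides how often to ask for the next move.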
TiDB Server
TiDB Server receives SQL requests, parses MySQL protocol packets, performs syntax analysis, query planning, optimization, and executes the plan by fetching data from TiKV. It is stateless and can be horizontally scaled behind a load balancer.
TiSpark
TiSpark provides Spark SQL on TiKV, enabling HTAP (Hybrid Transactional/Analytical Processing) by running Spark directly on TiDB’s storage layer. It requires a Spark cluster and the presence of TiKV and PD.
Conclusion
TiKV handles storage, PD handles scheduling, TiDB Server handles computation, and the Raft protocol ensures strong consistency and data safety across the distributed TiDB database.
Programmer DD
A tinkering programmer and author of "Spring Cloud Microservices in Action"
