
TiDB Overview: Architecture, Deployment, Advantages, and Practical Tips

This article introduces TiDB, an open-source distributed HTAP database compatible with MySQL. It explains the architecture (TiDB Server, PD, TiKV, and TiSpark), covers deployment requirements, highlights key advantages such as elastic scaling, fault tolerance, and strong consistency, and offers practical tips and common pitfalls for production use.


Background: When a MySQL table grows to billions of rows, performance degrades and schema changes become time-consuming; the common workaround is sharding, which requires application-level changes. TiDB is a distributed NewSQL database compatible with the MySQL protocol and most MySQL syntax, allowing migration with little or no code modification while delivering high performance at scale.

TiDB Introduction: TiDB is an open-source distributed HTAP (Hybrid Transactional and Analytical Processing) database that combines the best features of traditional RDBMS and NoSQL. It is MySQL-compatible, supports unlimited horizontal scaling, provides strong consistency and high availability, and serves both OLTP and OLAP workloads.

Overall Architecture: A TiDB cluster consists of three core components: the TiDB Server (stateless SQL processing), the Placement Driver (PD) for metadata, scheduling, and ID allocation, and the TiKV Server (distributed transactional key-value store). An optional TiSpark component enables Spark SQL to run directly on TiKV for complex OLAP queries.

Component Details:

TiDB Server receives SQL requests, parses and optimizes them, looks up data locations through PD, and reads the data from TiKV. Because it is stateless, it can be horizontally scaled behind load balancers such as LVS or HAProxy.

PD stores cluster metadata, schedules Regions, balances load, and allocates globally unique, monotonically increasing transaction timestamps. PD is itself replicated with the Raft consensus algorithm and should be deployed with an odd number of nodes (typically three).

TiKV Server stores data as Regions (key ranges) and replicates them using Raft groups to ensure consistency and fault tolerance.

TiSpark integrates Spark SQL with TiKV, allowing a single system to handle both OLTP and OLAP without data duplication.

Deployment Environment: The recommended Linux distribution is CentOS 7.3 or later (other mainstream distributions are also supported). Use SSDs (PCIe preferred) for storage. A production cluster typically requires at least five machines: two TiDB nodes and three TiKV nodes, with PD optionally co-located on the TiDB servers.

TiDB Advantages:

Unlimited horizontal elasticity – add nodes to scale capacity without application changes.

Automatic fault recovery and multi‑region active‑active deployment via Raft‑based replication.

Strong consistency with distributed ACID transactions across nodes.

High compatibility with MySQL tools, enabling zero‑cost migration.

Performance benefits for large tables: fast index creation, rapid schema changes, and superior query speed on massive datasets.

Practical Tips and Common Pitfalls:

Avoid auto‑increment primary keys; use meaningful unique keys or distributed ID generators to prevent hotspot regions.
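One drop-in way to get scattered keys in more recent TiDB releases (experimental in 3.1, GA in 4.0) is the AUTO_RANDOM column attribute; the table below is a hypothetical sketch for illustration, not from the original deployment:

```sql
-- Hypothetical sketch: AUTO_RANDOM randomizes the high bits of the
-- generated ID, so consecutive inserts spread across Regions instead
-- of piling onto a single hotspot. It requires a BIGINT primary key.
CREATE TABLE file_meta (
    id   BIGINT AUTO_RANDOM PRIMARY KEY,
    path VARCHAR(255) NOT NULL,
    UNIQUE KEY uk_path (path)
);
```

The application still gets a unique surrogate key on insert, but write traffic no longer concentrates on the last Region of the table.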

Minimize schema alterations. TiDB does not allow altering multiple columns or indexes in a single statement, and dropping a table's primary key is not supported.

Upgrade to TiDB 3.1 to resolve known bugs in version 3.0 (e.g., duplicate primary-key inserts being silently accepted and index lengths not being limited).

Avoid negative-condition queries (NOT IN, !=, LIKE with a leading wildcard), as they cannot use indexes and trigger full table scans.
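For example, a NOT IN filter can often be rewritten as a LEFT JOIN anti-join that the optimizer can drive from an index. The users/orders tables below are made up for illustration, and the rewrite assumes orders.user_id is NOT NULL (NOT IN behaves differently when the subquery can return NULLs):

```sql
-- Full-scan version: the NOT IN subquery forces a check of every row.
SELECT u.id FROM users u
WHERE u.id NOT IN (SELECT o.user_id FROM orders o);

-- Anti-join rewrite: probes an index on orders(user_id) and keeps only
-- the users rows with no matching order.
SELECT u.id FROM users u
LEFT JOIN orders o ON o.user_id = u.id
WHERE o.user_id IS NULL;
```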

Mitigate hotspot writes by pre‑splitting regions. For integer primary keys, use:

SPLIT TABLE file_HOTSPOT BETWEEN (0) AND (9223372036854775807) REGIONS 128;

For non-integer keys, configure shard_row_id_bits and pre_split_regions:

CREATE TABLE t (a INT, b INT, INDEX idx1(a)) SHARD_ROW_ID_BITS = 4 PRE_SPLIT_REGIONS = 3;

Here shard_row_id_bits = 4 scatters the implicit row IDs into 16 (2^4) shards, and pre_split_regions = 3 pre-creates 8 (2^3) Regions when the table is created.

Other considerations include the single-record size limit (a single KV entry ≤ 6 MB) and per-transaction limits (≤ 300,000 KV entries, ≤ 100 MB total KV size, and ≤ 5,000 statements).
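To stay under these limits, bulk writes and deletes are typically chunked from the application side; a hypothetical batched cleanup might look like this, re-run in a loop until zero rows are affected:

```sql
-- Hypothetical sketch: each statement is its own small transaction,
-- staying well below the KV-entry and statement-count limits above.
DELETE FROM event_log
WHERE created_at < '2019-01-01'
LIMIT 1000;
```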

Data Resilience: Keep at least three replicas of each Region in TiKV for high availability. Backups are performed with BR (Backup & Restore), which restores into a fresh cluster.

Monitoring: A TiDB cluster is monitored with Prometheus and Grafana, which provide visual dashboards for cluster metrics.

Conclusion: TiDB offers a compelling alternative to MySQL sharding with elastic scaling, strong consistency, zero-cost migration, and robust fault tolerance, though it demands a relatively complex production environment and careful attention to deployment and operational best practices.

Tags: Performance Optimization, Scalability, Deployment, Distributed Database, TiDB, HTAP, MySQL Compatibility
Written by

360 Quality & Efficiency

360 Quality & Efficiency focuses on seamlessly integrating quality and efficiency in R&D, sharing 360’s internal best practices with industry peers to foster collaboration among Chinese enterprises and drive greater efficiency value.
