TiDB Operational Practices and Performance Benchmarking at Beijing Shunfeng Tongcheng Technology
This article presents a comprehensive case study of TiDB deployment at Beijing Shunfeng Tongcheng Technology, covering application scenarios, TiDB features, detailed performance benchmarks, operational challenges, optimization techniques, ecosystem tools, and best‑practice recommendations for large‑scale distributed database management.
1. Application Scenario Introduction
TiDB is currently used at Beijing Shunfeng Tongcheng Technology for the SDS system, which relies on massive real‑time data synchronized from the group's Kafka. The system requires large storage capacity, flexible scalability, high efficiency, high stability, and high availability.
With rapid business growth, a 12‑node TiDB cluster now ingests about 243 million new rows per day. This article shares the operational practices and challenges of running TiDB at the company.
2. Why Choose TiDB
2.1 TiDB Features
TiDB combines the best of traditional RDBMS and NoSQL, is MySQL‑compatible, supports unlimited horizontal scaling, and provides strong consistency and high availability.
Key characteristics:
High MySQL compatibility – most applications can migrate without code changes.
Horizontal elastic scaling – add nodes to increase throughput or storage.
Distributed ACID transactions.
Financial‑grade high availability using Raft‑based majority election.
2.2 Surprising Benefits
With TiDB, there is no single primary node whose capacity you must plan around. Other pleasant surprises:
Native online DDL – column additions and modifications complete in seconds, with no table rebuild.
No primary‑replica replication lag.
Extensive monitoring metrics, plus ecosystem automation tools.
2.3 Performance Benchmark
Hardware Configuration

| Service Type | Instance Type | Instance Count |
| --- | --- | --- |
| PD | BMI5 (96 cores / 384 GB / 7 TB NVMe SSD) | 3 |
| TiKV | BMI5 (96 cores / 384 GB / 7 TB NVMe SSD) | 6 |
| TiDB | BMI5 (96 cores / 384 GB / 7 TB NVMe SSD) | 6 |
| Sysbench | BMI5 (96 cores / 384 GB / 7 TB NVMe SSD) | 1 |
Software Versions

| Service Type | Software Version |
| --- | --- |
| PD | 3.0.18 |
| TiKV | 3.0.18 |
| TiDB | 3.0.18 |
| Sysbench | 3.0.18 |
Write Test

| Threads | QPS | 95% Latency (ms) |
| --- | --- | --- |
| 16 | 7705 | 2.81 |
| 32 | 13338 | 3.82 |
| 64 | 21641 | 5.18 |
| 128 | 33155 | 7.84 |
| 256 | 44574 | 12.08 |
| 512 | 58604 | 17.32 |
| 768 | 67901 | 22.28 |
| 1024 | 75028 | 26.68 |
| 1536 | 86010 | 34.33 |
| 2048 | 92380 | 44.98 |
| 2500 | 96671 | 54.80 |
OLTP Read/Write Test

| Threads | QPS | 95% Latency (ms) |
| --- | --- | --- |
| 16 | 18000 | 22 |
| 32 | 35600 | 23.1 |
| 64 | 60648 | 26.68 |
| 128 | 92318 | 33.12 |
| 256 | 113686 | 55.82 |
| 512 | 138616 | 94.1 |
| 768 | 164364 | 134.9 |
| 1024 | 190981 | 167.44 |
| 1536 | 223237 | 204.11 |
| 2048 | 262098 | 231.53 |
| 2500 | 276107 | 272.27 |
Read‑Only Test

| Threads | QPS | 95% Latency (ms) |
| --- | --- | --- |
| 16 | 24235.51 | 15.27 |
| 32 | 45483.64 | 16.71 |
| 64 | 80193.6 | 17.95 |
| 128 | 123851.61 | 20.37 |
| 256 | 144999.89 | 34.30 |
| 512 | 174424.94 | 58.92 |
| 768 | 183365.72 | 86 |
| 1024 | 200460.98 | 108.68 |
| 1536 | 236120.82 | 153.02 |
| 2048 | 264444.73 | 204.11 |
| 2500 | 285103.48 | 253.35 |
3. Problems Encountered on TiDB and Solutions
3.1 High average latency and Raftstore thread CPU saturation
In massive‑data, limited‑resource scenarios a single TiKV holds many Regions, causing heavy Raftstore thread overhead. Version 2.x fixed the thread count at 2, creating a bottleneck.
Upgrading from 2.1 to 3.0 GA brought:
Stability: support for >150 storage nodes and >300 TB stable storage.
Ease of use: standardized slow‑query logs, EXPLAIN ANALYZE, SQL Trace, etc.
Performance: TPC‑C ~4.5×, Sysbench ~1.5× improvement; View support enables TPC‑H 50 GB Q15.
New features: window functions, experimental views, partitioned tables, plugin system, pessimistic lock (experimental), SQL Plan Management.
After the upgrade, the cluster's .999 latency dropped by more than 5×, to 400–500 ms.
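Version 3.0 also makes the Raftstore thread pool configurable rather than fixed. A hypothetical tikv.toml fragment (pool sizes here are illustrative, not our production values):

```toml
[raftstore]
# TiKV 3.0 replaces the fixed Raftstore thread count with configurable pools.
store-pool-size = 4   # threads driving Raft state machines
apply-pool-size = 4   # threads applying committed log entries
```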
3.2 Execution‑plan anomalies causing high load
Stale or inaccurate statistics can lead the optimizer to choose the wrong index, causing full‑table scans on tables with billions of rows. The solution is to keep statistics fresh by increasing the ANALYZE frequency.
Automatic Statistics Update
TiDB automatically updates total row count and modified rows; the update interval is controlled by stats‑lease (default 3 s). Setting it to 0 disables auto‑update.
| System Variable | Default | Function |
| --- | --- | --- |
| tidb_auto_analyze_ratio | 0.5 | Auto‑update threshold |
| tidb_auto_analyze_start_time | 00:00 +0000 | Start time of the daily auto‑analyze window |
| tidb_auto_analyze_end_time | 23:59 +0000 | End time of the daily auto‑analyze window |
When a table’s modify_count exceeds tidb_auto_analyze_ratio of its total rows and the current time falls within the configured window, TiDB runs ANALYZE TABLE tbl automatically.
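The trigger condition above can be sketched in a few lines (names and the exact comparison are illustrative; the real logic lives in TiDB's statistics worker):

```python
from datetime import time

def should_auto_analyze(modify_count, row_count, now,
                        ratio=0.5,
                        start=time(0, 0), end=time(23, 59)):
    """Return True when modified rows exceed `ratio` of the table's total
    rows and the current time falls inside the analyze window."""
    if row_count == 0:
        return False
    in_window = start <= now <= end
    return in_window and (modify_count / row_count > ratio)

# A table with 1M rows and 600k modifications at noon crosses the 0.5 threshold.
print(should_auto_analyze(600_000, 1_000_000, time(12, 0)))  # True
```

When the window or threshold is too conservative for a hot table, a manual `ANALYZE TABLE tbl` is still available.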
3.3 Write‑write and read‑write conflicts increasing latency
TiDB uses an optimistic lock model with a two‑phase commit (Prewrite → Commit). High concurrency on the same order record leads to many txnLock (write‑write) and txnLockFast (read‑write) conflicts, raising response time.
Mitigation strategies:
Pre‑check lock conflicts in TiDB – detect conflicts early, before requests are sent to TiKV.
Try pessimistic transaction locking (in our tests it performed no better).
Serialize operations on the same order record, either with a Redis distributed lock (which adds complexity) or by writing asynchronously through a message queue, routing each key to a single Kafka partition so the same record is never written concurrently.
After several analysis‑optimization cycles, conflict counts were reduced and overall average latency decreased, improving cluster stability.
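The message‑queue mitigation can be sketched as follows (the partition count and key format are illustrative; a real producer would hand the key to the Kafka client, which applies its own partitioner):

```python
import zlib

NUM_PARTITIONS = 16  # illustrative partition count for the orders topic

def partition_for(order_id: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Stable hash of the order key -> partition index.

    zlib.crc32 is deterministic across processes, unlike Python's salted
    built-in hash(), so the same order always maps to the same partition
    and its writes are consumed serially by that partition's consumer.
    """
    return zlib.crc32(order_id.encode("utf-8")) % num_partitions

# Every write for the same order lands on one partition, so no two
# concurrent transactions touch the same row.
assert partition_for("order-10086") == partition_for("order-10086")
```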
4. Ecosystem
4.1 DbKiller
Self‑developed tool that kills long‑running queries according to configurable policies; useful for handling abnormal queries and preventing cascading ("snowball") database failures.
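The policy core of such a killer can be sketched in a few lines (the threshold and the processlist shape are made up for illustration; DbKiller itself and its policies are internal):

```python
def queries_to_kill(processlist, max_seconds=60):
    """processlist: iterable of (connection_id, elapsed_seconds, sql) rows,
    e.g. sampled from information_schema.processlist. Any query running
    longer than the time budget is flagged for termination."""
    return [conn_id for conn_id, elapsed, _sql in processlist
            if elapsed > max_seconds]

procs = [(1, 5, "SELECT ..."), (2, 300, "SELECT ..."), (3, 90, "SELECT ...")]
print(queries_to_kill(procs))  # [2, 3]
```

Each flagged ID would then be killed on the TiDB server that owns the connection (e.g. with `KILL TIDB <id>`).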
4.2 DbCleaner
Self‑developed tool for window‑based archiving of table data; it is MySQL‑compatible and checks replica lag before cleaning.
4.3 Data Migration (DM)
Integrated data synchronization platform supporting full and incremental migration from MySQL or MariaDB to TiDB, simplifying error handling and reducing operational cost.
4.4 TiDB Lightning
Tool for fast bulk import of new data or full‑backup restoration, supporting multiple migration and upgrade scenarios.
5. Optimization Practices
5.1 Hotspot Issues
TiDB splits data by Region (default 96 MB). AUTO_INCREMENT primary keys cause writes to concentrate on a single Region, creating hotspots. Solutions include using the implicit _tidb_rowid with SHARD_ROW_ID_BITS and PRE_SPLIT_REGIONS to randomize row IDs and pre‑split regions.
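The effect of SHARD_ROW_ID_BITS can be illustrated with a small, simplified sketch (the bit layout here is illustrative; TiDB's actual rowid encoding differs in details):

```python
# Simplified illustration of SHARD_ROW_ID_BITS: a shard value placed in the
# high bits of the 64-bit _tidb_rowid scatters consecutive inserts across
# key ranges (Regions), instead of appending them all to one hot Region.
SHARD_BITS = 4  # like SHARD_ROW_ID_BITS = 4 -> up to 16 shards

def sharded_rowid(seq: int, shard: int, shard_bits: int = SHARD_BITS) -> int:
    """Place the shard value in the top bits below the sign bit."""
    return (shard << (63 - shard_bits)) | seq

# Consecutive sequence numbers in different shards are far apart in key
# space, so they land in different Regions.
a = sharded_rowid(1, shard=3)
b = sharded_rowid(2, shard=9)
assert abs(a - b) > 1_000_000
```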
5.2 Archiving Historical Data
Introduce partitioned tables with order‑time keys; drop partitions to delete old data efficiently, avoiding performance impact of massive DELETE operations.
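The archiving pattern can be sketched with range partitioning on the order time (table, column, and partition names are illustrative, not the real schema):

```sql
-- Hypothetical order table partitioned by month on the order time.
CREATE TABLE orders_archive (
    id BIGINT NOT NULL,
    order_time DATETIME NOT NULL,
    payload VARCHAR(255)
)
PARTITION BY RANGE (TO_DAYS(order_time)) (
    PARTITION p202001 VALUES LESS THAN (TO_DAYS('2020-02-01')),
    PARTITION p202002 VALUES LESS THAN (TO_DAYS('2020-03-01'))
);

-- Dropping a month of history is a metadata operation,
-- far cheaper than a massive DELETE.
ALTER TABLE orders_archive DROP PARTITION p202001;
```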
5.3 Data Backup
Backup & Restore (BR) provides efficient full and incremental backups for a ~14 TB cluster, reducing backup time from days to a manageable window.
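A minimal BR invocation might look like the following sketch; the PD address, storage path, and rate limit are placeholders:

```shell
# Hypothetical full backup of the cluster to shared storage,
# throttled per TiKV so online traffic is not starved.
br backup full \
    --pd "127.0.0.1:2379" \
    --storage "local:///data/backup/full" \
    --ratelimit 120
```

Restores use the symmetric `br restore full` with the same storage path.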
5.4 Cluster State Diagnosis
TiDB Dashboard (v4.0) offers comprehensive metrics and diagnostic reports, enabling quick identification of abnormal states and performance bottlenecks.
6. Conclusion
TiDB, as a next‑generation high‑performance distributed database, is a strong choice for massive data storage scenarios. Ongoing community engagement and internal operational improvements will continue to guide its adoption in suitable use cases.
Beijing SF i-TECH City Technology Team
Official tech channel of Beijing SF i-TECH City. A publishing platform for technology innovation, practical implementation, and frontier tech exploration.