Databases 16 min read

Optimizing Write Hotspots in TiDB: Architecture Overview, Problem Analysis, and Solution

This article details a real‑world case of write‑hotspot performance degradation in a TiDB 3.0.9 cluster, explains TiDB’s architecture, analyzes monitoring data and hot‑region tables, and presents a redesign using varchar primary keys and SHARD_ROW_ID_BITS to disperse writes and dramatically improve latency.

HomeTech

Apr 29, 2020

Optimizing Write Hotspots in TiDB: Architecture Overview, Problem Analysis, and Solution

1. Background

The content‑master data project stores its data in a distributed relational database TiDB (version 3.0.9 at the time of writing). This article uses a business‑level optimization case that solved a write‑hotspot problem to briefly introduce TiDB architecture, the cluster’s basic situation, the encountered issues and their solutions, and finally the importance of database development standards.

Content‑master data is a core system of the unified middle‑platform, providing a consistent data source for all automotive content across platforms. Future plans include ingesting data from all content systems, defining unified standards, and offering a centralized, consistent data outlet for front‑end services.

2. TiDB Introduction

TiDB is an open‑source distributed HTAP (Hybrid Transactional and Analytical Processing) database that combines the best features of traditional RDBMS and NoSQL. It is MySQL‑compatible, supports unlimited horizontal scaling, offers strong consistency and high availability, and aims to provide a one‑stop solution for both OLTP and OLAP workloads.

TiDB Server

TiDB Server receives SQL requests, processes SQL logic, locates the required TiKV addresses via PD, interacts with TiKV to fetch data, and returns results. It is stateless, does not store data, and can be horizontally scaled behind load balancers such as LVS, HAProxy, or F5.

PD Server

Placement Driver (PD) manages the cluster metadata, performs scheduling and load balancing for TiKV, and allocates globally unique, incrementing transaction IDs. PD uses the Raft protocol for safety and recommends deploying an odd number of PD nodes.

TiKV Server

TiKV Server provides a distributed transactional key‑value storage engine. Data is divided into Regions (key ranges). Each Region is replicated via Raft groups across multiple TiKV nodes, and PD schedules Regions to balance load.

3. Cluster Information

3.1 Basic Information

The TiDB cluster consists of the following components (all version 3.0.9):

Component

Version

Count

tidb

v3.0.9

tikv

v3.0.9

12 (multiple instances per machine)

pump

v3.0.9

drainer

v3.0.9

Pump records TiDB binlogs in real time, orders them by transaction commit time, and provides them to Drainer. Drainer aggregates binlogs from all Pumps and synchronizes them downstream to Kafka.

3.2 Cluster Topology

Business data is written to TiDB via two CDC pipelines: (1) SQL Server CDC, (2) MySQL Otter. Multiple TiDB nodes behind LVS serve business traffic. Two Drainers stream TiDB data to a downstream Kafka cluster for incremental consumption, while a third Drainer writes data to disk for offline analysis (e.g., using the reparo tool to parse binlogs).

4. Problem Description

The write path is: Drainer → Kafka → consumer → TiDB summary database.

Business teams reported that writes to the TiDB summary database were slow and Kafka messages were piling up. Monitoring showed SQL‑99 response times around 200‑220 ms, while insert QPS was only about 4,500 per second, indicating a clear performance bottleneck.

5. Problem Analysis

5.1 Slow‑Log Inspection

Using pt-query-digest on the slow‑log, the top slow SQL statements were simple INSERTs with an average execution time of 8 ms, suggesting that the latency was not caused by complex queries.

Insert IGNORE Into table_name (ciz_id,data_type,field,`value`) values(37163712,'club','club_is_jinghua','0') \G

5.2 Monitoring Indicators

Hot‑Write panel revealed three anomalies on a single TiKV node:

gRPC poll CPU usage significantly higher than other nodes.

QPS on that TiKV much higher than the rest.

Overall CPU usage on that node markedly higher.

These symptoms indicate a write hotspot concentrating load on one TiKV instance.

5.3 Hot‑Region Table Identification

TiDB 3.0 provides the system view information_schema.TiDB_HOT_REGIONS to locate hot tables or indexes:

select * from information_schema.TiDB_HOT_REGIONS where TYPE='write';

The query revealed that four summary tables of the content data were the hot spots. Their schema (simplified) used an auto‑increment id as the primary key:

CREATE TABLE `table_name` ( 
  `id` bigint(20) unsigned NOT NULL AUTO_INCREMENT COMMENT 'primary key',
  `data_type` varchar(100) NOT NULL COMMENT 'data type',
  `ciz_id` bigint(11) NOT NULL COMMENT 'data identifier',
  `content` mediumtext DEFAULT NULL COMMENT 'content',
  `update_time` datetime NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP COMMENT 'ingest time',
  PRIMARY KEY (`id`),
  UNIQUE KEY `uniq_type_cizid` (`data_type`,`ciz_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COMMENT='XXX table';

Because TiDB splits data by primary‑key values into Regions, an auto‑increment key causes most recent rows to land in the same Region, creating a write hotspot on a single TiKV node.

6. Solution

To eliminate the hotspot, the table design was changed:

Remove the auto‑increment integer primary key.

Use a varchar primary key.

Configure SHARD_ROW_ID_BITS to scatter rows across Regions.

Revised DDL example:

CREATE TABLE `table_name` ( 
  `global_id` varchar(20) NOT NULL COMMENT 'primary key',
  `data_type` varchar(100) NOT NULL COMMENT 'data type',
  `ciz_id` bigint(11) NOT NULL COMMENT 'data identifier',
  `content` mediumtext DEFAULT NULL COMMENT 'content',
  `update_time` datetime NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP COMMENT 'ingest time',
  PRIMARY KEY (`global_id`),
  UNIQUE KEY `uniq_type_cizid` (`data_type`,`ciz_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 SHARD_ROW_ID_BITS=4 COMMENT='XXX table';

After applying the new schema and re‑running performance tests, the write latency dropped dramatically: SQL‑99 response time fell from ~200 ms to around 20 ms.

The improvement was achieved thanks to close collaboration between developers and testers.

7. Summary

Although TiDB automatically rebalances data, business characteristics or sudden load spikes can still create hotspots that become performance bottlenecks. Designing tables for a distributed database differs from traditional single‑node databases; developers must assume that a table’s rows will be spread across many machines.

Based on the experience, the following database development guidelines were established for TiDB 3.0:

Primary keys must be varchar and configured with SHARD_ROW_ID_BITS to avoid write hotspots.

Do not use auto‑increment integer keys. SHARD_ROW_ID_BITS=4 creates 16 (2^4) random ranges for the internal row ID.

From TiDB 4.0 onward, the AUTO_RANDOM syntax can replace AUTO_INCREMENT for integer primary keys, automatically scattering IDs and eliminating hotspots.

CREATE TABLE `tb_example` ( 
  `user_id` varchar(20) NOT NULL COMMENT 'user id',
  `name` varchar(10) NOT NULL DEFAULT '' COMMENT 'name',
  `created_stime` datetime NOT NULL DEFAULT CURRENT_TIMESTAMP COMMENT 'creation time',
  `modified_stime` datetime NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP COMMENT 'modification time',
  PRIMARY KEY (`user_id`),
  KEY `idx_modified_stime` (`modified_stime`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 SHARD_ROW_ID_BITS=4 COMMENT='TiDB standard table';

8. Conclusion

This case study of write‑hotspot optimization leads to a set of database development standards that apply not only to TiDB but also to other databases such as SQL Server, MySQL, and MongoDB. Following these standards helps improve database performance, efficiency, and ultimately better serves business needs.

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.

distributed database TiDB Database Optimization Write Hotspot

Written by

HomeTech

HomeTech tech sharing

0 followers

Reader feedback

How this landed with the community

Rate this article

Was this worth your time?

Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.