
TiDB Architecture and Performance Optimization for Yiche’s 818 Car Carnival Data Dashboard

This article presents a technical case study of Yiche’s 818 Car Carnival data dashboard: the business background and requirements, why and how TiDB was chosen and deployed, the issues encountered with TiDB, TiCDC, and query performance, and the solutions and performance results achieved.

Yiche Technology

1. Background

Yiche (established in 2000, listed on the NYSE in 2010, privatized in November 2020, and now part of the Tencent family) is a Chinese automotive internet platform that provides professional automotive news and marketing solutions. The "Super 818 Car Carnival Night" was a joint event launched by Zhejiang TV and the Yiche App, combining automotive speed, technology, fashion, and cross‑screen interactive experiences.

2. Business Scenario

The 818 carnival required a data dashboard, powered by TiDB, to display real‑time metrics such as topics, activities, traffic, leads, and user interactions. The most active components were "Shake‑to‑Win" (red packets, half‑price cars, Yiche coins) and live‑stream voting. The system needed massive data storage, high concurrency, low‑latency reads and writes, and had to serve as the source for real‑time computation with Flink.

3. Database Selection

Requirements: (1) massive data storage, (2) high‑concurrency, low‑latency reads and writes, (3) stable, easily scalable service. MySQL was used initially, but during stress tests the master‑slave replication lag kept growing, causing temporary unavailability and disk exhaustion. TiDB was first evaluated as a backup and later promoted to the primary solution after multiple rounds of stress testing demonstrated that TiDB v4.0.14 met the requirements of every scenario.

For the "Shake‑to‑Win" business, MySQL remained the primary database, while TiDB served as a disaster‑recovery solution synchronized via DM; when MySQL became unavailable, the service switched to TiDB.
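A DM task for this kind of MySQL‑to‑TiDB disaster‑recovery sync might look like the sketch below. All host names, user names, and database names are hypothetical, and the file follows the DM 2.x task-file shape; the actual task used during the event is not described in the article.

```yaml
# Hypothetical DM task file: replicate the Shake-to-Win schema from MySQL
# into TiDB so TiDB stays a warm standby. All names are illustrative.
name: "shake-to-win-dr"
task-mode: all                    # full export first, then incremental binlog sync
target-database:
  host: "tidb-lb.example.internal"
  port: 4000
  user: "dm_user"
  password: "******"
mysql-instances:
  - source-id: "mysql-shake-01"   # upstream MySQL source registered in DM
    block-allow-list: "shake-tables"
block-allow-list:
  shake-tables:
    do-dbs: ["shake_to_win"]      # only replicate the event database
```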

4. TiDB Architecture

The real‑time data sources include MySQL, TiDB, SQL Server, and traffic logs. Two TiDB clusters were deployed (Cluster 1 and Cluster 2) for active‑standby redundancy. Each cluster consists of three TiDB servers behind Consul load balancers, three PD servers, two TiKV nodes (single‑machine dual‑node deployment), and two TiFlash nodes. Four additional machines are reserved for scaling. The TiDB version used is v4.0.14.
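Assuming hypothetical host addresses, one cluster's layout could be declared in a tiup deploy topology roughly like this; the single‑machine dual‑TiKV setup is expressed by giving one host two instances with distinct ports and data directories. This is a sketch of the shape, not the event's actual topology file.

```yaml
# Hypothetical topology.yaml sketch for one of the two clusters
pd_servers:
  - host: 10.0.1.1
  - host: 10.0.1.2
  - host: 10.0.1.3
tidb_servers:
  - host: 10.0.2.1   # fronted by Consul load balancing
  - host: 10.0.2.2
  - host: 10.0.2.3
tikv_servers:
  - host: 10.0.3.1   # two TiKV instances on one machine,
    port: 20160      # separated by port and data_dir
    data_dir: /data1/tikv
  - host: 10.0.3.1
    port: 20161
    data_dir: /data2/tikv
tiflash_servers:
  - host: 10.0.4.1
  - host: 10.0.4.2
```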

5. Issues and Solutions

5.1 TiDB v4.0.12 bug

During early testing the cluster ran v4.0.12 and produced repeated warnings like [ERROR] [update.go:796] ["[stats] auto analyze failed"] ... [error="analyze worker panic"]. The issue was identified as bug #20874, triggered when new_collations_enabled_on_first_bootstrap: true was set. Upgrading to v4.0.14 resolved the problem.

5.2 Execution plan changes

A slow SQL query during the event initially used an IndexFullScan on winning_create_time. As data volume grew, the optimizer switched to a TableScan, causing high latency. Monitoring revealed read hotspots on TiKV coprocessor nodes, with key scans doubling after 21:00. The fix combined load‑based region splitting with a composite index.
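As a sketch of the index half of that fix, assuming hypothetical table and column names (the article only names winning_create_time), the composite index and a plan check might look like:

```sql
-- Hypothetical names: a composite index covering the time filter plus the
-- selected column, so the optimizer can stay on the index instead of
-- falling back to a TableScan as the table grows
ALTER TABLE winning_record ADD INDEX idx_ctime_uid (winning_create_time, user_id);

-- Confirm the new plan uses the index
EXPLAIN SELECT user_id
FROM winning_record
WHERE winning_create_time >= '2021-08-18 21:00:00';
```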

5.3 TiCDC problems

The Kafka sink address was missing some of the brokers, causing dial tcp ... i/o timeout errors; adding all three broker IPs fixed it.

The downstream consumer required the Canal‑JSON format; protocol=canal-json and enable-old-value=true were set in the configuration.

Messages exceeded Kafka's size limit; max-message-bytes=1048576 (1 MiB) was set so TiCDC messages stay within the broker's limit.

Data was being sent to a single partition; dispatching rows by a hash of rowid spread writes across partitions in parallel and resolved the issue.
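The dispatch fix can be pictured as hashing each row's handle (rowid) across the topic's partitions, so one hot table's changes no longer funnel into a single partition. A minimal Python sketch of the idea follows; it is illustrative, not TiCDC's actual implementation, and uses a plain modulo as a stand‑in hash.

```python
# Illustrative sketch of rowid-based dispatch: routing each row change by a
# hash of its handle spreads a single hot table across all Kafka partitions.
# Not TiCDC internals; modulo stands in for the real hash function.

def dispatch_partition(rowid: int, partition_num: int) -> int:
    """Pick a Kafka partition from the row's handle."""
    return rowid % partition_num

# With partition-num=3 (as in the changefeed below), consecutive rowids
# spread evenly across partitions instead of piling onto one:
buckets = {}
for rowid in range(101, 107):
    buckets.setdefault(dispatch_partition(rowid, 3), []).append(rowid)
# buckets now holds rows for all three partitions
```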

Key TiCDC configuration (test.conf) and sync command:

# test.conf (TiCDC configuration)
enable-old-value = true

[filter]
rules = ['bigdata_test.*']

[sink]
dispatchers = [
    {matcher = ['bigdata_test.*'], dispatcher = "rowid"},
]
# Create the changefeed
tiup ctl:v4.0.14 cdc changefeed create \
  --pd=http://10.20.20.20:2379 \
  --sink-uri="kafka://10.10.10.1:9092,10.10.10.2:9092,10.10.10.3:9092/bigdatatest?kafka-version=2.6.1&partition-num=3&max-message-bytes=1048576&replication-factor=1&protocol=canal-json" \
  --changefeed-id="bigdata_test" \
  --config=/home/tidb/cdc_conf/test.conf
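Once created, a changefeed's state and checkpoint progress can be inspected with the same CLI (PD address as above); this is a usage sketch rather than output from the event.

```shell
# List all changefeeds and inspect one changefeed's replication progress
tiup ctl:v4.0.14 cdc changefeed list --pd=http://10.20.20.20:2379
tiup ctl:v4.0.14 cdc changefeed query --pd=http://10.20.20.20:2379 --changefeed-id=bigdata_test
```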

6. Performance Results

Dashboard SQL: 99th‑percentile latency ~3 ms, 99.9th‑percentile <8 ms, at roughly 62,000 QPS.

Shake‑to‑Win disaster‑recovery: TiCDC showed negligible delay for multi‑table sync during the event.

7. Future Plans and Recommendations

Replace AUTO_INCREMENT primary keys with AUTO_RANDOM to disperse write hotspots.
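As a sketch (table and column names hypothetical), an AUTO_RANDOM key scatters inserts across Regions instead of appending every new row to one hot Region:

```sql
-- Hypothetical table: AUTO_RANDOM fills the high bits of the id with random
-- shard bits, so sequential inserts land on different Regions rather than
-- creating a single write hotspot at the tail
CREATE TABLE lottery_record (
    id BIGINT UNSIGNED NOT NULL AUTO_RANDOM,
    user_id BIGINT NOT NULL,
    created_at DATETIME NOT NULL,
    PRIMARY KEY (id)
);
```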

Prefer clustered primary indexes over non‑clustered ones for better performance.

Upgrade JDBC connectors and set useConfigs=maxPerformance (the MySQL Connector/J parameter) to avoid the overhead of repeatedly querying transaction status.
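A connection URL along these lines (host and database names hypothetical) enables the Connector/J performance bundle alongside prepared‑statement caching:

```
# Hypothetical JDBC URL; useConfigs=maxPerformance applies Connector/J's bundled
# performance properties, and server-side prepared-statement caching cuts round trips
jdbc:mysql://tidb-lb.example.internal:4000/bigdata_test?useConfigs=maxPerformance&useServerPrepStmts=true&cachePrepStmts=true
```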

Suggest further TiCDC improvements to reduce single‑table downstream replication latency.

8. Acknowledgements

Thanks to the PingCAP engineers (Zhang Zhenjiao, Wang Xiaoyang, Su Dan, Dong Mei, and others) for on‑site support, troubleshooting, and valuable optimization advice during the 818 event.

Author Bio

Peng Zhaojing joined Yiche in March 2021 and is responsible for TiDB and MySQL operations and tuning.

Tags: TiDB, TiCDC, Flink, Database Architecture, performance optimization, real-time analytics, high concurrency
Written by Yiche Technology, the official account of Yiche's technology team, regularly sharing the team's technical practices and insights.