
TiDB Operational Practices at Ctrip: Architecture, Use Cases, Performance Tuning, Monitoring, and Tooling

This article details Ctrip's migration from MySQL to TiDB, describing the multi‑data‑center architecture, real‑world use cases such as the international CDP platform and hotel settlement, performance tuning measures, comprehensive monitoring and alerting, auxiliary tools, and future roadmap for the distributed NewSQL database.

Ctrip Technology

Introduction: Around 2014 Ctrip began using MySQL at scale and ran into bottlenecks such as oversized tables and storage limits, prompting an evaluation of distributed databases and the eventual selection of TiDB, an open‑source NewSQL database.

TiDB Overview: TiDB supports HTAP workloads, is largely MySQL‑compatible, and provides horizontal scalability, strong consistency, and high availability. Ctrip's adoption began with a proof of concept (POC) in November 2018, followed by the first production deployment in January 2019.

Architecture: A three‑replica deployment spans three data centers, ensuring IDC‑level high availability and disaster recovery with automatic health‑checking and failover (RPO = 0, RTO < 30 s). The PD component supplies a global timestamp service, and a sample PD/TiKV label configuration is illustrated.
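The label scheme described above can be sketched as a TiUP topology fragment. The IDC names and IP addresses below are illustrative assumptions, not Ctrip's actual configuration; only the labeling mechanism matches the text.

```yaml
# Illustrative TiUP topology fragment: one TiKV replica per data center.
# PD uses the "idc" label to place the three replicas in three IDCs.
server_configs:
  pd:
    replication.max-replicas: 3           # three replicas, as in the article
    replication.location-labels: ["idc"]  # spread replicas across "idc" values
tikv_servers:
  - host: 10.0.1.1
    config:
      server.labels: { idc: "idc-a" }
  - host: 10.0.2.1
    config:
      server.labels: { idc: "idc-b" }
  - host: 10.0.3.1
    config:
      server.labels: { idc: "idc-c" }
```

With this placement, losing an entire IDC leaves two of the three replicas intact, so Raft can keep serving writes, which is what makes the RPO = 0 guarantee possible.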

Use Cases: International CDP platform – TiDB stores persistent tags and serves both OLTP (UID, order queries) and OLAP (marketing analytics) workloads; Hotel settlement – a 6 TB database migrated to TiDB with TiFlash enabled for column‑store acceleration. TiFlash MPP mode dramatically reduces query latency (e.g., 20 s → 1 s).
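Enabling TiFlash for a table follows the standard TiDB DDL pattern shown below; the schema and table names are hypothetical placeholders, not the actual hotel‑settlement schema.

```sql
-- Create one column-store replica of the table (name is hypothetical).
ALTER TABLE settlement.bill SET TIFLASH REPLICA 1;

-- Check replication progress before routing analytics at the table.
SELECT TABLE_NAME, AVAILABLE, PROGRESS
FROM information_schema.tiflash_replica
WHERE TABLE_SCHEMA = 'settlement';

-- MPP execution is controlled per session (on by default in recent versions).
SET SESSION tidb_allow_mpp = ON;
```

Once the replica reports `AVAILABLE = 1`, the optimizer can push aggregations down to TiFlash's MPP engine, which is the mechanism behind the 20 s → 1 s improvement cited above.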

Performance Tuning: An observed write latency spike was traced to insufficient scheduler‑worker resources. Adjustments such as scheduler-worker-pool-size: 16 → 40 and scheduler-pending-write-threshold: 100MB → 1024MB restored normal latency.
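The adjustments above map to the TiKV configuration file as follows; the values come from the article, but appropriate settings depend on your CPU count and write workload.

```toml
# TiKV config fragment mirroring the scheduler adjustments described above.
[storage]
scheduler-worker-pool-size = 40               # raised from 16
scheduler-pending-write-threshold = "1024MB"  # raised from 100MB
```

More scheduler workers let TiKV process pending write commands in parallel, and the larger pending-write threshold delays flow control, smoothing out the latency spike under bursty writes.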

Issues and Practices: Distributed auto‑increment columns can cause duplicate‑key errors; TiDB allocates ID batches per server, leading to conflicts when explicit IDs are inserted. A known bug caused default‑value handling anomalies after column type changes; the issue was fixed in TiDB 4.0.9+. Community forums are emphasized for rapid problem resolution.
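The duplicate‑key scenario can be illustrated with a minimal sketch (table and values are hypothetical):

```sql
-- Each TiDB server caches its own batch of auto-increment IDs, so IDs are
-- unique but not contiguous across servers. Explicitly inserting an ID
-- that falls inside another server's cached batch later causes a
-- duplicate-key error.
CREATE TABLE t (id BIGINT AUTO_INCREMENT PRIMARY KEY, v INT);

INSERT INTO t (v) VALUES (1);         -- server A allocates from its batch
INSERT INTO t (id, v) VALUES (5, 2);  -- explicit ID may collide with server B's batch

-- Mitigations: avoid explicit IDs on auto-increment columns, or use
-- AUTO_RANDOM for write-scattered primary keys; AUTO_ID_CACHE 1 trades
-- throughput for MySQL-like allocation behavior.
```

The safest practice is simply never to insert explicit values into auto‑increment columns on TiDB.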

Monitoring & Alerts: TiDB metrics are collected via Prometheus and visualized with Grafana, then integrated into Ctrip's unified monitoring platform. Dashboards cover cluster health, three‑replica status, disk usage (alert at 80 % capacity), configuration compliance, and performance thresholds.
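The 80 % disk threshold can be expressed as a Prometheus alerting rule over TiKV's exported store‑size metrics; the rule below is a sketch, and group names, labels, and thresholds should be adapted to your own scrape configuration.

```yaml
# Illustrative Prometheus alerting rule for the 80% disk-usage threshold.
groups:
  - name: tidb-capacity
    rules:
      - alert: TiKVDiskUsageHigh
        expr: |
          (1 - tikv_store_size_bytes{type="available"}
             / tikv_store_size_bytes{type="capacity"}) > 0.80
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "TiKV store {{ $labels.instance }} is over 80% full"
```

Such rules can then be forwarded from Alertmanager into a unified monitoring platform, as the article describes.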

Tooling: Existing MySQL‑compatible tools (DDL publishing, online query/modification) are extended to TiDB. A custom deployment tool handles cluster lifecycle (provisioning, scaling, upgrades). A flashback utility leverages TiDB binlog to generate rollback SQL for accidental data changes.
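The flashback idea can be sketched with a before/after pair; the table, column, and values below are hypothetical, and only the inversion pattern reflects how such tools work.

```sql
-- Accidental change, captured in the TiDB binlog with the old row image:
UPDATE orders SET status = 'CANCELLED' WHERE order_id = 1001;

-- Rollback SQL the utility would generate, restoring the old value
-- recorded in the binlog row image:
UPDATE orders SET status = 'PAID' WHERE order_id = 1001;
```

Because the binlog records full row images, DELETEs invert to INSERTs and vice versa, letting the tool undo an accidental batch change without a full restore.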

Future Plans: Automating fault analysis with full‑link SQL tracing, evaluating HDD‑based deployments for low‑cost scenarios, and advancing the dual‑center DR Auto‑Sync solution for faster disaster recovery.

Tags: monitoring, operations, performance tuning, distributed database, TiDB, HTAP
Written by Ctrip Technology, the official Ctrip Technology account, sharing and discussing growth.