
Design and Implementation of the Thor System for Containerized Management of TiDB

This article describes the challenges of scaling MySQL workloads, introduces TiDB’s distributed architecture, and details the Thor system’s container‑orchestrated design—including scheduling, cluster and database management, data synchronization with Hamal, and integrated monitoring and alerting—to achieve efficient, automated operation of large‑scale TiDB clusters.

Tongcheng Travel Technology Center

As internet services grow, data volumes can surge from hundreds of gigabytes to hundreds of terabytes, and a single-node MySQL instance cannot scale to that size cost-effectively. Sharding adds operational complexity, and multi-dimensional queries over sharded data often require extra storage or a performance trade-off.

TiDB is an open-source distributed database from PingCAP that offers strong consistency and elastic scaling. It has three core components: TiDB Server (SQL parsing, optimization, and client interaction), TiKV Server (distributed key-value storage with horizontal scaling and high availability), and PD Server (metadata management and scheduling).

Deploying and operating such a distributed system by hand is costly and error-prone. Existing automation tools lack state management and struggle with TiDB's local storage requirements, which prompted the development of a custom container orchestration solution.

The Thor system adopts a modular architecture with a control module (Allocator, Label, Discover, Manage, Customize) and an agent module, providing resource abstraction, improved utilization, operational efficiency, and fine‑grained management.

Container scheduling covers TiDB's need for local storage, balances CPU load by pinning containers to cores with cpuset-cpus, and supports label-based placement, host resource limits, automatic discovery, full lifecycle management, and fault remediation.
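The placement step described above can be illustrated with a minimal sketch. It is not Thor's actual Allocator: the `Host` fields, the `place` function, and the "most free cores first" tie-break are all assumptions made for illustration. The only detail taken from the article is that CPU pinning is expressed as a cpuset-cpus value, which is rendered here as a comma-separated core list.

```python
from dataclasses import dataclass


@dataclass
class Host:
    # Hypothetical host record; field names are illustrative, not Thor's.
    name: str
    labels: set          # e.g. {"ssd", "tikv"}
    free_cpus: list      # core IDs not yet pinned to any container
    free_disk_gb: int    # unallocated local disk


def place(hosts, required_labels, cpus, disk_gb):
    """Return (host name, cpuset string), or None if no host fits."""
    candidates = [
        h for h in hosts
        if required_labels <= h.labels       # label-based placement
        and len(h.free_cpus) >= cpus         # host CPU limit
        and h.free_disk_gb >= disk_gb        # local storage requirement
    ]
    if not candidates:
        return None
    # Assumed policy: prefer the host with the most free cores to spread load.
    best = max(candidates, key=lambda h: len(h.free_cpus))
    picked = sorted(best.free_cpus)[:cpus]
    best.free_cpus = [c for c in best.free_cpus if c not in picked]
    best.free_disk_gb -= disk_gb
    return best.name, ",".join(str(c) for c in picked)


hosts = [
    Host("host-a", {"ssd", "tikv"}, [0, 1, 2, 3], 500),
    Host("host-b", {"ssd", "tikv"}, [0, 1, 2, 3, 4, 5, 6, 7], 1000),
    Host("host-c", {"hdd"}, list(range(16)), 2000),
]
result = place(hosts, {"ssd", "tikv"}, cpus=4, disk_gb=400)
print(result)
```

Reserving the chosen cores and disk on the host record before returning means a second call cannot hand out the same cores, which is the property that makes core pinning useful for balancing.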

Cluster management automates TiDB lifecycle tasks such as initialization, rolling upgrades, elastic capacity adjustments, monitoring integration, and node maintenance, reducing deployment time from hours to minutes.
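A rolling upgrade, one of the lifecycle tasks listed above, can be sketched as a loop that upgrades one node at a time and halts as soon as a node fails its health check. The function name and both callbacks are hypothetical stand-ins; the article does not describe how Thor performs or verifies each step, nor the order in which it visits components.

```python
def rolling_upgrade(nodes, upgrade_node, is_healthy):
    """Upgrade nodes one at a time; stop at the first unhealthy node.

    Returns (upgraded, failed), where failed is None on full success.
    """
    upgraded = []
    for node in nodes:
        upgrade_node(node)
        if not is_healthy(node):
            # Halt so the remaining nodes keep serving on the old version.
            return upgraded, node
        upgraded.append(node)
    return upgraded, None


# Stand-in callbacks for demonstration only.
versions = {"tikv-1": "old", "tikv-2": "old", "tikv-3": "old"}


def fake_upgrade(node):
    versions[node] = "new"


def fake_health(node):
    return node != "tikv-2"   # simulate tikv-2 failing after its upgrade


done, failed = rolling_upgrade(list(versions), fake_upgrade, fake_health)
print(done, failed, versions["tikv-3"])
```

Stopping at the first failure limits the impact of a bad release to a single node, which is the main reason to upgrade in a rolling fashion rather than all at once.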

Database management adds automatic statistics updates, overload protection, slow‑query analysis, and SQL alerting, leveraging ELK, mysql‑sniffer, Flume, Kafka, Spark, and Hadoop for comprehensive performance insight.

Data synchronization is handled by the Hamal tool, a MySQL‑compatible binlog consumer that supports GTID, automatic master‑slave failover, multi‑target TiDB clusters, filtering, rewriting, and parallel transaction merging, enabling real‑time MySQL‑to‑TiDB replication.
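To make the filtering and rewriting features concrete, here is a minimal sketch of how a replication event might be dropped or redirected before it is applied downstream. It does not reflect Hamal's real configuration or internals, which the article does not describe: the event dictionaries, the `route` function, the glob-style include patterns, and the schema names are all assumptions.

```python
from fnmatch import fnmatchcase


def route(event, include_patterns, rewrite_schemas):
    """Return the event as it should be applied downstream, or None to drop it."""
    full_name = f"{event['schema']}.{event['table']}"
    # Filtering: keep only tables that match an include pattern.
    if not any(fnmatchcase(full_name, p) for p in include_patterns):
        return None
    # Rewriting: map the source schema name to the target schema name.
    out = dict(event)
    out["schema"] = rewrite_schemas.get(event["schema"], event["schema"])
    return out


# Hypothetical change events from two sharded source schemas and one unrelated schema.
events = [
    {"schema": "orders_01", "table": "order_item", "op": "insert"},
    {"schema": "orders_02", "table": "order_item", "op": "update"},
    {"schema": "audit", "table": "log", "op": "insert"},
]
include = ["orders_*.order_item"]
rewrite = {"orders_01": "orders", "orders_02": "orders"}

routed = [r for e in events if (r := route(e, include, rewrite)) is not None]
print(routed)
```

In this hypothetical setup, events from both source schemas end up targeting a single `orders` schema, while the `audit.log` event is filtered out.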

Monitoring and alerting integrate metrics from Zabbix, TiDB APIs, and custom collectors. The unified data is stored in TiDB for analysis and visualized in Grafana, while a bespoke alert engine delivers timely notifications through channels such as WeChat.

In practice, deploying TiDB with Thor has brought hundreds of machines and dozens of clusters under unified management, cut deployment time to two minutes, and improved DBA efficiency and service availability. Future work will focus on audit and SQL analysis enhancements.
