Using TiDB Data Migration (DM) for MySQL‑to‑TiDB Sync: Architecture, Features, Tuning and Troubleshooting
This article shares practical experience with TiDB Data Migration (DM), covering its background, architecture, key features, online DDL support, common error handling such as duplicate‑key issues, large‑scale import tuning, configuration limits, and cleanup recommendations for reliable MySQL‑to‑TiDB synchronization.
In the early days of synchronizing MySQL to TiDB, we relied on mydumper+loader for full backups and syncer for incremental binlog replication, which required many configuration files and complex setup.
PingCAP later released the TiDB Data Migration (DM) suite, a unified platform that simplifies full‑load and incremental data migration from MySQL or MariaDB to TiDB, reduces operational overhead, and provides a graphical dm‑portal for task creation (now discontinued).
I have used DM from its internal test builds through the latest release, 1.0.6, and consider it essential for DBAs: most TiDB deployments involve migrating existing MySQL schemas, performing performance comparisons, and then loading data.
Architecture
DM consists of three core components: DM-master (task management and scheduling), DM-worker (task execution), and dmctl (command-line control).
Key Features
Table routing and merge migration
Whitelist/blacklist for tables
Binlog event filtering
Shard support for merging tables
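To make table routing and shard merging concrete, here is a minimal Python sketch of the idea: wildcard patterns on the upstream schema and table names map many sharded tables onto one downstream table. The `RouteRule` type, the `route` function and the `shop_*`/`orders_*` names are hypothetical and only illustrate the matching logic; this is not DM's implementation.

```python
from fnmatch import fnmatchcase
from typing import Iterable, NamedTuple, Tuple


class RouteRule(NamedTuple):
    # Hypothetical rule shape for illustration; not DM's internal type.
    schema_pattern: str
    table_pattern: str
    target_schema: str
    target_table: str


def route(rules: Iterable[RouteRule], schema: str, table: str) -> Tuple[str, str]:
    """Return the downstream (schema, table) for an upstream table.

    The first matching rule wins; tables that match no rule keep their names.
    """
    for rule in rules:
        if fnmatchcase(schema, rule.schema_pattern) and fnmatchcase(table, rule.table_pattern):
            return rule.target_schema, rule.target_table
    return schema, table


# Two upstream shards are merged into a single downstream table.
rules = [RouteRule("shop_*", "orders_*", "shop", "orders")]
shards = [("shop_01", "orders_0001"), ("shop_02", "orders_0002")]
targets = {route(rules, schema, table) for schema, table in shards}
```

Here `targets` collapses to a single downstream table, which is the effect of a merge migration; a table that matches no rule passes through unchanged.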
New Feature in 1.0.5 – Online DDL Support
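DM now supports online schema changes made with tools such as pt-online-schema-change and gh-ost. These tools apply the DDL to a temporary copy of the table and then rename it into place. Previously, DM skipped statements on those temporary tables because they were not in the whitelist, so the downstream TiDB table never received the new columns and raised errors.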
Example of a failed DDL without online‑DDL support:
skip event because not in whitelist
RENAME TABLE `h_2`.`helei5` TO `h_2`.`_helei5_old`
After enabling online DDL (parameter online-ddl-scheme: "pt"), the new column is correctly replicated downstream.
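The skipped `_helei5_old` table follows the naming convention of pt-online-schema-change. As a rough illustration of how such temporary tables can be recognized and mapped back to the real table, here is a small Python sketch; the `classify` helper is hypothetical and is not how DM implements the feature, although the suffixes are the ones the two tools use by default.

```python
import re
from typing import Optional, Tuple

# Default naming conventions of the tools themselves:
#   gh-ost:                   _<table>_gho (ghost), _<table>_ghc (changelog), _<table>_del (old copy)
#   pt-online-schema-change:  _<table>_new (new copy), _<table>_old (old copy)
_SUFFIXES = {
    "gh-ost": ("gho", "ghc", "del"),
    "pt": ("new", "old"),
}


def classify(table: str, scheme: str) -> Optional[Tuple[str, str]]:
    """Return (real_table, suffix) if `table` looks like a temporary table
    created by the given online DDL scheme, otherwise None."""
    pattern = r"_(.+)_(%s)" % "|".join(_SUFFIXES[scheme])
    match = re.fullmatch(pattern, table)
    return (match.group(1), match.group(2)) if match else None


old_copy = classify("_helei5_old", "pt")   # temporary table from the log above
plain = classify("helei5", "pt")           # an ordinary table
```

With a mapping like this, a DDL applied to the temporary table can be attributed to the real table instead of being dropped by the whitelist.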
Sample command to skip a problematic binlog position:
sql-skip --worker=192.168.1.248:8262 --binlog-pos=4369-binlog|000001.000021:62765733 task_4369
Task status query before and after skipping:
{
  "taskName": "task_4369",
  "taskStatus": "Running",
  "workers": ["192.168.1.248:8262"]
}
When a duplicate-key error (Error 1062) occurs, the log shows:
{
  "msg": "[code=10006:class=database:scope=not-set:level=high] execute statement failed: commit: Error 1062: Duplicate entry ... for key 'clientid'",
  "taskStatus": "Error - Some error occurred in subtask. Please run `query-status task_4369` to get more details."
}
The resolution was to add the missing column on the downstream table and to use REPLACE INTO to avoid duplicate-key conflicts.
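The difference between a plain INSERT and REPLACE INTO on a unique key can be seen in a few lines of Python. This sketch uses an in-memory SQLite database as a stand-in for TiDB, which is an assumption made only so the example runs without a cluster; the `client` table and its values are made up, and SQLite reports the conflict as an `IntegrityError` rather than Error 1062.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE client (id INTEGER PRIMARY KEY, clientid TEXT UNIQUE, note TEXT)"
)
conn.execute("INSERT INTO client (id, clientid, note) VALUES (1, 'c-100', 'first')")

# A plain INSERT that reuses the unique key fails, like Error 1062 in MySQL/TiDB.
try:
    conn.execute("INSERT INTO client (id, clientid, note) VALUES (2, 'c-100', 'second')")
    insert_failed = False
except sqlite3.IntegrityError:
    insert_failed = True

# REPLACE INTO deletes the conflicting row and inserts the new one instead.
conn.execute("REPLACE INTO client (id, clientid, note) VALUES (2, 'c-100', 'second')")
rows = conn.execute("SELECT id, clientid, note FROM client").fetchall()
```

After the REPLACE INTO, the table holds only the new row for `c-100`, so re-applying the same change does not stop replication. The trade-off is that the old row is silently overwritten.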
Large‑Batch Import Tuning
During massive imports, cluster latency spikes. Adjusting the following parameters helped mitigate the issue (values are examples; tune per cluster):
raftstore:
  apply-pool-size: 3-4
  store-pool-size: 3-4
storage:
  scheduler-worker-pool-size: 4-6
server:
  grpc-concurrency: 4-6
rocksdb:
  max-background-jobs: 8-10
  max-sub-compactions: 1-2
Additionally, configure relay-log cleanup on each DM-worker:
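[purge]
interval = 3600
expires = 7
remain-space = 15
Here interval is the check period in seconds and remain-space is the free-disk threshold in GB. Relay logs never expire by default (expires = 0), so set expires to a retention period. The DM documentation gives this value in hours, so confirm the unit for your version before assuming that 7 means seven days.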
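How the expiry and free-space settings interact can be sketched in Python. This is a simplified model under stated assumptions, not DM's purge code: the `RelayFile` type and `files_to_purge` function are hypothetical, the expiry is passed in seconds to avoid assuming a unit, and the newest file is always kept on the assumption that it is still being written.

```python
from typing import List, NamedTuple


class RelayFile(NamedTuple):
    name: str
    mtime: float    # last-modified time, seconds since the epoch
    size_gb: float


def files_to_purge(files: List[RelayFile], now: float, expires_seconds: float,
                   free_gb: float, remain_space_gb: float) -> List[str]:
    """Pick relay files to delete, oldest first.

    A file is purged if it is older than the expiry (when expiry > 0), or
    while free disk space is still below the remain-space threshold.
    """
    ordered = sorted(files, key=lambda f: f.mtime)
    purge = []
    for f in ordered[:-1]:  # never touch the newest file
        expired = expires_seconds > 0 and now - f.mtime > expires_seconds
        low_space = free_gb < remain_space_gb
        if expired or low_space:
            purge.append(f.name)
            free_gb += f.size_gb  # deleting the file frees its space
    return purge


files = [RelayFile("a", 0, 5), RelayFile("b", 100, 5), RelayFile("c", 200, 5)]
by_space = files_to_purge(files, now=250, expires_seconds=0, free_gb=8, remain_space_gb=15)
by_age = files_to_purge(files, now=250, expires_seconds=200, free_gb=100, remain_space_gb=15)
```

With expiry disabled, only disk pressure triggers deletion, which is why a worker left on the defaults can accumulate relay logs until free space drops below the threshold.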
Limitations
Supported MySQL versions: 5.5 < version < 8.0; MariaDB ≥ 10.1.2
Only DDL syntax supported by the TiDB parser can be replicated
Upstream binlog must be enabled with binlog_format=ROW
DM does not support dropping multiple partitions in one statement or dropping indexed columns directly
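The version limits above are easy to check before creating a task. The following Python sketch is a hypothetical pre-flight helper, not part of DM; it assumes that "5.5 < version < 8.0" is read at the major.minor level (so 5.6 and 5.7 pass, 5.5.x does not), and that the version string has the form returned by `SELECT VERSION()`.

```python
import re


def upstream_version_supported(version: str, flavor: str = "mysql") -> bool:
    """Check a server version string such as '5.7.30-log' or '10.1.2-MariaDB'
    against the limits listed above."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", version)
    if match is None:
        raise ValueError("unrecognized version string: %r" % version)
    major, minor, patch = (int(part or 0) for part in match.groups())
    if flavor == "mariadb":
        return (major, minor, patch) >= (10, 1, 2)
    # Assumption: the MySQL bound is compared on major.minor only.
    return (5, 5) < (major, minor) < (8, 0)


ok_57 = upstream_version_supported("5.7.30-log")
ok_80 = upstream_version_supported("8.0.21")
ok_maria = upstream_version_supported("10.1.2-MariaDB", flavor="mariadb")
```

A similar check for `binlog_format=ROW` would need a live connection to the upstream server, so it is left out here.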
DM‑portal Caveats
The portal auto‑generates task files but lacks full‑database regex matching, causing temporary tables from online DDL tools to be ignored; this was fixed in version 1.0.5.
Conclusion
From first exposure to TiDB in 2019 to becoming a core member, presenting at DEVCON 2020, and receiving the TUG most influential content award, the author emphasizes continuous sharing of technical knowledge. The article underscores the importance of proper DM configuration, online DDL support, and cleanup to maintain stable, high‑performance TiDB clusters.
360 Smart Cloud
Official service account of 360 Smart Cloud, dedicated to building a high-quality, secure, highly available, convenient, and stable one‑stop cloud service platform.