Unlocking TiCDC: Efficient Incremental Data Sync for TiDB in Real‑World Scenarios
This article explains how TiCDC, a change‑data‑capture tool for TiDB, addresses incremental extraction, cross‑region hot‑standby, and stream processing needs, outlines its architecture, discusses early‑version issues, and provides best‑practice recommendations for stable, high‑performance data synchronization.
TiCDC is an incremental data synchronization tool for TiDB that pulls TiKV change logs, can restore data to any upstream TSO state, and offers an open data protocol for other systems to subscribe to data changes.
TiCDC Business Use Cases
1) Incremental Data Extraction
Data warehouses need nightly ETL to extract incremental or full data based on timestamps. Extracting large dimension tables (200 billion rows, >5 TB each) from a 70‑node TiDB cluster overloads CPU, memory, and network, causing failures and delayed reporting. Using TiCDC to write changes to Kafka provides a needed incremental solution.
2) Cross‑Region Dual‑Cluster Hot‑Standby
When the primary data center fails, a standby TiDB cluster can take over. TiCDC can sync data to the standby, offload read traffic, and enable rapid failover.
3) Stream Processing
Previously, MySQL binlog tools (Maxwell, Canal) fed Kafka for real‑time jobs and Flink table joins. After migrating MySQL to TiDB, TiCDC must reliably support this workflow, handling DDL compatibility and sync interruptions.
TiCDC Architecture
The architecture consists of four main modules: TiKV cluster, PD, TiCDC cluster, and downstream sink components.
1. TiCDC Cluster
The cluster runs multiple capture processes, each handling change‑log pulling, sorting, and forwarding.
Puller: pulls TiKV change logs and sorts them per table.
Processor: each processor thread syncs one or more tables; a capture node can run many processors.
High availability: captures have owner/non‑owner roles; the owner schedules tasks and registers with PD. If the owner fails, a new one is elected and tasks are reassigned. Check capture status with:
<code>tiup ctl cdc capture list --pd=http://xxxxx:2379</code>
Changefeed: a sync task can target a TiDB instance or filter specific databases/tables. Create a changefeed:
<code>tiup ctl:v4.0.14 cdc changefeed create --pd=http://pd-ip:2379 --sink-uri="mysql://User:password@vip:4000/" --changefeed-id="sync-name" --start-ts=0 --config=./ticdc.yaml</code>
List changefeeds:
<code>tiup ctl:v4.0.14 cdc changefeed list --pd=http://pd-ip:2379</code>
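The --config file controls, among other things, which tables the changefeed captures. A minimal sketch of such a ticdc.yaml (the schema and table names here are placeholders, not from the original deployment):

```yaml
# Hypothetical ticdc.yaml passed via --config; names are placeholders.
case-sensitive: true
filter:
  # Sync only tables matching these database.table rules.
  rules: ['ad_monitor.*', 'app.orders']
```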
2. TiKV Cluster
Writes are first persisted to a WAL (Write‑Ahead Log), which serves as the KV change log. TiCDC pulls row‑change events from this log.
3. PD Cluster
PD manages global TSO allocation, metadata, and scheduling. For TiCDC it stores changefeed configurations, capture metadata, owner election info, and processor sync status.
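Since changefeeds are addressed by TSO (for example the --start-ts flag above), it helps to know its layout: a TiDB TSO is a 64-bit value with a physical millisecond timestamp in the high bits and an 18-bit logical counter in the low bits. A small conversion sketch (the helper names are ours, not a TiDB API):

```python
from datetime import datetime, timezone

LOGICAL_BITS = 18  # low 18 bits hold the logical counter

def tso_to_datetime(tso: int) -> datetime:
    """Recover the wall-clock commit time embedded in a TSO."""
    physical_ms = tso >> LOGICAL_BITS
    return datetime.fromtimestamp(physical_ms / 1000.0, tz=timezone.utc)

def datetime_to_tso(dt: datetime) -> int:
    """Build a start-ts for a given point in time (logical part zero)."""
    return int(dt.timestamp() * 1000) << LOGICAL_BITS

t = datetime(2021, 8, 1, tzinfo=timezone.utc)
print(datetime_to_tso(t))
```

This is handy when choosing a --start-ts that corresponds to a known point in time rather than 0 (which means "from now").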
4. Sink Components
Downstream destinations include:
MySQL‑compatible databases (MySQL, TiDB)
Kafka/Pulsar for consumption by Flink or other stream processors
Incremental backup storage such as S3
Issues in Early Versions
OOM under high write spikes
OOM after restart when large writes occurred during interruption
Write‑write conflicts when the downstream is TiDB
Sort‑dir defaulted to system disk, causing disk‑full failures
Unbounded concurrency in log pulling overloaded TiKV nodes
Large DDL operations blocked change log sync (fixed in 4.0.14+)
Best Practices
Use the latest TiCDC version (4.0.14/5.1.1) to benefit from bug fixes.
Monitor TiCDC via its Grafana dashboard, which shows change‑log pull speed and sink status. Docs: https://docs.pingcap.com/zh/tidb/stable/monitor-ticdc
When writing to Kafka, choose compatible protocols (canal, canal‑json, avro, maxwell) for seamless migration.
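To make the consumer side concrete, here is a minimal Python sketch that decodes one canal-json style message of the kind TiCDC writes to Kafka. The sample payload and its field values are illustrative, not captured from a real changefeed:

```python
import json

# Illustrative canal-json style event; real messages carry more fields
# (isDdl, es, ts, mysqlType, old, ...), but these are the ones a typical
# consumer needs first.
sample = json.dumps({
    "database": "ad_monitor",
    "table": "clicks",
    "type": "INSERT",
    "data": [{"id": "1", "campaign": "c42"}],
})

def decode_event(raw: str) -> dict:
    """Extract operation type, qualified table name, and row images."""
    event = json.loads(raw)
    return {
        "op": event["type"],
        "table": "{}.{}".format(event["database"], event["table"]),
        "rows": event.get("data") or [],
    }

decoded = decode_event(sample)
print(decoded["op"], decoded["table"], len(decoded["rows"]))
```

Because the protocol matches what Canal emitted from MySQL binlogs, existing downstream jobs can often consume TiCDC topics without code changes.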
Split high‑write tables into multiple changefeeds; query STATEMENTS_SUMMARY_HISTORY to identify hot tables:
<code>select digest, digest_text, table_names,
       sum(exec_count), min(summary_begin_time), max(summary_end_time)
from STATEMENTS_SUMMARY_HISTORY
where stmt_type = 'insert' and schema_name = 'ad_monitor'
group by digest
order by sum(exec_count) desc
limit 20;</code>
Set the Kafka partition‑num to 1 to preserve global ordering across consumers; more partitions raise throughput but only guarantee order within each partition.
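Why a single partition matters can be seen with a toy model. This is an illustrative hash dispatcher, not TiCDC's actual one:

```python
import zlib

def dispatch(events, partition_num):
    """Assign (table, commit_seq) events to partitions by table-name hash."""
    partitions = {p: [] for p in range(partition_num)}
    for table, seq in events:
        p = zlib.crc32(table.encode()) % partition_num
        partitions[p].append(seq)
    return partitions

events = [("t1", 1), ("t2", 2), ("t1", 3), ("t2", 4)]

# One partition: the single consumer sees the full commit order 1,2,3,4.
print(dispatch(events, 1))
# Two partitions: each partition stays internally ordered, but a consumer
# reading both can observe the streams interleaved arbitrarily.
print(dispatch(events, 2))
```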
Configure per-table-memory-quota (6 MiB) to avoid goroutine OOM. Edit the cluster configuration with <code>tiup edit-config tidb-cluster-name</code> and set:
<code>cdc:
  per-table-memory-quota: 6291456</code>
Limit the TiKV incremental-scan speed (e.g., to 20 MiB/s) to reduce the pull load on the cluster. Edit the cluster configuration with <code>tiup edit-config tidb-cluster-name</code> and set:
<code>tikv:
  cdc.incremental-scan-speed-limit: 20MB</code>
Adjust gc-ttl (default 24 h) for long‑running syncs; remove stale changefeeds with: <code>tiup ctl:v4.0.14 cdc changefeed remove --pd=http://pd-ip:2379 --changefeed-id=TICDC-XXX</code>
When using TiCDC for cross‑region sync, increase the GC lifetime (the tidb_gc_life_time system variable) so change logs are not garbage‑collected before the standby cluster catches up.
Large transactions (over roughly 5 GB) may cause TiCDC to OOM; also keep in mind TiDB's 10 GB single‑transaction size limit.
On sync interruption, query error details: <code>tiup ctl:v4.0.14 cdc changefeed query -s --pd=http://pd-ip:2379 --changefeed-id=TICDC-XXX</code>
Conclusion
TiCDC is now reliable for most scenarios, having resolved early OOM and load‑impact issues through collaboration between 360 and PingCAP. Users with similar high‑write requirements are encouraged to adopt TiCDC and provide feedback to further improve its stability and efficiency.
Xiaolei Talks DB
Sharing daily database operations insights, from distributed databases to cloud migration. Author: Dai Xiaolei, with 10+ years of DB ops and development experience. Your support is appreciated.