
How Tencent’s TDSQL Multi‑Source Sync Achieves High‑Performance, Consistent Data Distribution

This article explains the real‑time data sync requirements driven by the financial industry, describes the TDSQL‑MULTISRCSYNC architecture (producer, store, and consumer components), and details core designs such as row‑hash concurrency, idempotent binlog handling, and a lock‑based ordering mechanism that together ensure high throughput and consistency.

dbaplus Community

Scenario and Requirements

Financial systems need real‑time data synchronization, subscription, and distribution. Examples include insurance architectures that link branch offices to a head office, and core banking transaction systems that stream data to analytical subsystems. Tencent Cloud TDSQL provides data distribution and decoupling capabilities for these cases.

The TDSQL‑MULTISRCSYNC module implements a high‑performance, strongly consistent data distribution service across multiple heterogeneous systems. TDSQL can act as either source or target: it can sync to MySQL, Oracle, PostgreSQL, and Kafka, or receive data from them, in flexible one‑to‑many and many‑to‑one topologies.

System Architecture

The service follows a log‑based CDC replication model and consists of three logical components:

Producer: parses incremental logs (binlog for MySQL/TDSQL, materialized‑view logs for Oracle), packages them as JSON messages, and pushes them to a Kafka topic with at‑least‑once delivery.

Store: Kafka acts as the intermediate queue; each topic uses a single partition to preserve message order.

Consumer: consumes CDC messages and replays them to the target. Because at‑least‑once delivery means the producer may send a message more than once, the consumer applies idempotent logic so that replaying a duplicate leaves the target unchanged.
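As a rough illustration of the producer's packaging step, the sketch below serializes one row change as JSON. The article does not give the actual message schema, so every field name here (table, op, pk, before, after, pos) is a hypothetical placeholder, and the Kafka send itself is left out.

```python
import json


def package_row_event(table, op, primary_key, before, after, log_position):
    """Serialize one row change as a JSON string (field names are hypothetical)."""
    message = {
        "table": table,
        "op": op,              # "insert", "update", or "delete"
        "pk": primary_key,
        "before": before,      # row image before the change, or None
        "after": after,        # row image after the change, or None
        "pos": log_position,   # position in the source log
    }
    return json.dumps(message, sort_keys=True)


def parse_row_event(payload):
    """Decode a message on the consumer side."""
    return json.loads(payload)
```

Carrying both row images and the log position in each message is one way to give the consumer enough information to replay an event without consulting the source again.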

System architecture diagram

Core Design and Implementation

1. Row‑Hash Concurrency Strategy

To avoid the latency of serial binlog replay, the consumer hashes the combination of table name and primary‑key value to assign each row event to a specific replay thread. Events affecting the same row are processed by the same thread, preserving order, while different rows are processed in parallel. Each thread batches messages into transactions before replay. This design achieves up to 40 000 QPS per task and provides eventual consistency.
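The dispatch step can be sketched as follows. This is a minimal illustration, not the service's implementation: the event dictionaries, the `worker_for` and `dispatch` names, and the choice of CRC32 as the hash are all assumptions. CRC32 is used instead of Python's built‑in `hash()` because string hashes are randomized per process, which would make the mapping unstable across restarts.

```python
import zlib
from collections import defaultdict


def worker_for(table, primary_key, num_workers):
    """Map (table name, primary-key value) to a stable worker index."""
    key = f"{table}:{primary_key}".encode("utf-8")
    return zlib.crc32(key) % num_workers


def dispatch(events, num_workers):
    """Group events into per-worker queues, keeping arrival order in each queue."""
    queues = defaultdict(list)
    for event in events:
        index = worker_for(event["table"], event["pk"], num_workers)
        queues[index].append(event)
    return queues
```

Since every event for a given row lands in the same queue in arrival order, per‑row ordering is preserved without any coordination between workers.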

Row‑hash concurrency diagram

2. Idempotent Handling of Row‑Format Binlog Events

Because the producer uses at‑least‑once semantics, the consumer must handle duplicate messages. The idempotent rules are:

INSERT: If a primary‑key conflict occurs, the operation is transformed into DELETE + INSERT. A negative affected‑row count indicates the conflict.

UPDATE: Guarantees that only the new value exists after replay.

DELETE: Ensures that no row with the deleted key remains.

These rules eliminate the need for explicit snapshot points during full‑load migrations; incremental logs can be replayed safely.
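The three rules can be sketched against an in‑memory SQLite table. This is an illustration under stated assumptions: the table `t(id, v)` and the function names are hypothetical, the conflict is detected through an exception rather than an affected‑row count, and implementing UPDATE as "delete old and new keys, then insert the new image" is one possible reading of the rule, not necessarily the service's method.

```python
import sqlite3


def replay_insert(conn, row_id, value):
    """INSERT; on a primary-key conflict fall back to DELETE + INSERT."""
    try:
        conn.execute("INSERT INTO t (id, v) VALUES (?, ?)", (row_id, value))
    except sqlite3.IntegrityError:
        conn.execute("DELETE FROM t WHERE id = ?", (row_id,))
        conn.execute("INSERT INTO t (id, v) VALUES (?, ?)", (row_id, value))


def replay_update(conn, old_id, new_id, value):
    """Leave only the new row image: drop the old and new keys, then insert."""
    conn.execute("DELETE FROM t WHERE id IN (?, ?)", (old_id, new_id))
    conn.execute("INSERT INTO t (id, v) VALUES (?, ?)", (new_id, value))


def replay_delete(conn, row_id):
    """Deleting an absent row affects zero rows, which is treated as success."""
    conn.execute("DELETE FROM t WHERE id = ?", (row_id,))
```

Replaying any of these operations twice leaves the table in the same state as replaying it once, which is the property the consumer needs under at‑least‑once delivery.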

INSERT idempotent handling
UPDATE idempotent handling
DELETE idempotent handling

3. Concurrency Control Under Multiple Unique Constraints

When a table has a primary key plus additional unique indexes, naive hash‑based threading can break ordering. The solution introduces a lock structure attached to each CDC event to coordinate execution order across threads that share the same unique‑key value.

The lock contains a wait‑count, an event‑id, and a condition‑map. During dispatch, the consumer checks for unique constraints, creates or retrieves the corresponding lock, and attaches it to the message.

Consumer threads follow this workflow:

Check the lock’s condition map; if the preceding event has not released the lock, wait on a condition variable.

When notified, verify the lock is released, then replay the message.

After replay, decrement the lock’s wait‑count; if it reaches zero, destroy the lock, otherwise update the condition map and broadcast to waiting threads.
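The wait, replay, and release workflow above can be sketched with a condition variable. This is a simplified illustration, not the article's lock structure: a ticket counter stands in for the event‑id and condition‑map, and the class and method names are hypothetical. It assumes a single dispatcher thread calls `attach` in log order.

```python
import threading


class UniqueKeyLock:
    """Orders events that share one unique-key value (simplified sketch)."""

    def __init__(self):
        self.cond = threading.Condition()
        self.wait_count = 0    # events attached but not yet replayed
        self.next_ticket = 0   # ticket handed to the next attached event
        self.current = 0       # ticket that is allowed to replay now

    def attach(self):
        """Called by the dispatcher, in log order, for each event."""
        with self.cond:
            ticket = self.next_ticket
            self.next_ticket += 1
            self.wait_count += 1
            return ticket

    def run_in_order(self, ticket, replay):
        """Wait for this event's turn, replay it, then wake the waiters."""
        with self.cond:
            # Re-check after each wakeup: notify_all wakes every waiter.
            while self.current != ticket:
                self.cond.wait()
        replay()
        with self.cond:
            self.wait_count -= 1
            self.current += 1
            self.cond.notify_all()
            # True means no event is pending; a real registry would have to
            # synchronize lock removal with the dispatcher.
            return self.wait_count == 0
```

Only the thread holding the current ticket leaves the wait loop, so events on the same unique‑key value replay one at a time in dispatch order even when they were hashed to different threads.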

Lock structure diagram

4. Optimization and Outlook

The multi‑source sync service is deployed in Tencent’s public and private clouds for many financial customers, providing reliable data distribution. Future work includes adding support for additional heterogeneous platforms such as DB2, SQL Server, and big‑data ecosystems.
