
How Ctrip’s DRC Enables High‑Performance Cross‑Region MySQL Replication

This article explains the design and implementation of Ctrip's Data Replication Center (DRC), a MySQL‑based high‑availability system that solves cross‑region data loops, progress tracking, concurrency, DDL handling, and conflict resolution to achieve low‑latency, reliable data replication for global travel services.

Ctrip Technology

Introduction

Cross‑region data replication is essential for global online services that need to switch traffic between regions while staying within a compliance zone. The Data Replication Center (DRC) provides high‑performance, zero‑copy, parallel replication between MySQL clusters in different regions.

DRC Architecture

DRC is built on MySQL and consists of a replication module that reads MySQL binary logs (Binlog) and forwards the events to synchronization modules in other regions. Each synchronization module parses the Binlog events, rewrites them into SQL, and applies the statements to the target database.

Core challenges and solutions

Data loop and replication progress management

Concurrent replication

DDL handling

Write‑conflict detection and resolution

Data loop and replication progress

In a multi‑master setup a transaction written in one region is replicated to the others. Without special handling the same transaction could be sent back, forming a loop. DRC marks its own replication operations so that reverse synchronization can filter them out.

A MySQL GTID consists of a SERVER_UUID and a transaction ID. The synchronization component records the source GTID on the target via SET GTID_NEXT='uuid:tid'. If the account on the target lacks the privileges required for SET GTID_NEXT, DRC instead stores the GTID information in a dedicated gtid_executed table using ordinary DML. The reverse replication component discards any event whose GTID's SERVER_UUID is not its own, thereby breaking the loop.
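The loop‑breaking rule above can be sketched as a simple filter. This is an illustration, not DRC's actual code; the UUID values and the `should_forward` name are hypothetical:

```python
# Hypothetical sketch of DRC-style loop breaking: the reverse replicator
# forwards only transactions that originated on its own local cluster.
LOCAL_SERVER_UUIDS = {"3E11FA47-71CA-11E1-9E33-C80AA9429562"}  # example UUID

def should_forward(gtid: str) -> bool:
    """Forward a binlog transaction only if its GTID's SERVER_UUID belongs
    to the local cluster; events that were replicated in from another
    region are dropped, which breaks the replication loop."""
    server_uuid, _, _txn_id = gtid.partition(":")
    return server_uuid.upper() in LOCAL_SERVER_UUIDS

# A locally written transaction is forwarded...
assert should_forward("3E11FA47-71CA-11E1-9E33-C80AA9429562:23")
# ...but one that originated in another region is filtered out.
assert not should_forward("B4A9D12E-0000-1111-2222-333344445555:99")
```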

Concurrent replication

DRC leverages MySQL’s WRITESET mechanism, which records transaction dependencies in the Binlog. Transactions that do not depend on each other can be executed in parallel, increasing replication concurrency.

Example: Transaction A updates customer 1 and creates order 1, Transaction B updates customer 2 and creates order 2, Transaction C updates the name of customer 1. WRITESET identifies that A and B can run concurrently, while C must wait for A because they modify the same row.
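The scheduling idea behind this example can be sketched as follows. This is a simplified illustration of writeset‑based grouping, not MySQL's actual WRITESET implementation; the row‑key format is invented:

```python
# Illustrative sketch: two transactions may run in parallel only if their
# row write sets are disjoint; otherwise the later one must wait.
def parallel_groups(transactions):
    """transactions: list of (name, set_of_row_keys).
    Greedily assigns each transaction to the earliest batch whose
    combined write set it does not intersect."""
    groups = []  # list of (combined_writeset, [names])
    for name, rows in transactions:
        for writeset, names in groups:
            if writeset.isdisjoint(rows):
                writeset |= rows
                names.append(name)
                break
        else:
            groups.append((set(rows), [name]))
    return [names for _, names in groups]

txns = [
    ("A", {"customer:1", "order:1"}),
    ("B", {"customer:2", "order:2"}),
    ("C", {"customer:1"}),  # touches the same row as A
]
print(parallel_groups(txns))  # → [['A', 'B'], ['C']]
```

A and B land in the same parallel batch because their write sets are disjoint; C conflicts with A on customer 1 and is deferred to the next batch, matching the example above.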

DDL handling

DDL statements generate Binlog events that could block replication if the sync module executed them directly. DRC avoids this by requiring DBAs to apply DDL manually on both the source and target clusters. When a new column is added, the sync module drops that column from incoming SQL whenever its value equals the column's default, ensuring eventual consistency.

Because MySQL Binlog (especially versions < 8) does not contain column names, DRC maintains a local metadata database. When a DDL event is detected, the replication module applies the DDL to the local database, obtains the updated schema, and uses that information to correctly rewrite or filter the incoming SQL before applying it to the target.
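The column‑default rewrite described above might look like the following sketch. The table, column names, and the `rewrite_insert` helper are hypothetical; the schema and defaults are assumed to come from DRC's local metadata database:

```python
# Hypothetical sketch: if the target has not yet applied the DDL that
# added a column, drop that column from the statement when its value
# equals the default recorded in the local metadata database.
def rewrite_insert(table, row, target_columns, defaults):
    """row: column -> value decoded from the source binlog event.
    target_columns: columns that currently exist on the target.
    defaults: column -> default value from the metadata DB."""
    kept = {}
    for col, val in row.items():
        if col not in target_columns:
            if val != defaults.get(col):
                raise ValueError(f"non-default value for unknown column {col}")
            continue  # safe to drop: the value is just the column default
        kept[col] = val
    cols = ", ".join(kept)
    placeholders = ", ".join(["%s"] * len(kept))
    return f"INSERT INTO {table} ({cols}) VALUES ({placeholders})", tuple(kept.values())

sql, params = rewrite_insert(
    "customer",
    {"id": 1, "name": "alice", "vip_level": 0},
    target_columns={"id", "name"},
    defaults={"vip_level": 0},
)
print(sql)     # INSERT INTO customer (id, name) VALUES (%s, %s)
print(params)  # (1, 'alice')
```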

Write‑conflict detection and resolution

Each table must contain an updatetime column. When writes to the same row arrive from different regions, DRC compares the timestamps and the latest write wins. For an update‑versus‑delete conflict, the update wins, since applying the delete would discard the row. All conflicts are logged and exposed through a monitoring UI.
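The resolution rules can be sketched as a small function. This is an illustration of the stated policy, assuming (per the article) that every row carries an updatetime value; the `resolve` name is hypothetical:

```python
# Sketch of timestamp-based "latest write wins" conflict resolution.
def resolve(local_row, incoming_row):
    """Each row is a dict with an 'updatetime' field; None means the
    row does not exist (e.g. it was deleted). Returns the winner."""
    if local_row is None:          # local delete vs. remote update:
        return incoming_row        # the update wins
    if incoming_row is None:       # remote delete vs. local update:
        return local_row           # the update wins
    # plain update/update conflict: the newer updatetime wins
    return max(local_row, incoming_row, key=lambda r: r["updatetime"])

a = {"id": 1, "name": "old", "updatetime": 100}
b = {"id": 1, "name": "new", "updatetime": 200}
assert resolve(a, b)["name"] == "new"   # later timestamp wins
assert resolve(None, b) is b            # update beats delete
```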

Binlog message subscription

To keep caches consistent across regions, DRC subscribes to MySQL Binlog changes in one region and pushes cache‑invalidation messages to the other regions. This avoids the latency of service‑to‑service calls or Redis replication.
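A minimal sketch of this subscription pattern, with an invented event shape and cache‑key format (DRC's actual message format is not described in the article):

```python
# Illustrative sketch: derive cache-invalidation keys from a row-change
# binlog event, to be published to subscribers in other regions.
def invalidation_keys(event):
    """event: dict with 'table' and 'rows' (list of primary-key dicts)."""
    return [f"{event['table']}:{row['id']}" for row in event["rows"]]

event = {"table": "customer", "rows": [{"id": 1}, {"id": 7}]}
print(invalidation_keys(event))  # → ['customer:1', 'customer:7']
```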

Stability and performance improvements

Replication latency is the primary metric. The main influencing factors are:

Network latency, bandwidth, packet loss

Amount of data and concurrency of writes

Write performance of the target database

Performance of DRC replication and sync modules

DRC uses Netty for asynchronous, high‑throughput networking, stores Binlog events on local disk, and streams them to the sync component asynchronously. Zero‑copy and off‑heap memory are used for event handling.

Throughput (TPS) is related to concurrency (n) and average transaction time (t) as:

TPS = n / t
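Plugging in illustrative numbers (not measurements from the article) shows how concurrency drives throughput:

```python
# Worked example of TPS = n / t: with n parallel appliers and an average
# transaction time of t seconds, steady-state throughput is n / t.
n = 32        # concurrent apply threads (illustrative value)
t = 0.010     # 10 ms average transaction time (illustrative value)
tps = n / t
print(tps)  # → 3200.0
```

Doubling concurrency or halving the average transaction time doubles throughput, which is why WRITESET‑based parallel apply matters.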

Throughput enhancements

DRC adds a custom filter_log_event that can skip entire transactions based on schema, and splits sync links to increase parallelism. Tests show end‑to‑end latency reduced from 8 s to 1 s after switching from instance‑level to database‑level replication.

Flow control

DRC employs Netty’s WRITE_BUFFER_WATER_MARK to throttle sending rates based on buffer occupancy, preventing overload when the sync module recovers from a failure.
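The watermark mechanism is generic and can be sketched outside of Netty. This is a simplified model of high/low watermark flow control, not DRC's implementation; the class name and byte limits are invented:

```python
# Generic sketch of high/low watermark flow control, analogous to Netty's
# WRITE_BUFFER_WATER_MARK: pause writing when buffered bytes exceed the
# high watermark, resume once they fall below the low watermark.
class WatermarkThrottle:
    def __init__(self, low, high):
        self.low, self.high = low, high
        self.buffered = 0
        self.writable = True

    def on_enqueue(self, nbytes):
        self.buffered += nbytes
        if self.buffered > self.high:
            self.writable = False   # sender must pause

    def on_flushed(self, nbytes):
        self.buffered -= nbytes
        if self.buffered < self.low:
            self.writable = True    # safe to resume sending

throttle = WatermarkThrottle(low=32 * 1024, high=64 * 1024)
throttle.on_enqueue(70 * 1024)
assert not throttle.writable    # above high watermark: paused
throttle.on_flushed(50 * 1024)
assert throttle.writable        # below low watermark: resumed
```

The hysteresis between the two watermarks prevents the sender from rapidly flapping between paused and writable states while the sync module catches up.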

High‑availability and recovery

Sub‑second primary‑standby failover is provided across data centers. GTID‑based recovery automatically resumes replication after regional outages, reducing mean‑time‑to‑recover.

Replication topology

Two topologies are supported:

Full‑mesh: each region replicates to every other region; recommended for three or fewer regions.

Star: a central hub region replicates to all others; recommended when the number of regions exceeds three.

Conclusion

DRC solves key cross‑region replication problems—data loops, progress tracking, parallel execution, DDL mismatches, and write conflicts—through GTID‑based marking, WRITESET‑driven concurrency, metadata‑driven DDL handling, and timestamp‑based conflict resolution. Combined with asynchronous Netty communication, zero‑copy processing, flow control, and flexible topology choices, DRC achieves low latency, high throughput, and robust availability for globalized services.

Tags: distributed-systems, database, high availability, MySQL, data replication, GTID, cross-region
Written by Ctrip Technology
Official Ctrip Technology account, sharing and discussing growth.