
How Ctrip’s DRC Middleware Enables Real‑Time Multi‑Active MySQL Replication

DRC (Data Replicate Center) is Ctrip’s database middleware for real‑time bidirectional MySQL replication. It achieves low‑latency, multi‑active data access across data centers by combining GTID, Writeset parallel replication, conflict resolution, DDL handling, and comprehensive monitoring.

dbaplus Community

Background

Ctrip operates MySQL clusters across two data centers. Data center A hosts a primary‑replica pair, while data center B hosts a replica used for disaster‑recovery (DR). In the original configuration, applications in B had to write to A, and DBA staff performed manual DR failover when A failed.

To enable true multi‑active operation, with geographically distributed reads and writes and no manual DR failover, Ctrip built a real‑time bidirectional (and multi‑directional) replication component.

Figure: DRC architecture overview

DRC Overview

DRC (Data Replicate Center) is a database middleware that provides bidirectional or multi‑directional replication. It supports Ctrip’s G2 (global, high‑quality service) strategy and enables globally distributed deployments.

Architecture

DRC follows a centralized server design and works together with the DAL (Data Access Layer) middleware, which supplies local read‑write capability. The main components are:

Replicator Container: Manages Replicator instances. Each instance poses as a MySQL replica, pulls binlogs from a source cluster, and stores them locally.

Applier Container: Manages Applier instances. An Applier connects to a Replicator, reads the stored binlog events, converts them into SQL statements, and applies them to the target MySQL.

Cluster Manager: Handles high‑availability switching, including the restarts triggered by MySQL primary‑replica switches and by Replicator/Applier role changes.

Console: Exposes UI operations, external APIs, and monitoring/alerting interfaces.

Figure: DRC component diagram

DB Access Requirements

To keep replication latency low and data consistency high, every participating MySQL instance must satisfy:

MySQL version 5.7.22 or newer.

Writeset parallel replication enabled on the master (available from 5.7.22).

GTID enabled.

Each table contains a millisecond‑precision timestamp column.

Each table has a primary key or a unique key.
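The requirements above can be expressed as a pre‑onboarding check. The sketch below is hypothetical: it assumes the server variables and per‑table metadata have already been fetched into plain Python structures (for example from SHOW VARIABLES and information_schema), and it uses the MySQL 5.7 variable names gtid_mode, enforce_gtid_consistency, binlog_transaction_dependency_tracking and transaction_write_set_extraction. TableInfo and check_instance are illustrative names, not part of DRC.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class TableInfo:
    """Hypothetical, simplified description of one table's metadata."""
    name: str
    has_pk_or_uk: bool
    # (column_name, data_type, fractional_seconds_precision) for temporal columns
    time_columns: List[Tuple[str, str, int]] = field(default_factory=list)


def parse_version(version: str) -> Tuple[int, ...]:
    # "5.7.22-log" -> (5, 7, 22)
    core = version.split("-")[0]
    return tuple(int(part) for part in core.split(".")[:3])


def check_instance(variables: Dict[str, str], tables: List[TableInfo]) -> List[str]:
    """Return human-readable problems; an empty list means every check passed."""
    problems = []
    if parse_version(variables.get("version", "0.0.0")) < (5, 7, 22):
        problems.append("MySQL version must be 5.7.22 or newer")
    # gtid_mode=ON also requires enforce_gtid_consistency=ON
    if variables.get("gtid_mode") != "ON" or variables.get("enforce_gtid_consistency") != "ON":
        problems.append("GTID must be enabled (gtid_mode=ON, enforce_gtid_consistency=ON)")
    # Writeset dependency tracking on the source (5.7 variable names)
    if variables.get("binlog_transaction_dependency_tracking") != "WRITESET":
        problems.append("binlog_transaction_dependency_tracking must be WRITESET")
    if variables.get("transaction_write_set_extraction") not in ("XXHASH64", "MURMUR32"):
        problems.append("transaction_write_set_extraction must be XXHASH64 or MURMUR32")
    for table in tables:
        if not table.has_pk_or_uk:
            problems.append(f"{table.name}: needs a primary key or unique key")
        has_ms_timestamp = any(
            dtype.lower() == "timestamp" and fsp >= 3 for _, dtype, fsp in table.time_columns
        )
        if not has_ms_timestamp:
            problems.append(f"{table.name}: needs a TIMESTAMP(3) (millisecond) column")
    return problems
```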

GTID Primer

GTID (Global Transaction ID) was introduced in MySQL 5.6.5. Instead of identifying replication positions by binlog file name and offset, it labels every transaction with an identifier of the form source_id:transaction_id, where source_id is the originating server’s UUID and transaction_id is a sequence number assigned at commit. Because a transaction keeps the same GTID on every server it reaches, GTID allows precise binlog positioning after failover and is the basis for DRC’s ordering guarantees.
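To make the GTID set notation concrete, here is a minimal parser for the text form MySQL prints for gtid_executed, for example uuid:1-5:7. It handles only untagged GTIDs; parse_gtid_set, merge_intervals and contains are illustrative helpers, not a DRC or MySQL API.

```python
from typing import Dict, List, Tuple

Interval = Tuple[int, int]  # inclusive range of transaction numbers


def merge_intervals(intervals: List[Interval]) -> List[Interval]:
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1] + 1:  # overlapping or adjacent
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def parse_gtid_set(text: str) -> Dict[str, List[Interval]]:
    """Parse 'uuid:1-5:7,uuid2:1-3' into {uuid: [(1, 5), (7, 7)], ...}."""
    raw: Dict[str, List[Interval]] = {}
    for part in text.split(","):
        part = part.strip()
        if not part:
            continue
        uuid, *ranges = part.split(":")
        for r in ranges:
            start, _, end = r.partition("-")
            raw.setdefault(uuid.strip().lower(), []).append((int(start), int(end or start)))
    return {uuid: merge_intervals(ivs) for uuid, ivs in raw.items()}


def contains(gtid_set: Dict[str, List[Interval]], gtid: str) -> bool:
    """True if a single GTID such as 'uuid:42' is in the set."""
    uuid, _, number = gtid.rpartition(":")
    n = int(number)
    return any(start <= n <= end for start, end in gtid_set.get(uuid.lower(), []))
```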

Binlog Replication Pipeline

A unidirectional replication chain consists of:

Replicator: Pulls binlog events from the source MySQL, writes them to local disk, and makes them available to the Applier.

Applier: Requests stored binlog events, parses them into SQL, and applies them in parallel to the target MySQL.

The pipeline involves network I/O, disk reads/writes, and CPU processing.

Figure: Replication chain diagram

Latency Optimizations

Latency is reduced at three layers:

Network Layer

Replicator uses GTID‑based replication and the open‑source XPipe component (https://github.com/ctripcorp/x-pipe) for asynchronous network communication.

System Layer

Binlog events are parsed and kept in off‑heap memory. Heartbeat events and events from irrelevant databases/tables are filtered out. Persisted events are written to the OS page cache and flushed periodically, minimizing disk I/O.
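The filtering step can be sketched as a generator over decoded events. The event model (BinlogEvent with kind/db/table) is a hypothetical simplification of real binlog events, and keeping transaction boundary events for filtered tables is an assumption made here, not something the original describes.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, Optional, Set, Tuple


@dataclass(frozen=True)
class BinlogEvent:
    kind: str                    # e.g. "heartbeat", "gtid", "table_map", "rows", "xid"
    db: Optional[str] = None     # set only for table-scoped events
    table: Optional[str] = None


def filter_events(events: Iterable[BinlogEvent],
                  replicated: Set[Tuple[str, str]]) -> Iterator[BinlogEvent]:
    for event in events:
        if event.kind == "heartbeat":
            continue  # heartbeats only prove liveness; nothing to store
        if event.table is not None and (event.db, event.table) not in replicated:
            continue  # table not configured for replication
        # Transaction boundaries (gtid/xid) are kept even if all row events are
        # dropped, so the downstream GTID set stays contiguous (design assumption).
        yield event
```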

Application Layer

Applier adopts MySQL’s Writeset parallel replication algorithm with a water‑level based parallelism scheme, allowing many SQL statements to be applied concurrently.
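MySQL’s WRITESET tracking gives each transaction a sequence_number and a last_committed value, and a replica may start a transaction once every transaction up to last_committed has committed; that threshold is the "water level". The simulation below illustrates the idea under stated simplifications: row keys are plain strings, the writeset history is unbounded, and each dispatched batch is assumed to commit together. assign_last_committed and schedule_rounds are hypothetical names; DRC’s actual Applier code is not shown in the source.

```python
from typing import Dict, List, Set, Tuple


def assign_last_committed(txn_writesets: List[Set[str]]) -> List[Tuple[int, int]]:
    """For transactions in commit order, return (sequence_number, last_committed).
    last_committed is the newest earlier transaction that touched any of the same
    row keys, or 0 if none: a simplified form of the WRITESET dependency rule."""
    last_writer: Dict[str, int] = {}
    result = []
    for seq, keys in enumerate(txn_writesets, start=1):
        last_committed = max((last_writer.get(k, 0) for k in keys), default=0)
        for k in keys:
            last_writer[k] = seq
        result.append((seq, last_committed))
    return result


def schedule_rounds(deps: List[Tuple[int, int]]) -> List[List[int]]:
    """Dispatch transactions in order; each may start once the water level
    (highest sequence_number below which everything has committed) reaches
    its last_committed. Returns the batches that run in parallel."""
    pending = list(deps)
    low_water = 0
    rounds = []
    while pending:
        batch = []
        for seq, last_committed in pending:
            if last_committed > low_water:
                break  # in-order dispatch: wait for the water level to rise
            batch.append(seq)
        rounds.append(batch)
        pending = pending[len(batch):]
        low_water = batch[-1]  # simplification: the whole batch commits together
    return rounds
```

Transactions 1 and 2 below touch disjoint rows and run together, while transaction 5 must wait for transaction 3, which last wrote row "c".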

Idle Detection & Flow Control

Both Replicator and Applier send a heartbeat every 10 s; a 30 s timeout triggers reconnection. Replicator also uses Netty’s WRITE_BUFFER_WATER_MARK to throttle sending when the Applier cannot keep up.
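Both mechanisms in the preceding paragraph can be modeled in a few lines. IdleDetector uses the 10 s and 30 s values from the text; WaterMarkGate reproduces, in Python and for illustration only, the documented semantics of Netty’s WriteBufferWaterMark: the channel becomes unwritable above the high mark and writable again only below the low mark. Both classes are hypothetical stand‑ins, and the clock is injectable so the logic can be tested.

```python
import time
from typing import Callable

HEARTBEAT_INTERVAL_S = 10  # each side sends a heartbeat this often
IDLE_TIMEOUT_S = 30        # silence this long triggers a reconnect


def heartbeat_due(last_sent: float, now: float) -> bool:
    return now - last_sent >= HEARTBEAT_INTERVAL_S


class IdleDetector:
    def __init__(self, now: Callable[[], float] = time.monotonic):
        self._now = now
        self._last_seen = now()

    def on_message(self) -> None:
        # Any inbound traffic (data or heartbeat) proves the peer is alive.
        self._last_seen = self._now()

    def should_reconnect(self) -> bool:
        return self._now() - self._last_seen >= IDLE_TIMEOUT_S


class WaterMarkGate:
    """Hysteresis between a low and a high mark on queued, unsent bytes."""

    def __init__(self, low: int, high: int):
        if not 0 <= low <= high:
            raise ValueError("require 0 <= low <= high")
        self.low, self.high = low, high
        self.pending = 0
        self.writable = True

    def queued(self, nbytes: int) -> None:
        self.pending += nbytes
        if self.pending > self.high:
            self.writable = False  # Replicator pauses sending binlog data

    def flushed(self, nbytes: int) -> None:
        self.pending -= nbytes
        if not self.writable and self.pending < self.low:
            self.writable = True   # Applier has caught up: resume sending
```

The gap between the two marks keeps the sender from flapping between paused and running on every small write.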

Figure: Latency performance chart

Data Consistency Guarantees

DRC ensures three properties:

Ordering: Binlog files are stored in MySQL’s native format, and Replicator processes events sequentially, preserving the original order, including for custom DDL events.

At‑Least‑Once Delivery: No event is lost. An event may be delivered more than once, but execution is idempotent, so duplicates are harmless.

Conflict Resolution: Provides eventual consistency when concurrent updates occur.

Ordering Details

During pull, Replicator reads binlog files in native order and forwards events to Applier in the same sequence. Custom snapshot and DDL events are also ordered.

Figure: Ordering diagram

At‑Least‑Once Mechanisms

Restart Recovery: On restart, Replicator locates the last binlog file, parses its previous_gtids_event, merges that set with the GTIDs of the transactions stored in the file, and truncates any incomplete transaction at the end. The Applier obtains the target DB’s current GTID set (via Cluster Manager) and requests only the missing events.

Loop‑Replication Avoidance: DRC tags the transactions it writes with a special marker, and the Replicator on the opposite side filters out marked transactions. In addition, filtering by GTID source UUID prevents replication cycles.

Idempotence: MySQL records executed GTIDs. If the Applier sends a transaction whose GTID has already been executed on the target, MySQL silently skips it, so duplicate delivery is safe.
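The three mechanisms above can be illustrated together. This sketch assumes a simplified model in which GTIDs are (uuid, number) tuples and each stored transaction carries a complete flag and a from_drc marker. Txn, recover, should_forward and to_apply are hypothetical names, and in the real system the duplicate skip happens inside MySQL rather than in client code.

```python
from dataclasses import dataclass
from typing import List, Set, Tuple

Gtid = Tuple[str, int]  # (source server UUID, transaction number)


@dataclass
class Txn:
    gtid: Gtid
    complete: bool          # True once the closing XID/COMMIT event was read
    from_drc: bool = False  # hypothetical marker for transactions written by an Applier


def recover(previous_gtids: Set[Gtid], last_file: List[Txn]) -> Tuple[Set[Gtid], List[Txn]]:
    """Restart recovery: keep complete transactions from the last binlog file,
    truncate at the first incomplete one, and merge with previous_gtids."""
    kept: List[Txn] = []
    for txn in last_file:
        if not txn.complete:
            break  # torn tail from a crash: drop it and re-pull from the source
        kept.append(txn)
    executed = set(previous_gtids) | {t.gtid for t in kept}
    return executed, kept


def should_forward(txn: Txn, local_uuids: Set[str]) -> bool:
    """Loop avoidance: never re-ship what DRC itself wrote, and only ship
    transactions that originated in this cluster."""
    return not txn.from_drc and txn.gtid[0] in local_uuids


def to_apply(stored: List[Txn], target_executed: Set[Gtid], local_uuids: Set[str]) -> List[Gtid]:
    """GTIDs the Applier still needs. Already-executed ones are skipped, which
    is what makes redelivery after a restart harmless."""
    return [t.gtid for t in stored
            if should_forward(t, local_uuids) and t.gtid not in target_executed]
```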

Conflict Resolution

Conflicts are minimized by routing a user’s traffic to the same data center (local‑to‑local routing in DAL) and by allocating distinct auto‑increment ID ranges or using a global ID generator. When a conflict does occur, DRC compares the millisecond‑precision timestamp columns and keeps the later update. Conflicting statements are logged and can be presented for manual review.
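A minimal sketch of the timestamp rule, assuming the Applier detects a conflict when the target row no longer matches the event’s before‑image. apply_update is a hypothetical helper, and the handling of ties and of locally deleted rows is an assumption here (both keep the local state and log the conflict), since the text does not specify it.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]


def apply_update(target: Dict[Any, Row], pk: Any, before: Row, after: Row,
                 ts_col: str, conflicts: List[dict]) -> bool:
    """Apply one replicated UPDATE (before/after row images) to `target`.
    Returns True if the incoming change was written."""
    current = target.get(pk)
    if current == before:
        target[pk] = after  # no concurrent change: apply as-is
        return True
    # Conflict: the target row no longer matches the source's before-image.
    conflicts.append({"pk": pk, "current": current, "incoming": after})
    if current is None:
        return False  # row missing locally: left for manual review (assumption)
    if after[ts_col] > current[ts_col]:
        target[pk] = after  # incoming write is later: last writer wins
        return True
    return False  # local write is later (or tied): keep it
```

Every conflict is appended to conflicts regardless of the outcome, matching the text’s point that conflicting statements are logged for manual review.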

DDL Support

DDL changes require the Applier to know the exact table schema at the moment each binlog event was generated, because row events in the binlog do not carry column names by default, only column positions and types. DRC stores table‑structure snapshots and DDL events inside custom binlog events, eliminating the need for an external metadata store.

When a DDL event arrives, an embedded lightweight database reconstructs the required schema version for subsequent events.
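Conceptually, the embedded schema store answers one question: which table definition was in force at a given binlog position? The sketch below assumes positions are monotonically increasing integers and models a table definition as an ordered column list; SchemaHistory is a hypothetical stand‑in, not DRC’s embedded database.

```python
import bisect
from typing import Dict, List, Tuple


class SchemaHistory:
    def __init__(self) -> None:
        # table -> (positions in ascending order, column list for each position)
        self._versions: Dict[str, Tuple[List[int], List[List[str]]]] = {}

    def record(self, table: str, position: int, columns: List[str]) -> None:
        """Store the definition that takes effect at `position` (e.g. after a DDL)."""
        positions, defs = self._versions.setdefault(table, ([], []))
        if positions and position < positions[-1]:
            raise ValueError("DDL events must be recorded in binlog order")
        positions.append(position)
        defs.append(list(columns))

    def columns_at(self, table: str, position: int) -> List[str]:
        """Columns in force for an event at `position`."""
        positions, defs = self._versions.get(table, ([], []))
        i = bisect.bisect_right(positions, position) - 1
        if i < 0:
            raise KeyError(f"no schema for {table} at position {position}")
        return defs[i]

    def name_row(self, table: str, position: int, values: List[object]) -> Dict[str, object]:
        """Map a positional row image to column names using the right schema version."""
        return dict(zip(self.columns_at(table, position), values))
```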

For online schema changes, Ctrip uses gh‑ost, which creates a shadow table (_xxx_gho), syncs data into it, and swaps the tables during low‑traffic windows. DRC tracks the DDL events on these shadow tables and updates its schema cache accordingly. DDL executed directly on the source is also captured from binlog events and handled the same way.
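gh‑ost finishes a migration with an atomic cut‑over of the form RENAME TABLE t TO _t_del, _t_gho TO t. A schema cache that has been following DDL on the ghost table can handle the swap by applying the rename pairs in order and publishing the result only once all pairs succeed, as sketched below. The cache layout and the names ghost_base and apply_rename are hypothetical.

```python
import re
from typing import Dict, List, Optional, Tuple

GHOST_TABLE = re.compile(r"^_(?P<base>.+)_gho$")


def ghost_base(table: str) -> Optional[str]:
    """'_orders_gho' -> 'orders'; None for ordinary tables."""
    m = GHOST_TABLE.match(table)
    return m.group("base") if m else None


def apply_rename(schema_cache: Dict[str, List[str]], renames: List[Tuple[str, str]]) -> None:
    """Apply a multi-table RENAME TABLE (pairs in statement order) as one step."""
    updated = dict(schema_cache)
    for old, new in renames:
        if new in updated:
            raise ValueError(f"target table {new} already exists")
        updated[new] = updated.pop(old)  # KeyError if the source table is unknown
    schema_cache.clear()
    schema_cache.update(updated)  # publish only after every pair succeeded
```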

Figure: DDL handling diagram

Monitoring & Alerts

DRC exposes core metrics and alerts:

Replication latency (typically < 1 s in production).

Data‑consistency checks (ordering, at‑least‑once, conflict detection).

Traffic and TPS monitoring.

Business‑unit, application, and IDC‑level alerts.

DDL change monitoring.

Table‑structure consistency alerts.

GTID set gap monitoring.
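GTID gap monitoring amounts to spotting holes between the intervals of a GTID set; uuid:1-100:105-200, for example, is missing transactions 101 to 104. A minimal sketch, assuming the set is available in MySQL’s text form; gtid_gaps is a hypothetical helper.

```python
from typing import Dict, List, Tuple


def gtid_gaps(gtid_set: str) -> Dict[str, List[Tuple[int, int]]]:
    """Return {uuid: [(first_missing, last_missing), ...]} for every interior hole.
    A set that does not start at 1 (e.g. after binlog purges) is not flagged."""
    per_uuid: Dict[str, List[Tuple[int, int]]] = {}
    for part in gtid_set.split(","):
        part = part.strip()
        if not part:
            continue
        uuid, *ranges = part.split(":")
        for r in ranges:
            start, _, end = r.partition("-")
            per_uuid.setdefault(uuid.strip().lower(), []).append((int(start), int(end or start)))
    gaps: Dict[str, List[Tuple[int, int]]] = {}
    for uuid, intervals in per_uuid.items():
        holes: List[Tuple[int, int]] = []
        covered = None  # highest transaction number covered so far
        for start, end in sorted(intervals):
            if covered is not None and start > covered + 1:
                holes.append((covered + 1, start - 1))
            covered = end if covered is None else max(covered, end)
        if holes:
            gaps[uuid] = holes
    return gaps
```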

Conclusion

DRC achieves low replication latency and strong data consistency through network‑level asynchronous I/O, system‑level zero‑copy and page‑cache usage, and application‑level parallel Writeset replication. GTID provides ordering, loop‑replication avoidance, and idempotence. Conflict handling relies on routing, ID range isolation, and timestamp‑based resolution. DDL support is realized via embedded schema snapshots and gh‑ost shadow tables, allowing online schema changes without breaking replication. Future work focuses on high availability, overseas deployment support, and further infrastructure enhancements to back Ctrip’s global strategy.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.

Written by

dbaplus Community

Enterprise-level professional community for Database, BigData, and AIOps. Daily original articles, weekly online tech talks, monthly offline salons, and quarterly XCOPS&DAMS conferences—delivered by industry experts.
