Databases 24 min read

Mastering Multi-Active Data Centers: Solving MySQL Data Sync Loops

This article explains the concepts of unitization and multi‑active data centers, analyzes the drawbacks of single‑IDC deployments, and provides a detailed guide on implementing MySQL binlog synchronization while addressing common challenges such as data loops, duplicate inserts, unique‑index conflicts, and DDL handling, including practical solutions like GTID and auxiliary tables.

dbaplus Community
dbaplus Community
dbaplus Community
Mastering Multi-Active Data Centers: Solving MySQL Data Sync Loops

1. What Is Unitization

Unitization refers to dividing data into multiple logical units (often mapped to separate IDC locations) so that each unit stores a subset of the overall data. In this article a unit is assumed to correspond to a single IDC.

When only one IDC exists, all users write to the same underlying storage, as shown in the diagram below.

Single IDC architecture
Single IDC architecture

2. Problems With a Single IDC

Different regions experience varying latency; users far from the IDC suffer higher RTT.

Disaster‑recovery is limited to the IDC. Power outages, natural disasters, or fiber cuts can make the service unavailable and may cause data loss.

Real‑world incidents include a 2015 cable cut at an Alipay data center and a 2018 intentional cut at Ant Financial.

Deploying services across multiple IDC and synchronizing data resolves these issues, as illustrated below.

Multi‑IDC architecture
Multi‑IDC architecture

3. Solutions With Multi‑Active Deployment

By placing services in several IDC and synchronizing their data, we achieve:

User experience: Users can connect to the nearest IDC, reducing latency.

Disaster recovery: If one IDC fails, traffic is redirected to another IDC; because data is replicated in real time, most operations continue (a small window of data loss may still occur).

Disaster‑recovery levels include rack‑level, IDC‑level, and city‑level. City‑level DR typically requires at least three IDC (or three cities with a “two‑city‑three‑center” pattern) to maintain a majority for consensus protocols such as Zookeeper, etcd, or Consul.

4. Implementing Data Synchronization

Data synchronization is illustrated using MySQL binlog replication, but the principles apply to caches and other storage components.

4.1 Basic Knowledge

MySQL master‑slave replication works as follows: the master writes changes to a binary log (binlog); each slave runs an I/O thread to fetch the binlog and a SQL thread to apply the events from a relay log.

MySQL master‑slave architecture
MySQL master‑slave architecture

A custom sync component can be built with two logical modules:

Binlog syncer: pulls binlog from the source and parses it.

SQL writer: converts the parsed events into SQL statements and executes them on the target.

Binlog streams are also useful for downstream tasks such as updating Elasticsearch indexes, refreshing Redis caches, or publishing to Kafka for custom processing.

4.2 Full‑plus‑Incremental Sync

Because binlog files are often purged (e.g., via expire_logs_days or reset master), a typical strategy is to take a full data snapshot from the source, then stream incremental changes via binlog.

4.3 Handling Duplicate Inserts

When a table lacks a primary key or unique index, retries after network time‑outs can cause duplicate rows. The recommended practice is to enforce a primary key or unique constraint.

4.4 Resolving Unique‑Index Conflicts

Use a globally unique ID generator (e.g., Snowflake) to guarantee that inserts from different IDC never clash on a unique index.

4.5 DDL Synchronization

DDL statements are usually filtered out of the data‑sync pipeline because they lock tables and can severely impact performance. Online DDL tools such as gh‑ost are preferred for schema changes.

4.6 Data Loop Problems

In bidirectional sync, the same change can be propagated back to its origin, creating an infinite loop. The article details loop scenarios for INSERT, UPDATE, and DELETE operations and shows how MySQL may or may not generate binlog events depending on whether the data actually changes.

4.7 Loop‑Avoidance Techniques

Disable binlog on the target: Setting sql_log_bin=0 prevents the target from generating binlog, but also breaks its own replication, so this approach is not viable in production.

Mark the origin of each change: Add a comment such as /*IDC1:DB1*/ before the SQL statement. The syncer can then discard events whose origin matches the current IDC.

Auxiliary “direction” table: Store the source IDC and cluster in a dedicated table. When syncing, the writer inserts a row into this table before the actual data, allowing the opposite side to detect and drop loops.

GTID‑based solution: MySQL 5.6+ provides global transaction IDs (GTID). By explicitly setting GTID_NEXT on the target before executing the replicated SQL, the target will recognize that the transaction has already been applied and ignore it, preventing both loops and duplicate inserts.

GTID is lightweight and native to MySQL, but an auxiliary table may still be needed when downstream systems must know the originating IDC.

GTID example
GTID example

5. Open‑Source Sync Components

Alibaba Canal – binlog syncer.

Meituan Puma – SQL writer.

LinkedIn Databus – end‑to‑end data pipeline.

These projects already implement the binlog syncer and SQL writer patterns, and many recent versions support GTID, making them practical choices for building multi‑active data‑center solutions.

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Sign in to view source
Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactadmin@besthub.devand we will review it promptly.

Database ArchitecturemysqlBinlogmulti-activedata synchronizationGTID
dbaplus Community
Written by

dbaplus Community

Enterprise-level professional community for Database, BigData, and AIOps. Daily original articles, weekly online tech talks, monthly offline salons, and quarterly XCOPS&DAMS conferences—delivered by industry experts.

0 followers
Reader feedback

How this landed with the community

Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.