Mastering Multi-Active Data Centers: Solving MySQL Data Sync Loops
This article explains the concepts of unitization and multi‑active data centers, analyzes the drawbacks of single‑IDC deployments, and provides a detailed guide on implementing MySQL binlog synchronization while addressing common challenges such as data loops, duplicate inserts, unique‑index conflicts, and DDL handling, including practical solutions like GTID and auxiliary tables.
1. What Is Unitization
Unitization refers to dividing data into multiple logical units (often mapped to separate IDC locations) so that each unit stores a subset of the overall data. In this article a unit is assumed to correspond to a single IDC.
When only one IDC exists, all users write to the same underlying storage, as shown in the diagram below.
2. Problems With a Single IDC
Different regions experience varying latency; users far from the IDC suffer higher RTT.
Disaster‑recovery is limited to the IDC. Power outages, natural disasters, or fiber cuts can make the service unavailable and may cause data loss.
Real‑world incidents include a 2015 cable cut at an Alipay data center and a 2018 intentional cut at Ant Financial.
Deploying services across multiple IDC and synchronizing data resolves these issues, as illustrated below.
3. Solutions With Multi‑Active Deployment
By placing services in several IDC and synchronizing their data, we achieve:
User experience: Users can connect to the nearest IDC, reducing latency.
Disaster recovery: If one IDC fails, traffic is redirected to another IDC; because data is replicated in real time, most operations continue (a small window of data loss may still occur).
Disaster‑recovery levels include rack‑level, IDC‑level, and city‑level. City‑level DR typically requires at least three IDC (or three cities with a “two‑city‑three‑center” pattern) to maintain a majority for consensus protocols such as Zookeeper, etcd, or Consul.
4. Implementing Data Synchronization
Data synchronization is illustrated using MySQL binlog replication, but the principles apply to caches and other storage components.
4.1 Basic Knowledge
MySQL master‑slave replication works as follows: the master writes changes to a binary log (binlog); each slave runs an I/O thread to fetch the binlog and a SQL thread to apply the events from a relay log.
A custom sync component can be built with two logical modules:
Binlog syncer: pulls binlog from the source and parses it.
SQL writer: converts the parsed events into SQL statements and executes them on the target.
Binlog streams are also useful for downstream tasks such as updating Elasticsearch indexes, refreshing Redis caches, or publishing to Kafka for custom processing.
4.2 Full‑plus‑Incremental Sync
Because binlog files are often purged (e.g., via expire_logs_days or reset master), a typical strategy is to take a full data snapshot from the source, then stream incremental changes via binlog.
4.3 Handling Duplicate Inserts
When a table lacks a primary key or unique index, retries after network time‑outs can cause duplicate rows. The recommended practice is to enforce a primary key or unique constraint.
4.4 Resolving Unique‑Index Conflicts
Use a globally unique ID generator (e.g., Snowflake) to guarantee that inserts from different IDC never clash on a unique index.
4.5 DDL Synchronization
DDL statements are usually filtered out of the data‑sync pipeline because they lock tables and can severely impact performance. Online DDL tools such as gh‑ost are preferred for schema changes.
4.6 Data Loop Problems
In bidirectional sync, the same change can be propagated back to its origin, creating an infinite loop. The article details loop scenarios for INSERT, UPDATE, and DELETE operations and shows how MySQL may or may not generate binlog events depending on whether the data actually changes.
4.7 Loop‑Avoidance Techniques
Disable binlog on the target: Setting sql_log_bin=0 prevents the target from generating binlog, but also breaks its own replication, so this approach is not viable in production.
Mark the origin of each change: Add a comment such as /*IDC1:DB1*/ before the SQL statement. The syncer can then discard events whose origin matches the current IDC.
Auxiliary “direction” table: Store the source IDC and cluster in a dedicated table. When syncing, the writer inserts a row into this table before the actual data, allowing the opposite side to detect and drop loops.
GTID‑based solution: MySQL 5.6+ provides global transaction IDs (GTID). By explicitly setting GTID_NEXT on the target before executing the replicated SQL, the target will recognize that the transaction has already been applied and ignore it, preventing both loops and duplicate inserts.
GTID is lightweight and native to MySQL, but an auxiliary table may still be needed when downstream systems must know the originating IDC.
5. Open‑Source Sync Components
Alibaba Canal – binlog syncer.
Meituan Puma – SQL writer.
LinkedIn Databus – end‑to‑end data pipeline.
These projects already implement the binlog syncer and SQL writer patterns, and many recent versions support GTID, making them practical choices for building multi‑active data‑center solutions.
Signed-in readers can open the original source through BestHub's protected redirect.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.
dbaplus Community
Enterprise-level professional community for Database, BigData, and AIOps. Daily original articles, weekly online tech talks, monthly offline salons, and quarterly XCOPS&DAMS conferences—delivered by industry experts.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
