Achieving True Multi‑Region Active‑Active: Bidirectional Sync Across Three Data Centers
This article explains how to implement a true multi‑region active‑active architecture by enabling bidirectional data synchronization among three or more data centers, covering CAP trade‑offs, distributed ID generation algorithms, center closure strategies, final consistency mechanisms, and a disaster‑recovery design.
Background
True multi‑region active‑active requires data to be synchronously replicated in both directions among three or more data centers. A dual‑center setup in China combined with an overseas center creates a three‑center topology that challenges bidirectional sync, latency, and high availability.
CAP Considerations
For a globally distributed system, partition tolerance is mandatory. According to the CAP theorem, consistency and availability cannot both be guaranteed; the design adopts eventual consistency to preserve availability while tolerating network partitions.
Design Principles
Data Partitioning
Choose a business dimension as a sharding key so that each partition can be deployed in a separate data center. Primary keys must be generated as distributed IDs to avoid collisions during cross‑center synchronization.
Distributed ID Generation Algorithms
Snowflake Algorithm
+--------------------------------------------------------------------------+
| 1 Bit Unused | 41 Bit Timestamp | 10 Bit NodeId | 12 Bit Sequence Id |
+--------------------------------------------------------------------------+Sign bit is always 0.
41‑bit millisecond timestamp provides a 69‑year range and enables time‑based sorting.
10‑bit node identifier supports up to 1024 nodes.
12‑bit sequence allows 4096 IDs per node per millisecond.
Advantages: Stateless, no network calls, high throughput.
Disadvantages: Requires accurate clocks (clock rollback can cause duplicates), limited to int64 IDs, 4096 IDs/ms per node.
RainDrop Algorithm
+--------------------------------------------------------------------------+
| 11 Bit Unused | 32 Bit Timestamp | 7 Bit NodeId | 14 Bit Sequence Id |
+--------------------------------------------------------------------------+11 unused bits avoid JavaScript 53‑bit precision loss.
32‑bit second‑level timestamp gives a 136‑year range.
7‑bit node ID supports up to 128 nodes.
14‑bit sequence allows 16,384 IDs per node per second.
Advantages: Same stateless properties as Snowflake.
Disadvantages: Still depends on clock synchronization, lower per‑node capacity, limited to int64 IDs.
Partition‑Independent Allocation
IDs are pre‑allocated per region and managed locally via Redis. Example: an int32 space (1‑2,100,000,000) is split into 20 regions, each receiving 100 million IDs. This yields stateless uniqueness across regions.
Pros: No cross‑region state, reliable uniqueness.
Cons: Fixed capacity per region, IDs do not encode creation order.
Centralized Allocation
A central service (Redis, ZooKeeper, or a database auto‑increment column) allocates IDs globally.
Pros: Globally monotonic, no capacity limits, strong uniqueness.
Cons: Increases system complexity and creates a single point of dependency.
Center Closure
All read/write operations should stay within the local data center to minimise latency and avoid concurrent writes to multiple centers. Routing can be implemented via ADNS, Tengine, or side‑car proxies.
Final Consistency and Custom Sync Component (hera‑dts)
Because Alibaba DTS does not provide true multi‑center bidirectional sync, a custom component named hera‑dts was built. Its responsibilities include:
Ordered consumption of DRC (Data Replication Channel) messages with per‑primary‑key ordering and cross‑primary concurrency.
Failover using the Raft consensus algorithm for leader election.
Cross‑unit transport via Alibaba Cloud MNS, employing message “coloring” to preserve order where possible.
Final consistency using Last‑Write‑Wins (LWW) and idempotent processing based on a unique key composed of instance, unit, database, and MD5 of the SQL statement.
Message handling rules:
Insert: Executed as INSERT IGNORE. If no rows are affected, the statement is retried as an UPDATE.
Update: Executed directly; if zero rows are affected, it is treated as a missing prior INSERT and retried as an INSERT.
Delete: Executed as‑is because it is always safe.
These transformations rely on receiving full‑field change data from DRC. REPLACE INTO is avoided because it obscures the distinction between insert and update, breaking idempotency and potentially causing message loops.
Disaster‑Recovery Architecture
The architecture uses two‑level scheduling to enforce center closure and the self‑built hera‑dts component for bidirectional replication. Fast‑recovery strategies include quickly dropping a failing center; during the drop, write operations are temporarily disabled to prevent split‑brain writes. Sync latency is measured in milliseconds, so the impact on user experience is minimal.
Conclusion
The choice of ID allocation algorithm depends on ID type (int64 vs int32), business capacity, concurrency requirements, and whether JavaScript interaction is needed. The presented multi‑center disaster‑recovery design combines data partitioning, center closure, and eventual consistency to achieve true active‑active operation across three geographically distributed data centers.
Signed-in readers can open the original source through BestHub's protected redirect.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.
dbaplus Community
Enterprise-level professional community for Database, BigData, and AIOps. Daily original articles, weekly online tech talks, monthly offline salons, and quarterly XCOPS&DAMS conferences—delivered by industry experts.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
