Key Design Principles and Practical Steps for Building Multi‑Active Distributed Systems
This guide outlines the motivations, architectural guidelines, routing strategies, RPC and message replication techniques, storage synchronization methods, and traffic‑switching procedures needed to successfully implement a multi‑active, cross‑region system.
1. Reasons for Multi‑Active Deployment
High‑availability architecture deployment
Overall business disaster recovery
Capacity limits of a single data‑center
2. Guiding Principles
Core link should be self‑contained and logically sharded
Calls should converge within the same unit as much as possible
Traffic sharding logic should be balanced
Middleware needs multi‑active architecture upgrades
Business refactoring must support multi‑active solutions
Validate middleware capabilities in business scenarios
3. Driving Items
Align thinking with company‑level strategic projects and treat multi‑active as top priority
Appoint a chief architect responsible for the overall solution and outcomes
Department heads must fully drive the initiative
Each business line designates an interface owner accountable for all coordination and results
Project architect holds weekly sync meetings with business owners
Issues are first aligned internally before external communication
4. Core Link Prioritization
Guarantee multi‑active for core links first, e.g., central coupon inventory deduction
Defer multi‑active for non‑real‑time workloads such as management operations
Accept minute‑level unavailability during a traffic switchover, with service restored once the switch completes
5. Multi‑Active Routing Rules and Traffic Selection
5.1 Routing Factor Selection and Mapping
Choose routing factors based on business scenarios; common factors are geographic region and user ID.
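As a rough illustration of mapping a routing factor to a data center, the sketch below hashes a user ID into a fixed number of shards and looks the shard up in a static table. The shard count, the data-center names (`dc-east`, `dc-west`), and the function names are assumptions made for this example, not part of the original design.

```python
import hashlib

# Hypothetical values for illustration only; real systems load these from a rule center.
SHARD_COUNT = 100
UNIT_BY_SHARD_RANGE = [
    (range(0, 50), "dc-east"),
    (range(50, 100), "dc-west"),
]


def shard_of(user_id: str) -> int:
    """Map a user ID to a shard number that is identical on every host.

    A cryptographic digest is used instead of Python's built-in hash(),
    which is randomized per process for strings.
    """
    digest = hashlib.md5(user_id.encode("utf-8")).hexdigest()
    return int(digest, 16) % SHARD_COUNT


def unit_for(user_id: str) -> str:
    """Return the data center that owns the shard of this user ID."""
    shard = shard_of(user_id)
    for shard_range, unit in UNIT_BY_SHARD_RANGE:
        if shard in shard_range:
            return unit
    raise LookupError(f"no unit configured for shard {shard}")
```

Keeping the shard-to-unit table separate from the hash makes it possible to move a range of shards between data centers without changing how user IDs map to shards.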
5.2 Request Allocation to Correct Data Center
After applying multi‑active rules, requests can be routed via:
Domain switching at the client (terminal) level
Forwarding at the reverse‑proxy layer
Forwarding at the gateway layer
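The forwarding options above share one decision: serve the request locally if this data center owns the routing key, otherwise send it to the owner. A minimal sketch of that decision follows; the `Decision` type, `route_request`, and the `resolve_dc` callback are hypothetical names introduced for this example.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Decision:
    action: str  # "serve_local" or "forward"
    target: str  # data center that should handle the request


def route_request(
    local_dc: str,
    routing_key: str,
    resolve_dc: Callable[[str], str],
) -> Decision:
    """Decide whether a proxy or gateway handles a request or forwards it.

    resolve_dc stands in for the multi-active rule lookup (assumed to exist).
    """
    target = resolve_dc(routing_key)
    if target == local_dc:
        return Decision("serve_local", local_dc)
    return Decision("forward", target)
```

For example, `route_request("dc-east", "user-1", lambda key: "dc-west")` returns a `forward` decision targeting `dc-west`.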
6. RPC Cross‑Data‑Center Call Capability
6.1 Registration Center Architecture
Node registration must include data‑center information
Registration center provides bidirectional synchronization across data centers
6.2 RPC Framework Cross‑Data‑Center Call
Default strategy calls within the same data center
Custom routing feature allows business to decide cross‑data‑center calls
Beware of traffic skew when new/old versions are released
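The selection logic described in this section might look like the sketch below: each registered instance carries its data center, a business-supplied router may override the choice, and the default prefers local instances. Falling back to remote instances when no local one exists is an assumption of this sketch, as are all names in it.

```python
import random
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass(frozen=True)
class Instance:
    address: str
    data_center: str  # registered together with the node


Router = Callable[[Sequence[Instance]], Optional[Instance]]


def pick_instance(
    instances: Sequence[Instance],
    local_dc: str,
    custom_router: Optional[Router] = None,
    rng=random,
) -> Instance:
    """Choose a provider instance, preferring the caller's own data center."""
    # A custom router lets the business force a cross-data-center call.
    if custom_router is not None:
        chosen = custom_router(instances)
        if chosen is not None:
            return chosen
    local = [i for i in instances if i.data_center == local_dc]
    # Assumption: use remote instances only when no local instance is registered.
    candidates = local or list(instances)
    if not candidates:
        raise LookupError("no instances registered")
    return rng.choice(candidates)
```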
7. Message Cross‑Data‑Center Replication
7.1 Replication Plugin Management and Monitoring
Use replication plugins to copy messages across data centers
Management platform monitors and controls the replicators
7.2 Traffic Isolation and Dynamic Subscription
Separate traffic by different topics to avoid duplicate replication
The SDK subscribes dynamically, waking consumers when replicated traffic needs to be consumed
Mark source data‑center on replicated traffic
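One way to combine topic separation with source marking is sketched below. Replicated messages go to a separate topic and carry a header naming their origin, so a replicator in the other data center can recognize them and skip them. The header name, the topic suffix, and the `Message` type are assumptions for illustration, not the API of any particular message queue.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

SOURCE_DC_HEADER = "x-source-dc"  # hypothetical header name
REPLICA_SUFFIX = "_replica"       # hypothetical topic naming rule


@dataclass
class Message:
    topic: str
    body: bytes
    headers: Dict[str, str] = field(default_factory=dict)


def replicate(msg: Message, local_dc: str) -> Optional[Message]:
    """Build the copy to send to the other data center, or None to skip.

    Messages that already carry a source marker, or that live on a replica
    topic, were produced by replication and must not be copied again.
    """
    if SOURCE_DC_HEADER in msg.headers or msg.topic.endswith(REPLICA_SUFFIX):
        return None
    headers = dict(msg.headers)
    headers[SOURCE_DC_HEADER] = local_dc
    return Message(msg.topic + REPLICA_SUFFIX, msg.body, headers)
```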
8. Storage Bidirectional Synchronization
8.1 Redis Bidirectional Sync
Redis bidirectional sync is optional; it is useful for short‑lived keys or when long‑term storage requires replication. One open‑source implementation is RedisSyncer (Java).
GitHub: https://github.com/TraceNature/redissyncer-server
Breakpoint resume
Data synchronization
Data migration
Data validation
Implementation principle: The replicator masquerades as a slave node; during sync it writes auxiliary keys to identify traffic source and avoid duplicate replication.
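The auxiliary-key idea can be illustrated with an in-memory model. This is a simplified sketch of the general technique, not RedisSyncer's actual implementation: the `Store` class, its `changes` list (standing in for the replication stream), and the marker prefix are all assumptions.

```python
from typing import Dict, List, Tuple

MARKER_PREFIX = "__sync_marker__:"  # hypothetical auxiliary key prefix


class Store:
    """Toy stand-in for a Redis instance plus its replication stream."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.data: Dict[str, str] = {}
        self.changes: List[Tuple[str, str]] = []

    def set(self, key: str, value: str) -> None:
        self.data[key] = value
        self.changes.append((key, value))


def replicate_once(source: Store, target: Store) -> int:
    """Copy pending business writes from source to target.

    Before each copied write, a marker key is written to the target. When
    the target's own stream is later replicated back, the marker tells the
    replicator that the following write to that key came from replication.
    """
    pending, source.changes = source.changes, []
    skip_next = set()
    copied = 0
    for key, value in pending:
        if key.startswith(MARKER_PREFIX):
            skip_next.add(key[len(MARKER_PREFIX):])
            continue
        if key in skip_next:
            skip_next.discard(key)  # replicated write: do not send it back
            continue
        # A real system would give the marker a short expiry; omitted here.
        target.set(MARKER_PREFIX + key, source.name)
        target.set(key, value)
        copied += 1
    return copied
```

With two stores `a` and `b`, a write to `a` is copied to `b` once, and replicating `b` back to `a` copies nothing.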
Key considerations:
Plan Redis bidirectional replication early
Filter out keys with very short lifetimes (e.g., < 3 seconds)
Batch writes to improve performance
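The last two considerations can be sketched together: drop writes whose remaining lifetime is below a threshold, then group the rest into batches. The 3-second threshold comes from the example above; the batch size and the `(key, value, ttl)` tuple shape are assumptions.

```python
from typing import Iterable, Iterator, List, Optional, Tuple

MIN_TTL_SECONDS = 3.0
BATCH_SIZE = 100  # hypothetical; tune against real latency measurements

# (key, value, ttl in seconds); a ttl of None means the key never expires
Write = Tuple[str, bytes, Optional[float]]


def should_replicate(ttl: Optional[float]) -> bool:
    """Skip keys that would expire before replication is useful."""
    return ttl is None or ttl >= MIN_TTL_SECONDS


def batches(writes: Iterable[Write], size: int = BATCH_SIZE) -> Iterator[List[Write]]:
    """Yield lists of at most `size` replicable writes, preserving order."""
    batch: List[Write] = []
    for write in writes:
        if not should_replicate(write[2]):
            continue
        batch.append(write)
        if len(batch) >= size:
            yield batch
            batch = []
    if batch:
        yield batch
```

Each yielded batch could then be sent in a single round trip, for example through a pipeline.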
8.2 MySQL Bidirectional Sync
Bidirectional sync for relational databases is usually required in multi‑active setups. Alibaba’s open‑source Otter can be customized for this purpose.
GitHub: https://github.com/alibaba/otter
Use transaction tables to break circular replication loops
When the replicator applies a change, it also writes to a transaction table within the same transaction
During sync, copy only transactions that did not write to the transaction table
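The transaction-table technique can be modeled in memory as follows. This is a sketch of the idea under simplifying assumptions, not Otter's code: the `Database` class, its `log` list (standing in for the binlog), and the marker table name are hypothetical.

```python
from typing import Dict, List, Tuple

MARKER_TABLE = "repl_marker"  # hypothetical transaction table name

Event = Tuple[str, int, str]  # (table, primary key, value)
Transaction = List[Event]


class Database:
    """Toy stand-in for a MySQL instance and its binlog."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.tables: Dict[str, Dict[int, str]] = {}
        self.log: List[Transaction] = []

    def commit(self, tx: Transaction) -> None:
        for table, pk, value in tx:
            self.tables.setdefault(table, {})[pk] = value
        self.log.append(list(tx))


def sync(source: Database, target: Database) -> int:
    """Apply source transactions to target, skipping replicated ones.

    Every transaction applied by the replicator also writes the marker
    table, so a transaction that touches the marker table is known to be
    a replica and is not copied back.
    """
    pending, source.log = source.log, []
    applied = 0
    for tx in pending:
        if any(table == MARKER_TABLE for table, _, _ in tx):
            continue
        marker: Event = (MARKER_TABLE, 1, source.name)
        target.commit([marker] + tx)
        applied += 1
    return applied
```

Because the marker row and the business rows commit together, a replicated change can never appear in the log without its marker.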
9. Additional Transformation Items
Release system must support deployments to different data centers
CMDB should record resource and application identifiers per data center
Monitoring system must distinguish traffic from each data center
Other storage systems (ES, HBase, etc.) should avoid cross‑data‑center duplication when possible
10. Traffic Switching Process
10.1 Overall Flow
Multi‑active rule center issues write‑disable notice and baseline
Database SDK receives write‑disable command
Bidirectional replicator stops copying after the baseline
Replicator reports completion
Rule center sends traffic‑switch notice
Nginx/Gateway switches traffic to target data center and reports completion
Rule center cancels write‑disable
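The flow above can be sketched as a small orchestration function. The callbacks stand in for the database SDK, the replicator, and the gateway. Their names and signatures are assumptions, and restoring writes without switching when a step fails is one possible fallback rather than something the flow prescribes.

```python
from typing import Callable, List


def run_switch(
    target_dc: str,
    baseline: int,
    disable_writes: Callable[[int], None],
    replicator_caught_up: Callable[[int], bool],
    switch_gateway: Callable[[str], bool],
    enable_writes: Callable[[], None],
) -> List[str]:
    """Run the switch steps in order and return the steps that completed."""
    steps: List[str] = []

    # Rule center issues the write-disable notice with a baseline.
    disable_writes(baseline)
    steps.append("writes_disabled")

    # Replicator copies everything up to the baseline and reports completion.
    if not replicator_caught_up(baseline):
        enable_writes()  # assumed fallback: abort and keep traffic where it is
        steps.append("aborted_writes_restored")
        return steps
    steps.append("replication_complete")

    # Nginx/gateway moves traffic to the target data center.
    if not switch_gateway(target_dc):
        enable_writes()  # assumed fallback, as above
        steps.append("aborted_writes_restored")
        return steps
    steps.append("traffic_switched")

    # Rule center cancels the write-disable.
    enable_writes()
    steps.append("writes_enabled")
    return steps
```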
10.2 Switching Issues
Partial traffic switch scenarios (e.g., 10 % of a region or user segment)
Database write‑disable logic for partial switches
Determining when the replicator has finished and fallback strategies
10.3 Replicator Monitoring Considerations
Stability and performance monitoring of the replicator itself
Monitoring replicator progress and completion status
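Monitoring completion usually comes down to comparing the replicator's reported position with the baseline, and giving up after a deadline so that a fallback can run. A minimal polling sketch follows; the `read_position` callback, the numeric position, and the default timeout values are assumptions.

```python
import time
from typing import Callable


def wait_until_caught_up(
    read_position: Callable[[], int],
    baseline: int,
    timeout_s: float = 60.0,
    poll_interval_s: float = 0.5,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll the replicator until it reaches the baseline or the timeout expires.

    Returns True when the reported position is at or past the baseline,
    and False on timeout so the caller can trigger its fallback.
    """
    deadline = clock() + timeout_s
    while True:
        if read_position() >= baseline:
            return True
        if clock() >= deadline:
            return False
        sleep(poll_interval_s)
```

The `clock` and `sleep` parameters exist so the function can be tested without real waiting.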
dbaplus Community
