Key Design Principles and Practical Steps for Building Multi‑Active Distributed Systems

This guide outlines the motivations, architectural guidelines, routing strategies, RPC and message replication techniques, storage synchronization methods, and traffic‑switching procedures needed to successfully implement a multi‑active, cross‑region system.

dbaplus Community

1. Reasons for Multi‑Active Deployment

High‑availability architecture deployment

Overall business disaster recovery

Capacity limits of a single data‑center

2. Guiding Principles

Core link should be self‑contained and logically sharded

Calls should converge within the same unit as much as possible

Traffic sharding logic should be balanced

Middleware needs multi‑active architecture upgrades

Business refactoring must support multi‑active solutions

Validate middleware capabilities in business scenarios

3. Driving Items

Align thinking with company‑level strategic projects and treat multi‑active as top priority

Appoint a chief architect responsible for the overall solution and outcomes

Department heads must fully drive the initiative

Each business line designates an interface owner accountable for all coordination and results

Project architect holds weekly sync meetings with business owners

Issues are first aligned internally before external communication

4. Core Link Prioritization

Guarantee multi‑active for core links first, e.g., central coupon inventory deduction

Defer multi‑active for non‑real‑time workloads such as management operations

Accept minute-level unavailability during a traffic switchover, with service restored once the switch completes

5. Multi‑Active Routing Rules and Traffic Selection

5.1 Routing Factor Selection and Mapping

Choose routing factors based on business scenarios; common factors are geographic region and user ID.

Routing factor mapping diagram
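
As an illustration of mapping a routing factor to a unit, the sketch below hashes a user ID to a logical shard and looks up the data center that owns it. The shard count, the unit names `dc-a` and `dc-b`, and the even split are assumptions made for the example, not values from the article.

```python
import hashlib

# Hypothetical layout: 100 logical shards split evenly across two units.
SHARD_COUNT = 100
UNIT_SHARDS = {
    "dc-a": range(0, 50),
    "dc-b": range(50, 100),
}


def shard_of(user_id: str) -> int:
    """Hash the routing factor (user ID) to a stable logical shard."""
    digest = hashlib.sha256(user_id.encode("utf-8")).hexdigest()
    return int(digest, 16) % SHARD_COUNT


def unit_of(user_id: str) -> str:
    """Look up which unit (data center) owns the user's shard."""
    shard = shard_of(user_id)
    for unit, shards in UNIT_SHARDS.items():
        if shard in shards:
            return unit
    raise LookupError(f"shard {shard} is not mapped to any unit")
```

Routing through logical shards rather than directly to a data center means that rebalancing only requires changing the shard-to-unit table, while the hash of each user stays the same.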

5.2 Request Allocation to Correct Data Center

After applying multi‑active rules, requests can be routed via:

Domain switching at the terminal service level

Forwarding at the reverse‑proxy layer

Forwarding at the gateway layer

Request routing diagram
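
A minimal sketch of the gateway-layer option follows: the gateway compares the unit that owns the request with its own unit and either serves the request locally or forwards it. The region table, the upstream URLs, and the default unit are hypothetical placeholders.

```python
from dataclasses import dataclass

# Hypothetical routing rule: geographic region -> owning unit.
REGION_TO_UNIT = {"north": "dc-a", "south": "dc-b"}
# Hypothetical gateway addresses for each unit.
UNIT_UPSTREAMS = {
    "dc-a": "https://gateway.dc-a.example.internal",
    "dc-b": "https://gateway.dc-b.example.internal",
}


@dataclass(frozen=True)
class Decision:
    action: str  # "serve_locally" or "forward"
    target: str  # local unit name, or the upstream URL to forward to


def decide(local_unit: str, region: str, default_unit: str = "dc-a") -> Decision:
    """Decide whether this gateway handles the request or forwards it."""
    owner = REGION_TO_UNIT.get(region, default_unit)
    if owner == local_unit:
        return Decision("serve_locally", local_unit)
    return Decision("forward", UNIT_UPSTREAMS[owner])
```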

6. RPC Cross‑Data‑Center Call Capability

6.1 Registration Center Architecture

Node registration must include data‑center information

Registration center provides bidirectional synchronization across data centers

Registration center diagram
Registration center diagram

6.2 RPC Framework Cross‑Data‑Center Call

Default strategy calls within the same data center

Custom routing feature allows business to decide cross‑data‑center calls

Beware of traffic skew when new/old versions are released

RPC cross‑data‑center diagram
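
The default same-data-center strategy with an optional cross-data-center fallback could look roughly like the sketch below. The `Instance` type and the `allow_cross_dc` switch are illustrative names, not the API of any particular RPC framework.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Instance:
    address: str
    data_center: str  # recorded at registration time


def select_instance(instances, local_dc, allow_cross_dc=True, rng=random):
    """Prefer providers in the caller's data center; fall back if allowed."""
    local = [i for i in instances if i.data_center == local_dc]
    if local:
        return rng.choice(local)
    if allow_cross_dc and instances:
        # No local provider: call across data centers as a last resort.
        return rng.choice(instances)
    raise LookupError(f"no provider available for data center {local_dc}")
```

During a rolling release, the local list can temporarily shrink to a few instances, which is one way the traffic skew mentioned above arises.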

7. Message Cross‑Data‑Center Replication

7.1 Replication Plugin Management and Monitoring

Use replication plugins to copy messages across data centers

Management platform monitors and controls the replicators

Message replication diagram

7.2 Traffic Isolation and Dynamic Subscription

Separate traffic by different topics to avoid duplicate replication

The SDK's dynamic subscription is activated on demand to consume replicated traffic

Mark source data‑center on replicated traffic

Traffic isolation diagram
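
One way to combine topic separation with source marking is sketched below: a replicated copy is published to a separate topic and tagged with its source data center, and anything already tagged as coming from elsewhere is never replicated again. The header name `x-source-dc` and the `_replica` topic suffix are assumptions for the example.

```python
from dataclasses import dataclass, field

SOURCE_DC_HEADER = "x-source-dc"  # hypothetical header name


@dataclass
class Message:
    topic: str
    body: str
    headers: dict = field(default_factory=dict)


def should_replicate(msg: Message, local_dc: str) -> bool:
    """Only messages produced in the local data center are replicated."""
    return msg.headers.get(SOURCE_DC_HEADER, local_dc) == local_dc


def replicate(msg: Message, local_dc: str, topic_suffix: str = "_replica"):
    """Build the copy sent to the remote data center, or None to skip."""
    if not should_replicate(msg, local_dc):
        return None
    headers = dict(msg.headers)
    headers[SOURCE_DC_HEADER] = local_dc  # mark where the message came from
    return Message(msg.topic + topic_suffix, msg.body, headers)
```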

8. Storage Bidirectional Synchronization

8.1 Redis Bidirectional Sync

Redis bidirectional sync is optional; it is useful for short‑lived keys or when long‑term storage requires replication. One open‑source implementation is RedisSyncer (Java).

GitHub: https://github.com/TraceNature/redissyncer-server

Supported features:

Resumable transfer (breakpoint resume)

Data synchronization

Data migration

Data validation

Redis sync diagram

Implementation principle: The replicator masquerades as a slave node; during sync it writes auxiliary keys to identify traffic source and avoid duplicate replication.

Key considerations:

Plan Redis bidirectional replication early

Filter out keys with very short lifetimes (e.g., < 3 seconds)

Batch writes to improve performance
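
The auxiliary-key idea and the considerations above can be sketched with plain dictionaries standing in for Redis instances. This is not RedisSyncer's actual implementation: the marker prefix, the consume-on-first-sight behavior, and the event shape `(key, value, ttl_seconds)` are all assumptions made to keep the example small.

```python
AUX_PREFIX = "__replicated__:"  # hypothetical marker-key prefix
MIN_TTL_SECONDS = 3.0           # skip keys that expire almost immediately


def apply_incoming(batch, store):
    """Apply replicated writes and leave a marker for each key."""
    for key, value, _ttl in batch:
        store[AUX_PREFIX + key] = "1"
        store[key] = value


def filter_outgoing(events, store):
    """Drop writes that must not be sent to the other data center."""
    outgoing = []
    for key, value, ttl in events:
        if key.startswith(AUX_PREFIX):
            continue  # marker keys themselves are never replicated
        if store.pop(AUX_PREFIX + key, None) is not None:
            continue  # this write arrived via replication: do not echo it
        if ttl is not None and ttl < MIN_TTL_SECONDS:
            continue  # very short-lived key: not worth replicating
        outgoing.append((key, value, ttl))
    return outgoing


def batches(events, size=100):
    """Group events so they can be written in batches."""
    for start in range(0, len(events), size):
        yield events[start:start + size]
```

A real replicator would also give the marker keys a short expiry so that a lost event cannot suppress a later genuine write.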

8.2 MySQL Bidirectional Sync

Bidirectional sync for relational databases is usually required in multi‑active setups. Alibaba’s open‑source Otter can be customized for this purpose.

GitHub: https://github.com/alibaba/otter

Use transaction tables to break circular replication loops

When the replicator applies a change, it also writes a row to the transaction table within the same transaction

When reading changes to sync, skip transactions that touched the transaction table and copy only the rest

Otter sync diagram
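
The loop-breaking rule can be illustrated with an in-memory stand-in for the binlog, where each transaction is a list of row changes. The table name `sync_marker` and the record layout are hypothetical, and this sketch does not reflect Otter's internal code.

```python
MARKER_TABLE = "sync_marker"  # hypothetical name for the transaction table


def apply_replicated(changes, binlog, source_dc):
    """Apply a replicated transaction on the target database.

    The marker row and the data changes are recorded as ONE transaction,
    so any reader of this binlog sees them together.
    """
    marker = {"table": MARKER_TABLE, "op": "upsert", "source_dc": source_dc}
    binlog.append([marker] + list(changes))


def transactions_to_replicate(binlog):
    """Return only transactions that originated locally."""
    return [
        txn for txn in binlog
        if all(change["table"] != MARKER_TABLE for change in txn)
    ]
```

Because the marker is written in the same transaction as the data, there is no window in which a replicated change appears in the binlog without its marker.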

9. Additional Transformation Items

Release system must support deployments to different data centers

CMDB should record resource and application identifiers per data center

Monitoring system must distinguish traffic from each data center

Other storage systems (ES, HBase, etc.) should avoid cross‑data‑center duplication when possible

10. Traffic Switching Process

10.1 Overall Flow

Multi‑active rule center issues write‑disable notice and baseline

Database SDK receives write‑disable command

Bidirectional replicator stops copying after the baseline

Replicator reports completion

Rule center sends traffic‑switch notice

Nginx/Gateway switches traffic to target data center and reports completion

Rule center cancels write‑disable

Traffic switching flow diagram
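
The seven steps above can be sketched as a single coordinating function. The three callbacks stand in for the database SDK, the replicator, and the gateway, and are hypothetical. Aborting when the replicator does not reach the baseline within `max_polls` checks is an assumed fallback, since the article does not specify one.

```python
from typing import Callable


def switch_traffic(
    baseline: int,
    target_dc: str,
    set_write_disabled: Callable[[bool], None],
    replicator_position: Callable[[], int],
    switch_gateway: Callable[[str], None],
    max_polls: int = 10,
) -> bool:
    """Run the switch; return True if traffic was moved to target_dc."""
    set_write_disabled(True)  # steps 1-2: write-disable notice reaches the SDK
    try:
        # Steps 3-4: wait until the replicator has copied up to the baseline.
        # A real coordinator would sleep between polls.
        for _ in range(max_polls):
            if replicator_position() >= baseline:
                break
        else:
            return False  # assumed fallback: give up without switching
        switch_gateway(target_dc)  # steps 5-6: gateway moves the traffic
        return True
    finally:
        set_write_disabled(False)  # step 7: cancel write-disable
```

Cancelling write-disable in `finally` keeps a failed switch from leaving the databases read-only.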

10.2 Switching Issues

Partial traffic switch scenarios (e.g., 10 % of a region or user segment)

Database write‑disable logic for partial switches

Determining when the replicator has finished and fallback strategies
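
The article raises partial switches as an open issue without prescribing a solution. One possible approach, sketched here under the assumption that traffic is divided into logical shards, is to apply write-disable only to the shards being moved so that the rest of the traffic keeps writing normally.

```python
def shards_to_switch(all_shards, percent):
    """Pick a deterministic subset covering `percent` % of the shards."""
    ordered = sorted(all_shards)
    count = len(ordered) * percent // 100
    return set(ordered[:count])


def write_allowed(shard, switching, write_disable_active):
    """Writes are blocked only for shards that are currently being moved."""
    return not (write_disable_active and shard in switching)
```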

10.3 Replicator Monitoring Considerations

Stability and performance monitoring of the replicator itself

Monitoring replicator progress and completion status
