
How to Enable Multi‑Data‑Center Active‑Active Redis with Bidirectional Sync and rLog

This article explains how a company extended native Redis to support bidirectional synchronization across multiple data‑center sites. It covers the obstacles involved (no master‑master replication, data loops, idempotency, write conflicts) and describes a custom persistent log, rLog, that makes breakpoint‑resume practical without a large performance cost.

dbaplus Community

Background

The business requires unit‑level deployment across several data‑center sites to achieve disaster recovery and faster user access. True active‑active operation demands that each site holds a complete, consistent dataset, which in turn requires reliable bidirectional data synchronization.

Limitations of Native Redis

Native Redis only supports master‑slave replication and cannot perform cross‑site master‑master sync. Consequently, writes must go through a single master, preventing true active‑active writes and requiring manual failover during site outages.

Key Challenges

Data loop: data written locally may be synced to another site and then back, causing duplicate synchronization.

Idempotency: repeated command execution during partial sync must not corrupt data.

Write conflicts: concurrent writes to the same key from different sites can lead to inconsistent values.

Breakpoint‑Resume Enhancements

Redis serves partial sync from an in‑memory circular replication buffer (the replication backlog), but the 64 MiB size cited here is insufficient for long cross‑site outages (stock Redis ships with an even smaller default, repl-backlog-size 1mb). To improve resilience, the system adds a persistent log (rLog) that records incremental changes on disk, so synchronization can resume after an extended disconnection without enlarging the in‑memory buffer.

Redis Node Modifications

1. RESP Protocol Extension

Each write command destined for replication now carries a unique #{id}\r\n header. Local clients continue using the original RESP format; the master adds the header before forwarding to replicas. Remote‑site writes always use the extended protocol.
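The framing described above can be sketched as follows. This is a minimal illustration, assuming the id is a decimal integer placed between `#` and CRLF in front of an ordinary RESP array; the function names are hypothetical and not taken from the modified Redis source.

```python
from typing import List, Optional, Tuple


def encode_resp(args: List[bytes]) -> bytes:
    """Encode a command as a standard RESP array of bulk strings."""
    out = [b"*%d\r\n" % len(args)]
    for a in args:
        out.append(b"$%d\r\n%s\r\n" % (len(a), a))
    return b"".join(out)


def encode_extended(cmd_id: int, args: List[bytes]) -> bytes:
    """Prefix the RESP payload with the assumed '#<id>\\r\\n' header."""
    return b"#%d\r\n" % cmd_id + encode_resp(args)


def split_header(frame: bytes) -> Tuple[Optional[int], bytes]:
    """Return (id, resp_payload); id is None for a plain RESP frame."""
    if not frame.startswith(b"#"):
        return None, frame  # local client traffic: original RESP format
    head, sep, rest = frame.partition(b"\r\n")
    if not sep:
        raise ValueError("incomplete header")
    return int(head[1:]), rest
```

Because a plain RESP command never starts with `#`, a receiver can tell the two formats apart from the first byte, which is what lets local clients stay on the original protocol.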

2. Real‑Time Write Logging

Extended commands are appended to a log file with an accompanying index file. Commands originating from other sites are excluded from the log to keep its size manageable.
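A rough sketch of this write path is shown below. It assumes fixed-width index entries (the field widths are a guess, not stated in the article) and a hypothetical `RLogWriter` class; the replication offset is passed in by the caller because commands that are skipped in the log may still advance it.

```python
import io
import struct

# pos (u64), len (u32), offset (u64); widths are an assumption for this sketch.
ENTRY = struct.Struct("<QIQ")


class RLogWriter:
    """Appends extended commands to a log stream and one index entry per command."""

    def __init__(self, log, index):
        self.log = log      # binary file-like object for command bytes
        self.index = index  # binary file-like object for index entries
        self.pos = 0        # position of the next write in the log

    def append(self, frame: bytes, repl_offset: int, from_remote_site: bool) -> bool:
        """Log a command; returns False when the command is skipped."""
        if from_remote_site:
            return False  # remote-origin commands are kept out of the log
        self.log.write(frame)
        self.index.write(ENTRY.pack(self.pos, len(frame), repl_offset))
        self.pos += len(frame)
        return True
```

In-memory `io.BytesIO` objects are enough to try the class out; a real implementation would also need to flush and fsync according to its durability policy.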

3. Synchronization Flow Redesign

Full sync forks a child process and produces an RDB dump, which is costly. The new flow therefore prefers partial sync from the circular buffer; if the buffer no longer holds the requested data, it attempts to sync from the rLog. Only when neither the buffer nor the rLog can serve the request does it fall back to full sync.
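The fallback order can be expressed as a small decision function. The names below are hypothetical; the two boolean inputs stand in for whatever checks the master performs against its buffer and its rLog.

```python
from enum import Enum


class SyncMode(Enum):
    PARTIAL = "partial sync from the circular buffer"
    RLOG = "sync from the rLog"
    FULL = "full sync with an RDB dump"


def choose_sync_mode(buffer_can_serve: bool, rlog_can_serve: bool) -> SyncMode:
    """Pick the cheapest mode that can still bring the replica up to date."""
    if buffer_can_serve:
        return SyncMode.PARTIAL
    if rlog_can_serve:
        return SyncMode.RLOG
    return SyncMode.FULL
```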

rLog Design

Index File Format

Each entry stores:

pos – offset of the command’s first byte in the log file.

len – length of the command.

offset – cumulative offset in the master’s circular buffer.
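One way such an index could be used is to map a requested replication offset to a position in the log file. The sketch below assumes fixed-width little-endian entries (the widths are not given in the article) and entries sorted by `offset`; `load_index` and `find_resume_entry` are hypothetical helper names.

```python
import bisect
import struct
from typing import List, NamedTuple, Optional

# pos (u64), len (u32), offset (u64); widths are an assumption for this sketch.
ENTRY = struct.Struct("<QIQ")


class IndexEntry(NamedTuple):
    pos: int     # offset of the command's first byte in the log file
    len: int     # length of the command in bytes
    offset: int  # cumulative offset in the master's circular buffer


def load_index(raw: bytes) -> List[IndexEntry]:
    """Decode complete entries; a trailing partial entry is ignored."""
    end = len(raw) - ENTRY.size + 1
    return [IndexEntry(*ENTRY.unpack_from(raw, i)) for i in range(0, end, ENTRY.size)]


def find_resume_entry(entries: List[IndexEntry], offset: int) -> Optional[IndexEntry]:
    """Return the first entry whose offset is >= the requested offset."""
    keys = [e.offset for e in entries]
    i = bisect.bisect_left(keys, offset)
    return entries[i] if i < len(entries) else None
```

Fixed-width entries keep the lookup to a binary search over the index, so the log file itself is only read from the position where transfer should resume.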

Log File Splitting

To prevent unbounded growth, a log file is split when it reaches 128 MiB, or when it has been open for an hour and holds more than 100,000 entries.
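The split rule can be sketched as a predicate. The time-based condition is read here as "the file has been open for at least an hour and holds more than 100,000 entries", which is one interpretation of the rule; the constant and function names are hypothetical.

```python
MAX_BYTES = 128 * 1024 * 1024   # 128 MiB size limit
MAX_AGE_SECONDS = 3600          # one hour
TIMED_SPLIT_MIN_ENTRIES = 100_000


def should_split(size_bytes: int, entry_count: int, age_seconds: float) -> bool:
    """Decide whether the current log file should be closed and a new one started."""
    if size_bytes >= MAX_BYTES:
        return True
    # Assumed reading: the hourly split applies only to busy files.
    return age_seconds >= MAX_AGE_SECONDS and entry_count > TIMED_SPLIT_MIN_ENTRIES
```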

Log File Deletion

By default, logs older than one day are deleted. During an ongoing sync, deletion is temporarily paused to avoid losing needed data.
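A minimal sketch of this retention policy follows, assuming a log's age is measured from its last-modified time and that a single flag tells the cleaner whether any sync is in progress. Both are assumptions, as is the function name.

```python
from typing import Dict, List

RETENTION_SECONDS = 24 * 3600  # default retention of one day


def deletable_logs(last_modified: Dict[str, float], now: float,
                   sync_in_progress: bool) -> List[str]:
    """Return the log file names that may be deleted right now."""
    if sync_in_progress:
        return []  # deletion is paused so a running sync cannot lose data
    return sorted(name for name, mtime in last_modified.items()
                  if now - mtime > RETENTION_SECONDS)
```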

Data Synchronization Process

The sync tool first announces its rLog capability via REPLCONF CAPA. The master then follows a handshake:

Replica sends PSYNC runId offset (or PSYNC ? -1 on first start).

If the circular buffer can serve the request, master replies +CONTINUE runId.

If not, master replies +LPSYNC.

Replica sends LPSYNC runId id to request rLog sync.

Master replies +LCONTINUE runId if rLog can continue; otherwise it falls back to +FULLRESYNC.

After rLog data transfer, master sends LCOMMIT offset so the replica updates its offset.
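The negotiation above can be traced with a small simulation. The reply strings follow the text (arguments to +FULLRESYNC are omitted, as they are in the steps above); the function names are hypothetical, and replying with +FULLRESYNC to a replica that did not announce rLog capability is an assumption.

```python
from typing import List


def reply_to_psync(run_id: str, buffer_can_serve: bool, replica_has_rlog_capa: bool) -> str:
    """Master's reply to PSYNC."""
    if buffer_can_serve:
        return f"+CONTINUE {run_id}"
    if replica_has_rlog_capa:
        return "+LPSYNC"  # invite the replica to try the rLog
    return "+FULLRESYNC"  # assumed behavior for replicas without the capability


def reply_to_lpsync(run_id: str, rlog_can_serve: bool) -> str:
    """Master's reply to LPSYNC."""
    return f"+LCONTINUE {run_id}" if rlog_can_serve else "+FULLRESYNC"


def negotiate(run_id: str, offset: int, last_id: int,
              buffer_ok: bool, rlog_ok: bool, capa: bool = True) -> List[str]:
    """Return the messages exchanged until a sync mode is agreed."""
    transcript = [f"PSYNC {run_id} {offset}"]
    reply = reply_to_psync(run_id, buffer_ok, capa)
    transcript.append(reply)
    if reply == "+LPSYNC":
        transcript.append(f"LPSYNC {run_id} {last_id}")
        transcript.append(reply_to_lpsync(run_id, rlog_ok))
    return transcript
```

For example, `negotiate("abc", 100, 7, buffer_ok=False, rlog_ok=True)` walks through the PSYNC, +LPSYNC, LPSYNC, +LCONTINUE path; the LCOMMIT step that follows the data transfer is not modeled.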

Idempotency Adjustments

To guarantee safe re‑execution, non‑idempotent commands were rewritten; list commands remain non‑idempotent and are excluded from cross‑site sync.
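The article does not list which commands were rewritten or how. As an illustration of the general technique only, the sketch below assumes a counter increment is replicated as a SET of the resulting value, so that replaying it during a resumed sync is harmless. The toy dict-backed store and both function names are hypothetical.

```python
from typing import Dict, List


def apply(store: Dict[str, int], cmd: List[str]) -> None:
    """Apply a command to a toy in-memory store (integers only)."""
    name, key, value = cmd[0].upper(), cmd[1], int(cmd[2])
    if name == "INCRBY":
        store[key] = store.get(key, 0) + value
    elif name == "SET":
        store[key] = value
    else:
        raise ValueError(f"unsupported command: {name}")


def rewrite_for_replication(store: Dict[str, int], cmd: List[str]) -> List[str]:
    """Execute locally, then return an idempotent form to replicate."""
    apply(store, cmd)
    if cmd[0].upper() == "INCRBY":
        return ["SET", cmd[1], str(store[cmd[1]])]  # assumed rewrite
    return cmd
```

Replaying the rewritten command twice on a replica leaves the value unchanged, whereas replaying the original INCRBY would double-count.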

Data Loop Handling

Commands carry an id field: positive IDs for locally originated writes, negative IDs for remote or heartbeat traffic. Replicas filter out commands with negative IDs, preventing looped replication.
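The filter amounts to a sign check on the id. The sketch below uses hypothetical names and treats an id of 0 as non-forwardable, which is an assumption since the article only mentions positive and negative ids.

```python
from typing import Iterable, List, Tuple


def should_forward_cross_site(cmd_id: int) -> bool:
    """Only locally originated writes (positive ids) leave the site."""
    return cmd_id > 0


def filter_for_remote(frames: Iterable[Tuple[int, bytes]]) -> List[bytes]:
    """Drop remote-origin and heartbeat traffic before cross-site sync."""
    return [payload for cmd_id, payload in frames
            if should_forward_cross_site(cmd_id)]
```

A write that arrived from site B is stored at site A with a negative id, so it is never sent back to B and the loop is broken at the first hop.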

Expiration and Eviction

Each site independently handles key expiration and eviction. Deletion commands generated by these processes receive negative IDs and are filtered out, so they are not propagated across sites.

Data Migration

During cluster scaling, slot‑based migration moves keys between nodes. Migration‑related DEL and RESTORE commands are assigned negative IDs, ensuring they are not synchronized across data centers.
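Taken together, the sign of an id depends on where the command came from. A minimal sketch of such an allocator is shown below, assuming a single monotonically increasing counter whose value is negated for everything except local client writes; the `Origin` enum and `IdAllocator` class are hypothetical.

```python
import itertools
from enum import Enum, auto


class Origin(Enum):
    LOCAL_CLIENT = auto()
    REMOTE_SITE = auto()
    HEARTBEAT = auto()
    EXPIRE_OR_EVICT = auto()
    MIGRATION = auto()


class IdAllocator:
    """Hands out ids whose sign marks whether a command may cross sites."""

    def __init__(self) -> None:
        self._counter = itertools.count(1)

    def next_id(self, origin: Origin) -> int:
        n = next(self._counter)
        # Only local client writes get a positive id and are synced to other sites.
        return n if origin is Origin.LOCAL_CLIENT else -n
```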

Performance Evaluation

Benchmarks show that the active‑active Redis instance with rLog enabled delivers performance comparable to native Redis with AOF persistence.

Pending Optimizations

Write conflicts across sites remain unresolved; a future CRDT‑based solution is planned.

List‑type commands lack idempotency support.

Consistency issues from expired or evicted keys persist; operational best practices (capacity planning, alerts) are recommended.
