Redis Replication Deep Dive: Full Sync, Partial Sync, Heartbeat & Async Copy
This article explains Redis's replication mechanism in detail, covering the step-by-step copy process, data synchronization via PSYNC, full and partial synchronization workflows, heartbeat management, and asynchronous replication, while highlighting key commands, offsets, run IDs, and performance considerations.
Redis Replication Overview
Redis implements a master‑slave (primary‑replica) model where the primary continuously streams write commands to one or more replicas to keep their data in sync.
1. Replication Initiation (Copy Process)
The replica runs SLAVEOF <master_ip> <master_port> (REPLICAOF since Redis 5.0). The command only stores the master address and returns; replication itself starts afterwards.
A background task on the replica detects the stored master information and opens a TCP socket to the master.
After the socket is established, the replica sends PING and expects a PONG. If no reply arrives, the replica drops the connection and retries.
If the master requires authentication, the replica sends AUTH <password>. Failure aborts the replication attempt.
Upon successful authentication the master begins the data synchronization phase, which is the most time‑consuming step because the entire dataset is streamed to the replica.
After the initial sync, the master forwards every subsequent write command to the replica, preserving consistency.
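The first commands of the handshake above can be sketched as wire-level bytes. This is a minimal illustration, assuming the standard RESP encoding of commands as arrays of bulk strings. The helper names encode_command and handshake_commands are hypothetical, and the sketch omits everything a real replica does beyond PING and AUTH.

```python
from typing import List, Optional


def encode_command(*args: str) -> bytes:
    """Encode a command as a RESP array of bulk strings."""
    parts = [b"*" + str(len(args)).encode() + b"\r\n"]
    for arg in args:
        data = arg.encode()
        parts.append(b"$" + str(len(data)).encode() + b"\r\n" + data + b"\r\n")
    return b"".join(parts)


def handshake_commands(password: Optional[str] = None) -> List[bytes]:
    """Return the commands a replica sends before synchronization, in order.

    Hypothetical helper: PING always comes first, AUTH only when the
    master requires a password.
    """
    commands = [encode_command("PING")]
    if password is not None:
        commands.append(encode_command("AUTH", password))
    return commands
```

Calling handshake_commands("secret") yields two byte strings that could be written to the socket one after the other, waiting for a reply after each.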
2. Data Synchronization Commands
Two commands exist for synchronization:
SYNC – used in Redis versions prior to 2.8; always triggers a full sync.
PSYNC – introduced in Redis 2.8; supports both full and partial synchronization.
3. PSYNC Mechanics
PSYNC relies on three pieces of state:
Replication offset of the primary (master_repl_offset in the output of INFO replication).
Replication offset reported by the replica (sent to the primary every second).
The primary’s 40‑character run ID, generated at startup.
Command syntax: PSYNC <runId> <offset>
Parameters:
runId – the primary’s run ID known by the replica (use ? for a fresh replica).
offset – the replica’s current replication offset (use -1 for a fresh replica).
Possible master replies:
+FULLRESYNC <runId> <offset> – forces a full synchronization.
+CONTINUE – indicates that the missing data is still present in the replication backlog, so a partial sync will be performed.
-ERR – the master does not support PSYNC; the replica falls back to SYNC (full sync).
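A replica's handling of these replies can be sketched as a small parser. This is an illustration rather than Redis's own code: PsyncReply, build_psync, and parse_psync_reply are hypothetical names. It assumes a fresh replica sends PSYNC ? -1 and that error replies arrive with a leading "-" on the wire, as RESP errors do.

```python
from typing import NamedTuple, Optional


class PsyncReply(NamedTuple):
    kind: str                      # "full", "partial", or "fallback"
    run_id: Optional[str] = None   # set only for a full resync
    offset: Optional[int] = None   # set only for a full resync


def build_psync(run_id: Optional[str], offset: Optional[int]) -> str:
    """Build the PSYNC line; a fresh replica knows neither value."""
    if run_id is None or offset is None:
        return "PSYNC ? -1"
    return f"PSYNC {run_id} {offset}"


def parse_psync_reply(line: str) -> PsyncReply:
    """Classify the master's first reply line to PSYNC."""
    line = line.strip()
    if line.startswith("+FULLRESYNC"):
        _, run_id, offset = line.split()
        return PsyncReply("full", run_id, int(offset))
    if line.startswith("+CONTINUE"):
        return PsyncReply("partial")
    if line.startswith("-ERR"):
        # The master does not understand PSYNC: fall back to SYNC.
        return PsyncReply("fallback")
    raise ValueError(f"unexpected PSYNC reply: {line!r}")
```

After a "full" reply the replica would store the returned run ID and offset, so that a later reconnect can attempt a partial sync.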
4. Full Synchronization
A full sync occurs when a replica connects for the first time or when the primary’s run ID changes (e.g., after a restart).
Replica sends PSYNC ? -1 (or SYNC on older versions).
Primary replies with +FULLRESYNC and creates an RDB snapshot via BGSAVE.
The RDB file is streamed to the replica.
Replica loads the RDB into memory.
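While the snapshot is being created and transferred, the primary buffers new write commands for that replica in the replica's client output buffer and sends them once the RDB transfer completes. The same commands are also appended to the replication backlog (default 1 MB).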
After the replica finishes loading, it may trigger BGREWRITEAOF if AOF persistence is enabled.
Caveats:
If the RDB cannot be transferred within repl-timeout (default 60 seconds), the sync may abort. As a rough guide, a 1 Gbps link moves about 100 MB per second in practice, so datasets above roughly 6 GB are at risk. Increase repl-timeout accordingly.
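Redis also offers “disk‑less” replication (the repl-diskless-sync option), which streams the RDB directly over the socket without writing it to disk first. It was introduced as an experimental feature, so test it on your Redis version before relying on it in production.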
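The 6 GB figure in the caveat can be reproduced with a back-of-the-envelope calculation. The sketch below follows the article's rule of thumb that the transfer should finish within repl-timeout. The function name and the 0.8 efficiency factor (usable share of the nominal link rate) are assumptions for illustration, not measured values.

```python
def transfer_seconds(rdb_bytes: int, link_bits_per_sec: float,
                     efficiency: float = 0.8) -> float:
    """Estimate how long streaming an RDB file takes.

    efficiency is an assumed fraction of the nominal link rate that is
    actually usable for the transfer.
    """
    usable_bytes_per_sec = link_bits_per_sec / 8 * efficiency
    return rdb_bytes / usable_bytes_per_sec


def exceeds_timeout(rdb_bytes: int, link_bits_per_sec: float,
                    repl_timeout: float = 60.0) -> bool:
    """True when the estimated transfer time is longer than repl-timeout."""
    return transfer_seconds(rdb_bytes, link_bits_per_sec) > repl_timeout


# A 6 GB snapshot on a 1 Gbps link sits right at the 60-second default.
estimate = transfer_seconds(6 * 10**9, 1e9)
```

Under these assumptions a 12 GB snapshot would need about twice the default timeout, which is the kind of case where raising repl-timeout is worth considering.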
5. Partial Synchronization
When a network interruption occurs but the primary’s run ID remains unchanged, the primary can serve only the missing commands from its backlog, avoiding a full sync.
If the replica’s connection times out, the primary closes it.
The primary continues to write pending commands to the replication backlog (default 1 MB).
When the replica reconnects it sends its last known offset and the primary’s run ID.
If the required data is still in the backlog, the primary replies with +CONTINUE and streams the buffered commands. If it has already been overwritten, the primary replies with +FULLRESYNC and a full sync follows.
The replica applies the received commands, completing the partial sync.
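The primary's decision between a partial and a full sync can be sketched as a range check. This is a simplified model with a hypothetical function name, assuming the backlog always holds the most recent backlog_size bytes of the replication stream and that offsets count bytes of that stream.

```python
def can_partial_sync(primary_run_id: str, primary_offset: int,
                     backlog_size: int,
                     replica_run_id: str, replica_offset: int) -> bool:
    """Decide whether the primary can answer PSYNC with +CONTINUE.

    Simplified model: the backlog covers the last backlog_size bytes
    written by the primary, ending at primary_offset.
    """
    if replica_run_id != primary_run_id:
        # The replica was following a different primary instance.
        return False
    oldest_offset = max(0, primary_offset - backlog_size)
    return oldest_offset <= replica_offset <= primary_offset


ONE_MB = 1024 * 1024  # default backlog size mentioned above
```

With a 1 MB backlog and a primary at offset 5,000,000, a replica that stopped at 4,500,000 is still covered, while one that stopped at 3,000,000 has fallen out of the backlog and needs a full sync.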
6. Heartbeat and Connection Liveness
After replication is established, the primary and replica maintain a persistent TCP connection with periodic heartbeats.
The primary sends a PING to each replica every repl-ping-slave-period seconds (default 10 s; the option was renamed repl-ping-replica-period in Redis 5.0).
The replica acknowledges with REPLCONF ACK <offset> roughly once per second, reporting its current replication offset.
If the primary does not receive an ACK within repl-timeout (default 60 s), it marks the replica as offline.
The replication link is visible in CLIENT LIST through its flags: on the replica, the connection to its primary is flagged M; on the primary, each replica connection is flagged S.
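The liveness rule above amounts to comparing the time since the last ACK with repl-timeout. The following sketch uses hypothetical function names and plain numbers for timestamps. It illustrates the rule as described here, not the server's internal bookkeeping.

```python
def replica_is_timed_out(last_ack_time: float, now: float,
                         repl_timeout: float = 60.0) -> bool:
    """True when no REPLCONF ACK has arrived within repl_timeout seconds."""
    return now - last_ack_time > repl_timeout


def replication_lag_bytes(primary_offset: int, acked_offset: int) -> int:
    """Bytes of the replication stream the replica has not yet acknowledged."""
    return max(0, primary_offset - acked_offset)
```

Because the replica reports its offset about once per second, the lag computed this way is at most roughly one second stale on a healthy link.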
7. Asynchronous Replication
Write commands are processed by the primary and the response is returned to the client immediately. Replication to replicas occurs asynchronously:
Primary receives and executes a write command.
Primary sends the result back to the client.
In the background the primary forwards the command to all replicas; each replica applies it in its main thread.
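The ordering of these three steps can be illustrated with a toy in-memory model. Primary and Replica below are hypothetical classes, not Redis APIs. The point is only that the client receives its reply before any replica has applied the write.

```python
from collections import deque
from typing import Deque, Dict, List, Tuple


class Replica:
    def __init__(self) -> None:
        self.data: Dict[str, str] = {}

    def apply(self, key: str, value: str) -> None:
        self.data[key] = value


class Primary:
    def __init__(self, replicas: List[Replica]) -> None:
        self.data: Dict[str, str] = {}
        self.replicas = replicas
        self.pending: Deque[Tuple[str, str]] = deque()

    def set(self, key: str, value: str) -> str:
        """Execute the write locally, queue it, and reply immediately."""
        self.data[key] = value
        self.pending.append((key, value))
        return "OK"

    def propagate(self) -> None:
        """Background step: forward queued writes to every replica."""
        while self.pending:
            key, value = self.pending.popleft()
            for replica in self.replicas:
                replica.apply(key, value)
```

Between set() returning and propagate() running, a read on the replica would not see the new value. That window is the practical consequence of asynchronous replication.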
Key Configuration Parameters
repl-timeout – maximum idle time before a replica is considered offline (default 60 s).
repl-ping-slave-period – interval between primary‑initiated PING messages (default 10 s).
repl-backlog-size – size of the replication backlog buffer (default 1 MB).
repl-backlog-ttl – how long the primary keeps the backlog after its last replica disconnects before freeing it (default 3600 s).
This summary captures the essential mechanics of Redis replication, including the initial copy process, PSYNC‑based synchronization, full and partial sync workflows, heartbeat handling, and the asynchronous nature of write propagation.
ITPUB
Official ITPUB account sharing technical insights, community news, and exciting events.