
How to Build Redis Master‑Slave Replication with Sentinel and Enable Automatic Failover

Learn step-by-step how to configure Redis master-slave replication, set up Sentinel for health monitoring, automate failover, secure the deployment, tune performance, troubleshoot common issues, and integrate monitoring with Prometheus and Grafana, ensuring high availability for production workloads.

MaGe Linux Operations

Jun 13, 2026


Overview

This guide explains how to turn a single‑node Redis instance into a production‑grade high‑availability service using master‑slave replication and Sentinel for automatic failover. It covers the underlying replication protocol, required configuration files, client integration, monitoring, troubleshooting, performance tuning, security hardening, and operational runbooks.

Redis Replication Mechanics

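Redis replication is asynchronous and works as a hybrid push-pull model. A slave (replica) connects to the master and sends PSYNC with the replication ID and offset it last processed, and from then on the master pushes the write stream to it. The master decides between two kinds of sync. A full sync sends an RDB snapshot, followed by the commands buffered while the snapshot was produced. A partial sync sends only the missing bytes from the repl_backlog buffer, which is possible only when the requested offset is still held in the backlog.

Key fields:

replid – the replication ID, which identifies the master's data history. It typically changes when the master restarts; a slave presenting an ID the master does not recognize must perform a full sync.

offset – the replication offset, in bytes. The difference master_repl_offset - slave_repl_offset indicates how far a slave lags.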

Typical INFO replication output shows role, master_link_status, master_last_io_seconds_ago, and the offsets.
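The lag calculation can be sketched in Python. This minimal example assumes the text comes from INFO replication on the master, where each connected slave appears as a slaveN line carrying its acknowledged offset. The function names are hypothetical and the sample text is made up for illustration.

```python
# Minimal sketch: per-slave replication lag (bytes) from master-side INFO text.


def parse_info(text: str) -> dict:
    """Turn 'key:value' lines into a dict, skipping '# Section' headers and blanks."""
    fields = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition(":")
        fields[key] = value
    return fields


def replica_lag_bytes(text: str) -> dict:
    """Return {'ip:port': master_repl_offset - slave offset} for every slaveN line."""
    info = parse_info(text)
    master_offset = int(info["master_repl_offset"])
    lags = {}
    for key, value in info.items():
        # Slave entries are slave0, slave1, ...; skip scalar fields like slave_repl_offset.
        if not (key.startswith("slave") and key[5:].isdigit()):
            continue
        attrs = dict(pair.split("=", 1) for pair in value.split(","))
        lags[f"{attrs['ip']}:{attrs['port']}"] = master_offset - int(attrs["offset"])
    return lags


# Made-up sample resembling INFO replication on a master with two slaves.
SAMPLE = """# Replication
role:master
connected_slaves:2
slave0:ip=127.0.0.1,port=6380,state=online,offset=5000,lag=0
slave1:ip=127.0.0.1,port=6381,state=online,offset=4200,lag=1
master_repl_offset:5000
"""

print(replica_lag_bytes(SAMPLE))  # {'127.0.0.1:6380': 0, '127.0.0.1:6381': 800}
```

In practice you would feed it the output of redis-cli -p 6379 INFO replication.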

Sentinel Architecture

Sentinel runs as an independent process (usually three or five instances). Its responsibilities are:

Continuously PING master and slaves to detect failures (monitoring).

Notify administrators via scripts (notification).

Perform automatic failover when a quorum of sentinels agrees the master is down (ODOWN).

Provide the current master address to clients (configuration provider).

Sentinel distinguishes two failure states:

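Subjective down (SDOWN) – a single sentinel marks the master as down when it has not received a valid reply for down-after-milliseconds.

Objective down (ODOWN) – at least quorum sentinels (the last argument of sentinel monitor) agree that the master is down. ODOWN makes a failover possible, but the failover only starts after one sentinel is elected leader with votes from a majority of all known sentinels (and from at least quorum of them).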
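The two thresholds are easy to confuse, so here is a simplified Python model of the rule described above. It is not Sentinel's actual code, and the function and parameter names are hypothetical. The model treats `reachable` as the number of Sentinels that can exchange votes.

```python
# Simplified model (not Sentinel source): quorum decides ODOWN, while the
# leader election needs a majority of all Sentinels AND at least quorum votes.


def is_odown(sdown_votes: int, quorum: int) -> bool:
    """ODOWN: at least `quorum` Sentinels consider the master subjectively down."""
    return sdown_votes >= quorum


def can_failover(sdown_votes: int, quorum: int, total_sentinels: int, reachable: int) -> bool:
    """Failover also needs a leader voted in by max(majority, quorum) Sentinels."""
    majority = total_sentinels // 2 + 1
    return is_odown(sdown_votes, quorum) and reachable >= max(majority, quorum)


# 3 Sentinels, quorum 2: two agreeing, connected Sentinels can fail over.
print(can_failover(sdown_votes=2, quorum=2, total_sentinels=3, reachable=2))  # True
# 5 Sentinels, quorum 2: ODOWN is reached, but 2 of 5 is not a majority.
print(is_odown(2, 2), can_failover(2, 2, 5, 2))  # True False
```

The second case is why a low quorum does not let a minority partition trigger a failover on its own.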

Deployment Steps

Install Redis (e.g., yum install -y redis or compile from source).

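Create a shared configuration fragment redis-common.conf with the settings every node shares (bind address, persistence, memory limits, security). Pull it into each instance's file with an include directive, and keep per-instance values such as port, pidfile, logfile, and dir in the instance file.

Configure the master (port 6379) without replicaof. Also set replica-read-only yes and replica-serve-stale-data yes. They only take effect on a replica, but setting them on the master means it behaves correctly if a failover later demotes it.

Configure each slave (ports 6380 and 6381) with replicaof 127.0.0.1 6379, the same security settings, and masterauth so it can authenticate to the master.

Write a Sentinel configuration file (sentinel.conf) for each Sentinel instance; Sentinels that share a host need distinct ports and log files. The example uses 127.0.0.1, which suits a single-host lab. On separate hosts, bind to a routable address and use the master's real IP in sentinel monitor, because Sentinel hands that address to clients: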

bind 127.0.0.1
port 26379
daemonize yes
logfile /var/log/redis/sentinel.log
dir /var/lib/redis
sentinel monitor redis-ha 127.0.0.1 6379 2
sentinel auth-pass redis-ha <PASSWORD>
sentinel down-after-milliseconds redis-ha 5000
sentinel parallel-syncs redis-ha 1
sentinel failover-timeout redis-ha 60000

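Start all Redis instances (redis-server /path/to/redis-6379.conf, and so on) and the Sentinel processes (redis-sentinel /path/to/sentinel.conf). Sentinel rewrites its configuration file to record discovered slaves, other Sentinels, and failovers, so the file must be writable. Verify with redis-cli -p 6379 INFO replication (expect role:master and connected_slaves:2) and redis-cli -p 26379 SENTINEL get-master-addr-by-name redis-ha.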
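In the single-host layout, the per-instance files differ in only a few lines, so they can be generated. The sketch below makes several hypothetical assumptions: the paths under /etc/redis, /var/run/redis, /var/log/redis and /var/lib/redis (the directories must already exist), and one shared password as in the examples above.

```python
# Minimal sketch: render per-instance files for the single-host lab layout.
# Assumptions (hypothetical): everything on 127.0.0.1, data nodes on 6379-6381,
# Sentinels on 26379-26381, one shared password, and the paths used below.

HOST = "127.0.0.1"
MASTER_PORT = 6379
MASTER_NAME = "redis-ha"


def redis_conf(port: int, password: str) -> str:
    """Config for one data node; every port except MASTER_PORT becomes a slave."""
    lines = [
        # Included first so the per-instance lines below override shared values.
        "include /etc/redis/redis-common.conf",
        f"port {port}",
        f"pidfile /var/run/redis/redis-{port}.pid",
        f"logfile /var/log/redis/redis-{port}.log",
        f"dir /var/lib/redis/{port}",
        f"requirepass {password}",
        # masterauth on the master too, so it can resync if a failover demotes it.
        f"masterauth {password}",
        "replica-read-only yes",
        "replica-serve-stale-data yes",
    ]
    if port != MASTER_PORT:
        lines.append(f"replicaof {HOST} {MASTER_PORT}")
    return "\n".join(lines) + "\n"


def sentinel_conf(port: int, password: str, quorum: int = 2) -> str:
    """Config for one Sentinel, mirroring the sentinel.conf example above."""
    lines = [
        f"bind {HOST}",
        f"port {port}",
        "daemonize yes",
        f"logfile /var/log/redis/sentinel-{port}.log",
        f"dir /var/lib/redis/sentinel-{port}",
        f"sentinel monitor {MASTER_NAME} {HOST} {MASTER_PORT} {quorum}",
        f"sentinel auth-pass {MASTER_NAME} {password}",
        f"sentinel down-after-milliseconds {MASTER_NAME} 5000",
        f"sentinel parallel-syncs {MASTER_NAME} 1",
        f"sentinel failover-timeout {MASTER_NAME} 60000",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    for p in (6379, 6380, 6381):
        print(f"--- redis-{p}.conf ---\n{redis_conf(p, '<PASSWORD>')}")
    for p in (26379, 26380, 26381):
        print(f"--- sentinel-{p}.conf ---\n{sentinel_conf(p, '<PASSWORD>')}")
```

Write each string to its own file. After startup, Sentinel appends its own state (known slaves, known Sentinels, epochs) to its file.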

Client Integration

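Never hard-code the master's IP/port for writes, because it changes after every failover. Instead, configure clients with the list of Sentinel addresses and let them discover the current master through the Sentinel API. Example snippets: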

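# Java (Lettuce)
import io.lettuce.core.RedisClient;
import io.lettuce.core.RedisURI;
import io.lettuce.core.api.StatefulRedisConnection;
import io.lettuce.core.api.sync.RedisCommands;
// The URI lists the Sentinels; Lettuce asks them for the current master of "redis-ha".
RedisURI uri = RedisURI.builder()
    .withSentinel("10.20.0.11", 26379)
    .withSentinel("10.20.0.12", 26380)
    .withSentinel("10.20.0.13", 26381)
    .withSentinelMasterId("redis-ha")
    .withPassword("<PASSWORD>".toCharArray()) // password of the Redis data nodes
    .build();
RedisClient client = RedisClient.create(uri);
StatefulRedisConnection<String, String> conn = client.connect();
RedisCommands<String, String> cmd = conn.sync();
cmd.set("key", "value");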

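# Python (redis-py)
from redis.sentinel import Sentinel
# password= applies to the master/slave connections; Sentinel auth would go in sentinel_kwargs.
sentinel = Sentinel([('10.20.0.11', 26379), ('10.20.0.12', 26380), ('10.20.0.13', 26381)],
                    socket_timeout=0.5, password='<PASSWORD>')
master = sentinel.master_for('redis-ha')  # resolves the master through Sentinel on (re)connect
master.set('foo', 'bar')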

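# Go (go-redis v9)
import "context"
import "github.com/redis/go-redis/v9"
ctx := context.Background() // inside a function
client := redis.NewFailoverClient(&redis.FailoverOptions{
    MasterName:    "redis-ha",
    SentinelAddrs: []string{"10.20.0.11:26379", "10.20.0.12:26380", "10.20.0.13:26381"},
    Password:      "<PASSWORD>", // data-node password; SentinelPassword is separate
})
if err := client.Set(ctx, "foo", "bar", 0).Err(); err != nil {
    panic(err)
}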

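When they reconnect, all three libraries ask the Sentinels for the master address again, so they follow a failover automatically. Commands in flight during the switch can still fail and should be retried where that is safe.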
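Commands in flight during a failover can fail, and a small backoff wrapper can ride out the switch. The sketch below uses a hypothetical helper, `with_retry`, which is not part of redis-py. Its defaults are Python's builtin exceptions so the sketch runs without redis installed. With redis-py you would pass retry_on=(redis.exceptions.ConnectionError, redis.exceptions.TimeoutError). Recent redis-py versions also ship their own redis.retry.Retry class; check your version's documentation.

```python
# Minimal sketch: retry a call with exponential backoff while a failover completes.
# `with_retry` is a hypothetical helper, not a redis-py API.
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def with_retry(
    call: Callable[[], T],
    retry_on: Tuple[Type[BaseException], ...] = (ConnectionError, TimeoutError),
    attempts: int = 5,
    base_delay: float = 0.2,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run `call`; on a listed exception wait 0.2 s, 0.4 s, 0.8 s, ... and try again."""
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    for attempt in range(attempts):
        try:
            return call()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** attempt)
    raise RuntimeError("unreachable")  # keeps type checkers happy


# Simulated call that fails twice (as if the master were being switched), then succeeds.
calls = {"n": 0}


def flaky_set() -> bool:
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("master is being switched")
    return True


print(with_retry(flaky_set, sleep=lambda _seconds: None), calls["n"])  # True 3
```

Only retry idempotent commands this way. A non-idempotent command such as INCR can be applied twice if the first attempt succeeded but its reply was lost.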

Monitoring & Alerting

Deploy redis_exporter for each Redis node and scrape it with Prometheus. Essential metrics include:

redis_up – instance health.

redis_connected_slaves – number of connected replicas.

redis_master_repl_offset and redis_slave_repl_offset – replication offsets in bytes, used to track lag.

used_memory and mem_fragmentation_ratio – memory usage. These are INFO field names; check the exporter's /metrics output for the exact metric names.

rejected_connections – connections refused because maxclients was reached.

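Sample Prometheus alerting rules. They are simplified: in a rule file they sit under groups[].rules, and Alertmanager routes the alerts they fire: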

- alert: RedisDown
  expr: redis_up == 0
  for: 1m
  labels:
    severity: critical
  annotations:
    summary: "Redis instance {{ $labels.instance }} is down"

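- alert: RedisReplicationLag
  # Both series come from the master's exporter: its offset minus each replica's acknowledged offset
  expr: (redis_master_repl_offset - on(instance) group_right redis_connected_slave_offset_bytes) > 1048576
  for: 5m
  labels:
    severity: warning
  annotations:
    summary: "Replica {{ $labels.slave_ip }}:{{ $labels.slave_port }} lags more than 1 MiB behind {{ $labels.instance }}"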

Grafana dashboards (e.g., ID 11835) visualize these metrics.

Troubleshooting Scenarios

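Slave disconnect – check master_link_status and master_link_down_since_seconds in INFO replication on the slave. Increase repl-backlog-size (≥1 GiB) so short disconnects can be healed with a partial resync instead of a full one, and verify network and firewall rules.

Sentinel cannot elect a new master – ensure at least three Sentinels and a correct quorum (N/2 + 1, e.g. 2 of 3). Also check that the Sentinels can reach each other and the data nodes (open port 26379, or whichever ports your Sentinels use). The failover leader needs votes from a majority of all Sentinels, not just the quorum. Use SENTINEL reset redis-ha to clear stale state.

Data loss after split-brain – enable min-replicas-to-write 1 and min-replicas-max-lag 10. An isolated master then rejects writes with a NOREPLICAS error once no slave has acknowledged it for 10 seconds. Because replication stays asynchronous, this bounds the window of lost writes to roughly that lag rather than eliminating it.

Clients still connected to old master after failover – use client libraries that support Sentinel-driven reconnection (Lettuce, go-redis, redis-py). Custom clients should re-resolve the master address when they see any of these:

- connection errors
- READONLY errors, returned when a write reaches a node that has been demoted to a slave
- MASTERDOWN errors, returned by a slave that has lost its master while replica-serve-stale-data is no

Alternatively, they can subscribe to the +switch-master channel on a Sentinel.

For each case, start by inspecting redis-cli INFO replication on the affected nodes, redis-cli -p 26379 SENTINEL masters, and the Redis and Sentinel logs.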
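The split-brain protection can be modeled in a few lines. This is a simplified Python sketch, not Redis source, and `accepts_writes` is a hypothetical name. It assumes the input holds the lag value (seconds since the last acknowledgement) of each online slave, as shown in the master's INFO replication output.

```python
# Simplified model of the master's min-replicas-to-write check (not Redis source).
# replica_lags_s: the `lag` value of each online slave from INFO replication.


def accepts_writes(replica_lags_s, min_replicas_to_write=1, min_replicas_max_lag=10):
    """Return False when the master would answer writes with a NOREPLICAS error."""
    if min_replicas_to_write == 0 or min_replicas_max_lag == 0:
        return True  # the check is disabled when either setting is 0
    good = sum(1 for lag in replica_lags_s if lag <= min_replicas_max_lag)
    return good >= min_replicas_to_write


print(accepts_writes([1, 3]))    # True: both slaves acknowledged recently
print(accepts_writes([12, 15]))  # False: cut off from both slaves for over 10 s
print(accepts_writes([]))        # False: no slaves connected at all
```

Until the lags pass 10 seconds, an isolated master still accepts writes. Those writes are the ones at risk after the partition heals.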

Performance & Capacity Planning

Run redis-benchmark to obtain a baseline QPS. In production, plan for only 25-50 % of the measured maximum. This leaves headroom for latency spikes and background tasks such as BGSAVE and AOF rewrites.

Memory sizing steps:

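Project the data set size: current size × 1.5 (growth) × 1.5 (safety margin), i.e. 2.25× today's size.

Reserve about 30 % of RAM for the replication backlog, client output buffers, lazy-free overhead, and the copy-on-write memory used when BGSAVE or an AOF rewrite forks.

Set maxmemory to no more than 80 % of physical RAM (e.g., 50 GiB on a 64 GiB machine). If you apply the full 30 % reservation, that drops to about 45 GiB (70 %).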

Network bandwidth: a client sending 100 bytes per command at 10 K QPS generates 100 B × 10,000/s = 1 MB/s (about 0.95 MiB/s). Ensure NICs of at least 10 Gbps and separate VLANs for master-slave traffic, which also carries full-sync RDB transfers.
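The sizing rules above reduce to simple arithmetic, worked through below in Python. The function names are hypothetical, and the inputs (20 GiB of data today, a 64 GiB host, 100 K benchmark QPS) are illustrative assumptions, not measurements.

```python
# Worked arithmetic for the sizing rules above; all inputs are illustrative.


def projected_data_gib(current_gib, growth=1.5, safety=1.5):
    """Current size x 1.5 (growth) x 1.5 (safety) = 2.25x today's data set."""
    return current_gib * growth * safety


def maxmemory_gib(physical_gib, fraction=0.8):
    """maxmemory ceiling as a fraction of physical RAM (0.8 = the 80 % rule)."""
    return physical_gib * fraction


def planned_qps(benchmark_qps, fraction=0.5):
    """Production budget as 25-50 % of the redis-benchmark maximum."""
    return benchmark_qps * fraction


def bandwidth_mib_per_s(bytes_per_command, qps):
    """Client traffic in MiB/s (1 MiB = 1024 * 1024 bytes)."""
    return bytes_per_command * qps / (1024 * 1024)


print(projected_data_gib(20))                      # 45.0
print(maxmemory_gib(64), maxmemory_gib(64, 0.7))   # 51.2 44.8
print(planned_qps(100_000, 0.25))                  # 25000.0
print(round(bandwidth_mib_per_s(100, 10_000), 2))  # 0.95
```

With these inputs, the projected 45 GiB fits under the 80 % ceiling of a 64 GiB host. It slightly exceeds the roughly 70 % figure implied by a full 30 % reservation, which points toward a larger host.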

Security Hardening

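Use ACLs instead of the legacy requirepass. ACL changes are not replicated, so apply them on every node:

# Create a read-only user
ACL SETUSER readonly on >readonlypass ~* &* +@read
# Create a write user limited to the application keyspace
ACL SETUSER appuser on >apppass ~app:* &* +@read +@write -@dangerous
# Users for replication (masteruser/masterauth) and Sentinel (sentinel auth-user/auth-pass)
ACL SETUSER replica-user on >replpass +psync +replconf +ping
ACL SETUSER sentinel-user on >sentinelpass allchannels +multi +slaveof +ping +exec +subscribe +config|rewrite +role +publish +info +client|setname +client|kill +script|kill
# Disable the default user only after slaves and Sentinels authenticate as the users above
ACL SETUSER default off
# Persist the users (requires aclfile in redis.conf)
ACL SAVE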

Network hardening:

Bind Redis to internal IPs only (e.g., bind 10.20.0.11).

Use firewall rules to allow traffic only from trusted subnets.

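Optionally enable TLS with tls-port, tls-cert-file, tls-key-file, and tls-ca-cert-file, using a self-signed certificate or one from an internal CA. Choose a port no other instance uses (6380 is a slave in the layout above). Set tls-replication yes so replication traffic is encrypted as well.

Rename or disable dangerous commands (FLUSHALL, CONFIG, DEBUG, SHUTDOWN, KEYS). Sentinel sends CONFIG REWRITE during failovers, so if you rename CONFIG, declare the new name in sentinel.conf with sentinel rename-command. With ACLs, denying these commands per user (for example with -@dangerous) is often simpler than renaming them.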
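A few of these hardening points can be checked mechanically. The sketch below uses a hypothetical `audit` helper that reads one redis.conf body as simple "directive args..." lines, with no include resolution and no quoting rules. The sample uses a made-up renamed command (CONFIG_b7f3) and a made-up TLS port (6390).

```python
# Minimal sketch of a hardening audit for one redis.conf body (hypothetical helper).
# Simple "directive args..." lines only: no include files, no quoting rules.

DANGEROUS = {"FLUSHALL", "CONFIG", "DEBUG", "SHUTDOWN", "KEYS"}


def audit(conf_text):
    findings = []
    renamed = set()
    settings = {}
    for raw in conf_text.splitlines():
        parts = raw.split()
        if not parts or parts[0].startswith("#"):
            continue
        directive, args = parts[0].lower(), parts[1:]
        if directive == "rename-command" and args:
            renamed.add(args[0].upper())
        else:
            settings[directive] = args  # later lines override earlier ones, as in Redis
    bind = settings.get("bind")
    if bind is None:
        findings.append("no bind directive: Redis listens on all interfaces")
    elif any(addr in ("0.0.0.0", "*") for addr in bind):
        findings.append("bind listens on all interfaces")
    if settings.get("protected-mode") == ["no"]:
        findings.append("protected-mode is disabled")
    missing = sorted(DANGEROUS - renamed)
    if missing:
        findings.append("not renamed/disabled: " + ", ".join(missing))
    if "CONFIG" in renamed:
        findings.append("CONFIG renamed: add 'sentinel rename-command' so Sentinel can still use it")
    if "tls-port" in settings and settings.get("tls-replication") != ["yes"]:
        findings.append("TLS enabled but tls-replication is not yes")
    return findings


SAMPLE = """bind 10.20.0.11
protected-mode yes
rename-command FLUSHALL ""
rename-command DEBUG ""
rename-command KEYS ""
rename-command SHUTDOWN ""
rename-command CONFIG CONFIG_b7f3
tls-port 6390
"""

for finding in audit(SAMPLE):
    print(finding)
```

The check on renamed commands only applies if you choose renaming. If you deny these commands through ACLs instead, ignore that finding.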

Upgrade & Rollback Procedures

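ACLs, RESP3, and multi-threaded I/O arrived in Redis 6.0; Redis 7 adds Redis Functions, ACL selectors, sharded Pub/Sub, and multi-part AOF. Upgrade path:

Upgrade to Redis 6.x first. A newer Redis can load RDB files written by an older one, but not the other way around.

Migrate requirepass to ACL users.

Upgrade to Redis 7.x, test INFO output, and verify that Sentinel still discovers the master. Upgrade the slaves first, then run SENTINEL failover redis-ha to promote an upgraded slave before upgrading the old master.

Rollback steps (e.g., from 7.0.5 to 7.0.4):

- Stop the affected instance.
- Restore the data directory from backup if the newer version may have written files the older one cannot read.
- Start the older binary.
- Run SENTINEL reset redis-ha on each Sentinel so they rediscover the node.

Within a patch series such as 7.0.x the on-disk format does not change, but 6.x cannot load RDB files written by 7.x.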

Operational Runbook & Checklist

Typical runbook actions (monthly):

Announce maintenance window.

Backup redis.conf and data directory.

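Execute redis-cli -p 26379 SENTINEL failover redis-ha to exercise promotion. This forces a failover without asking the other Sentinels for agreement, so it tests promotion and client reconnection rather than failure detection.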

Verify new master with SENTINEL get-master-addr-by-name and check INFO replication on all slaves.

Monitor redis_up and redis_connected_slaves for at least 5 minutes.

Document any deviations and update the checklist.
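The verification steps of the drill can be expressed as one check over data collected by hand. The sketch below uses a hypothetical function, `check_failover`, and hypothetical input shapes. It assumes you gathered the answers by running SENTINEL get-master-addr-by-name redis-ha on every Sentinel and by reading the role field from INFO replication on every data node. The addresses follow the single-host layout.

```python
# Minimal sketch: evaluate a SENTINEL FAILOVER drill from data collected by hand
# (e.g. with redis-cli). Function name and input shapes are hypothetical.


def check_failover(old_master, answers, roles):
    """old_master: (ip, port) before the drill.
    answers: {sentinel: (ip, port)} from SENTINEL get-master-addr-by-name redis-ha.
    roles: {(ip, port): role} from the `role` field of INFO replication.
    Returns a list of problems; an empty list means the drill looks healthy.
    """
    masters = set(answers.values())
    if len(masters) != 1:
        return [f"Sentinels disagree on the master: {sorted(masters)}"]
    new_master = masters.pop()
    problems = []
    if new_master == old_master:
        problems.append("master address did not change")
    if roles.get(new_master) != "master":
        problems.append(f"{new_master} does not report role:master")
    for node, role in sorted(roles.items()):
        if node != new_master and role != "slave":
            problems.append(f"{node} reports role:{role}, expected slave")
    return problems


old = ("127.0.0.1", 6379)
new = ("127.0.0.1", 6380)
answers = {"sentinel-26379": new, "sentinel-26380": new, "sentinel-26381": new}
roles = {old: "slave", new: "master", ("127.0.0.1", 6381): "slave"}
print(check_failover(old, answers, roles))  # []
```

Any non-empty result is a deviation worth recording in the runbook.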

Pre‑deployment checklist (excerpt):

- repl-backlog-size >= 1GiB
- min-replicas-to-write 1 & min-replicas-max-lag 10
- requirepass and masterauth are identical (with ACLs: masteruser/masterauth name a user allowed to run PSYNC, REPLCONF, and PING)
- Deploy 3 or 5 Sentinel nodes with quorum = N/2 + 1
- Clients use Lettuce / go‑redis v9 / redis‑py (Sentinel aware)
- Prometheus + redis_exporter and Sentinel metrics are scraped
- Alert thresholds tuned to business SLAs
- Perform at least one manual failover test
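- Verify the firewall allows the Redis ports (6379, plus 6380-6381 in the single-host layout) and Sentinel ports (26379, plus any others you use) only between trusted hosts
- Disable transparent_hugepage and set vm.overcommit_memory=1
- Enable AOF (appendonly yes, appendfsync everysec) and keep free disk space of at least 2× the data set size for RDB snapshots and AOF rewrites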
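The configuration-file items in this checklist can be verified automatically. In the sketch below, `checklist` is a hypothetical helper that parses simple "directive value" lines and applies the size units documented in redis.conf. It does not follow include files or understand the legacy min-slaves-* directive names.

```python
# Minimal sketch: check part of the pre-deployment checklist against a redis.conf body.
# Hypothetical helper: simple "directive value" lines only, no include resolution.

# Size units as documented in redis.conf: k/m/g are powers of 1000, kb/mb/gb of 1024.
UNITS = {"k": 1000, "kb": 1024, "m": 1000**2, "mb": 1024**2, "g": 1000**3, "gb": 1024**3}


def parse_size(value):
    v = value.strip().lower()
    for suffix in ("kb", "mb", "gb", "k", "m", "g"):  # two-letter suffixes first
        if v.endswith(suffix):
            return int(v[: -len(suffix)]) * UNITS[suffix]
    return int(v)


def parse_conf(text):
    conf = {}
    for raw in text.splitlines():
        parts = raw.split(None, 1)
        if len(parts) == 2 and not parts[0].startswith("#"):
            conf[parts[0].lower()] = parts[1].strip()  # last occurrence wins
    return conf


def checklist(text):
    c = parse_conf(text)
    issues = []
    # The fallback values are Redis's built-in defaults when a directive is absent.
    if parse_size(c.get("repl-backlog-size", "1mb")) < 1024**3:
        issues.append("repl-backlog-size is below 1 GiB")
    if int(c.get("min-replicas-to-write", "0")) < 1:
        issues.append("min-replicas-to-write should be at least 1")
    if int(c.get("min-replicas-max-lag", "10")) > 10:
        issues.append("min-replicas-max-lag allows more than 10 s")
    if c.get("appendonly", "no") != "yes":
        issues.append("appendonly should be yes")
    if c.get("appendfsync", "everysec") != "everysec":
        issues.append("appendfsync should be everysec")
    if "requirepass" in c and c.get("masterauth") != c["requirepass"]:
        issues.append("masterauth does not match requirepass")
    return issues


GOOD = """repl-backlog-size 1gb
min-replicas-to-write 1
min-replicas-max-lag 10
appendonly yes
appendfsync everysec
requirepass s3cret
masterauth s3cret
"""

print(checklist(GOOD))  # []
print(checklist("repl-backlog-size 512mb\n"))
```

Note that 1g means 10^9 bytes, which is below 1 GiB, so the check flags it; write 1gb to mean a full GiB.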

Following this guide should give you a robust, observable, and secure Redis HA deployment ready for production workloads.


Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.

Written by

MaGe Linux Operations

Founded in 2009, MaGe Education is a top Chinese high‑end IT training brand. Its graduates earn 12K+ RMB salaries, and the school has trained tens of thousands of students. It offers high‑pay courses in Linux cloud operations, Python full‑stack, automation, data analysis, AI, and Go high‑concurrency architecture. Thanks to quality courses and a solid reputation, it has talent partnerships with numerous internet firms.
