Deploy Redis Sentinel for High Availability in 30 Minutes – Step‑by‑Step Guide

Learn how to set up Redis Sentinel for high-availability caching. The guide covers prerequisites, anti-patterns, detailed configuration of the master, replicas and Sentinel nodes, firewall rules, monitoring, failover testing, troubleshooting, backup, rollback and best practices, all achievable within a 30-minute deployment.

MaGe Linux Operations

Redis Sentinel High‑Availability Deployment

This guide shows how to build a highly available Redis cache using Sentinel in about 30 minutes.

Applicable Scenarios & Prerequisites

Workloads with QPS > 10,000 that require HA (caching, session store, message queue).

OS: RHEL/CentOS 8.5+ or Ubuntu 20.04+ (kernel 4.18+).

Redis version: 6.2+ (7.0+ recommended for newer Sentinel features).

Resources: minimum 3 nodes × 2C4G with 10 GB SSD each (recommended 3 × 4C8G with 50 GB SSD each).

Network: ports 6379 (Redis) and 26379 (Sentinel) open, inter‑node latency < 5 ms.

Permissions: root or sudo, plus a dedicated redis system user.

Skills: basic Linux commands, Redis CLI, networking fundamentals.

Anti‑Pattern Warnings (When Not to Use)

Single‑node low‑traffic scenarios (QPS < 1,000).

Strong consistency requirements – Redis replication is asynchronous.

Existing Redis Cluster – it already provides HA and sharding.

Ultra‑low latency workloads – Sentinel failover takes 15‑30 s.

Managed cloud Redis services (AWS ElastiCache, Alibaba Cloud) – they already offer HA.

Alternative Solutions Comparison

Large‑scale sharding → Redis Cluster (automatic sharding, better scalability).

Cloud environments → Cloud‑provider managed Redis (no ops, SLA guarantees).

Strong consistency → Use a relational DB + cache (Redis is not suitable).

Single‑node sufficient → Simple master‑replica without Sentinel (simpler architecture).

Environment & Version Matrix

RHEL 9.3 / CentOS Stream 9 – Redis 6.2.14 (repo) / 7.2.4 (compiled).

Ubuntu 22.04 LTS – Redis 6.0.16 (apt) / 7.2.4 (compiled).

Minimum spec: 3 nodes × 2C4G / 10 GB SSD.

Recommended spec: 3 nodes × 4C8G / 50 GB SSD.

Network: Gigabit NIC, latency < 5 ms.

Version Differences

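Redis 6.2 vs 7.0 – ACLs are available in both (introduced in Redis 6.0, with Sentinel ACL support since 6.2); 7.0 adds Functions, ACL selectors and multi-part AOF files.

The SENTINEL CONFIG GET/SET commands for changing Sentinel settings at runtime are available from Redis 6.2.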

Ubuntu apt packages are older; compile the latest stable version for production.

Architecture Diagram

Redis Sentinel architecture diagram

Implementation Steps

Step 1: Environment Check & Redis Installation

Install Redis 6.2+ on three servers, verify version and network connectivity.

# Check OS version
cat /etc/redhat-release
uname -r
# Install Redis from official repo
yum install -y redis
# Or compile the latest version
yum install -y gcc make tcl
wget https://download.redis.io/redis-stable.tar.gz
tar -zxvf redis-stable.tar.gz && cd redis-stable
make && make install
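# Create the redis user (skipped if the package already created it), directories and permissions
id redis >/dev/null 2>&1 || useradd -r -s /bin/false redis
mkdir -p /etc/redis /var/lib/redis /var/log/redis
# Sentinel rewrites its own config file at runtime, so /etc/redis must be writable by redis
chown -R redis:redis /etc/redis /var/lib/redis /var/log/redis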
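
Before moving on, it can help to confirm that each node really runs Redis 6.2 or newer, since distribution packages may lag behind (Ubuntu 22.04 ships 6.0.16). The following is a minimal sketch; the helper names `version_ge` and `parse_redis_version` are made up for this example, and it assumes GNU `sort -V` is available.

```shell
# Succeed if version $1 is greater than or equal to version $2.
# Relies on `sort -V` (version sort, GNU coreutils).
version_ge() {
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# Extract "7.2.4" from a line such as
# "Redis server v=7.2.4 sha=00000000:0 malloc=jemalloc-5.3.0 bits=64 ...".
parse_redis_version() {
    sed -n 's/.*v=\([0-9][0-9.]*\).*/\1/p'
}

# Usage on a node (requires redis-server on PATH):
#   v=$(redis-server --version | parse_redis_version)
#   version_ge "$v" 6.2 || echo "Redis $v is older than 6.2" >&2
```

Run it on all three servers; a node that fails the check should be upgraded or compiled from source before Sentinel is configured.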

Step 2: Configure Redis Master

Edit /etc/redis/redis.conf on the master (192.168.1.10) with the following key settings:

# Network
bind 0.0.0.0
protected-mode no
port 6379
# Daemonize
daemonize yes
pidfile /var/run/redis_6379.pid
logfile /var/log/redis/redis.log
dir /var/lib/redis
# Security
requirepass YourStrongPassword123
masterauth YourStrongPassword123
# Persistence (RDB & AOF)
save 900 1
save 300 10
save 60 10000
appendonly yes
appendfsync everysec
# Replication protection
min-replicas-to-write 1
min-replicas-max-lag 10
# Memory policy
maxmemory 2gb
maxmemory-policy allkeys-lru

Step 3: Configure Redis Replicas

Copy the master config to each replica (192.168.1.11 and 192.168.1.12) and add:

replicaof 192.168.1.10 6379
masterauth YourStrongPassword123
replica-read-only yes
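
Once the replicas are started, a quick way to confirm they attached to the master is to read `role` and `master_link_status` from `INFO replication`. The sketch below parses that output from stdin; `info_field` and `replica_in_sync` are illustrative names, not Redis tooling.

```shell
# Read INFO output on stdin and print the value of one field.
# INFO text uses CRLF line endings, so the trailing \r is stripped first.
info_field() {
    tr -d '\r' | awk -F: -v key="$1" '$1 == key { print $2; exit }'
}

# Succeed only if the INFO text on stdin describes a replica
# whose link to the master is up.
replica_in_sync() {
    local info
    info=$(cat)
    [ "$(printf '%s\n' "$info" | info_field role)" = "slave" ] &&
        [ "$(printf '%s\n' "$info" | info_field master_link_status)" = "up" ]
}

# Usage on a replica:
#   redis-cli -h 192.168.1.11 -a YourStrongPassword123 INFO replication | replica_in_sync \
#       || echo "replica is not linked to the master" >&2
```

A link status of `down` right after startup usually points at a wrong `masterauth` value or a blocked port 6379.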

Step 4: Configure Sentinel

Create /etc/redis/sentinel.conf on all three nodes with:

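# Basic settings
# Sentinels must accept connections from the other nodes; with neither bind nor
# protected-mode no they are reachable only from localhost
bind 0.0.0.0
protected-mode no
port 26379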
daemonize yes
pidfile /var/run/redis-sentinel.pid
logfile /var/log/redis/sentinel.log
dir /var/lib/redis
# Monitor the master
sentinel monitor mymaster 192.168.1.10 6379 2
sentinel auth-pass mymaster YourStrongPassword123
sentinel down-after-milliseconds mymaster 5000
sentinel parallel-syncs mymaster 1
sentinel failover-timeout mymaster 180000
sentinel deny-scripts-reconfig yes
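
After all three Sentinels are running, each one should report two other Sentinels and two replicas for mymaster. A minimal sketch of that check follows; it assumes the line-per-item output that redis-cli prints for `SENTINEL master` when stdout is not a terminal, and the function names are hypothetical.

```shell
# Read the flat field/value listing printed by `redis-cli SENTINEL master <name>`
# (one item per line) and print the value that follows the given field name.
sentinel_field() {
    tr -d '\r' | awk -v key="$1" 'found { print; exit } $0 == key { found = 1 }'
}

# Succeed if the listing on stdin shows the expected topology.
# $1 = expected number of other Sentinels, $2 = expected number of replicas.
sentinel_topology_ok() {
    local listing
    listing=$(cat)
    [ "$(printf '%s\n' "$listing" | sentinel_field num-other-sentinels)" = "$1" ] &&
        [ "$(printf '%s\n' "$listing" | sentinel_field num-slaves)" = "$2" ]
}

# Usage on any node:
#   redis-cli -p 26379 SENTINEL master mymaster | sentinel_topology_ok 2 2 \
#       || echo "Sentinel has not discovered every peer or replica yet" >&2
```

If `num-other-sentinels` stays at 0, the Sentinels cannot reach each other on port 26379, or they cannot authenticate against the master they use for discovery.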

Step 5: Test Failover

Simulate master failure and verify automatic promotion:

# Observe current master
redis-cli -p 26379 SENTINEL get-master-addr-by-name mymaster
# Kill the master process (single quotes so that pgrep runs on the remote host, not locally)
ssh root@192.168.1.10 'kill -9 $(pgrep -x redis-server)'
# Watch Sentinel logs for +sdown, +odown, +failover-end
# Verify new master
redis-cli -p 26379 SENTINEL get-master-addr-by-name mymaster
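
To put a number on the failover, you can poll Sentinel until the reported master changes. This is a rough sketch rather than a benchmark tool: `get_master_ip` and `wait_for_failover` are names invented here, the elapsed time counts one-second polling intervals only, and the Sentinel port and master name are taken from the configuration above.

```shell
# Print the current master IP as reported by the local Sentinel.
# Redefine this function to query a different Sentinel or master name.
get_master_ip() {
    redis-cli -p 26379 SENTINEL get-master-addr-by-name mymaster | head -n 1 | tr -d '\r'
}

# Poll once per second until the reported master differs from $1 (the old master IP).
# $2 = timeout in seconds (default 60).
# Prints "<new-ip> <elapsed-seconds>" on success, returns 1 on timeout.
wait_for_failover() {
    local old_ip="$1" timeout="${2:-60}" elapsed=0 current
    while [ "$elapsed" -lt "$timeout" ]; do
        current=$(get_master_ip)
        if [ -n "$current" ] && [ "$current" != "$old_ip" ]; then
            echo "$current $elapsed"
            return 0
        fi
        sleep 1
        elapsed=$((elapsed + 1))
    done
    return 1
}

# Usage: start this in one terminal, then kill the master in another.
#   wait_for_failover 192.168.1.10 60 || echo "no failover within 60 s" >&2
```

With down-after-milliseconds set to 5000, a result well above the 15-30 s range mentioned in this guide is worth investigating before going to production.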

Step 6: Configure Firewall

Open required ports on each OS:

# RHEL/CentOS (firewalld)
firewall-cmd --permanent --add-port=6379/tcp
firewall-cmd --permanent --add-port=26379/tcp
firewall-cmd --reload
# Ubuntu/Debian (ufw)
ufw allow 6379/tcp
ufw allow 26379/tcp
ufw reload
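
The rules above open both ports to every source address. If the Redis nodes and their clients share one subnet, a tighter variant admits only that subnet. The sketch below prints the commands for review instead of running them; the function name `print_fw_rules` is made up, and 192.168.1.0/24 is an assumption based on the example addresses in this guide.

```shell
# Print (do not run) firewall commands that admit only one source subnet.
# $1 = firewall type: "firewalld" or "ufw"; $2 = trusted subnet in CIDR form.
print_fw_rules() {
    local kind="$1" subnet="$2" port
    for port in 6379 26379; do
        case "$kind" in
            firewalld)
                echo "firewall-cmd --permanent --add-rich-rule='rule family=\"ipv4\" source address=\"$subnet\" port port=\"$port\" protocol=\"tcp\" accept'"
                ;;
            ufw)
                echo "ufw allow from $subnet to any port $port proto tcp"
                ;;
            *)
                echo "unknown firewall type: $kind" >&2
                return 1
                ;;
        esac
    done
    if [ "$kind" = "firewalld" ]; then
        echo "firewall-cmd --reload"
    fi
    return 0
}

# Usage:
#   print_fw_rules firewalld 192.168.1.0/24
#   print_fw_rules ufw 192.168.1.0/24
```

Use these in place of the broad `--add-port` / `ufw allow 6379/tcp` rules, not in addition to them, otherwise the ports remain open to everyone.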

Core Mechanism

Monitoring – Sentinels ping master and replicas every second.

Subjective down (SDOWN) – a single Sentinel marks a node down after the configured timeout.

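Objective down (ODOWN) – at least quorum Sentinels (2 in this setup) agree that the master is down. ODOWN applies only to masters.

Leader election – the Sentinels elect a leader to drive the failover; the leader must be authorized by a majority of all Sentinels, not just by the quorum.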

Failover – the leader promotes the best replica, updates other replicas, and rewrites Sentinel state.
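
The difference between quorum and majority in the steps above is easy to get wrong, so here is the arithmetic as a small sketch. The function names are hypothetical; the rule encoded is that ODOWN needs `quorum` agreeing Sentinels, while the failover itself needs authorization from a majority of all Sentinels (or from `quorum` Sentinels when the quorum is set higher than the majority).

```shell
# Smallest number of Sentinels that forms a majority of $1 Sentinels.
majority() {
    echo $(( $1 / 2 + 1 ))
}

# Decide whether a failover can proceed.
# $1 = total Sentinels configured
# $2 = Sentinels that are reachable and agree the master is down
# $3 = configured quorum
can_failover() {
    local total="$1" agreeing="$2" quorum="$3" needed
    needed=$(majority "$total")
    # When the quorum exceeds the majority, the quorum is the binding limit.
    if [ "$quorum" -gt "$needed" ]; then
        needed="$quorum"
    fi
    [ "$agreeing" -ge "$needed" ]
}

# Usage:
#   can_failover 3 2 2 && echo "failover possible"
#   can_failover 5 2 2 || echo "ODOWN reached, but no majority to authorize failover"
```

The second usage line shows why a low quorum on a five-Sentinel deployment does not by itself make failover possible with only two Sentinels alive.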

Observability (Monitoring & Alerting)

Native Redis commands:

# Replication status
redis-cli -h 192.168.1.10 -a YourStrongPassword123 INFO replication
# Sentinel status
redis-cli -p 26379 SENTINEL masters
redis-cli -p 26379 SENTINEL slaves mymaster
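
The master's `INFO replication` output already contains what is needed to compute per-replica lag in bytes: `master_repl_offset` minus the `offset` on each `slaveN` line. A minimal sketch, assuming IPv4 replica addresses (an IPv6 address would break the colon-based field split); `replica_lag_bytes` is a name chosen for this example.

```shell
# Read the master's `INFO replication` output on stdin and print
# "<replica-ip> <lag-in-bytes>" for every connected replica.
replica_lag_bytes() {
    tr -d '\r' | awk -F: '
        $1 == "master_repl_offset" { master = $2 }
        $1 ~ /^slave[0-9]+$/ {
            # $2 looks like: ip=192.168.1.11,port=6379,state=online,offset=1200,lag=0
            n = split($2, kv, ",")
            for (i = 1; i <= n; i++) {
                split(kv[i], pair, "=")
                if (pair[1] == "ip") ip[$1] = pair[2]
                if (pair[1] == "offset") off[$1] = pair[2]
            }
            order[++count] = $1
        }
        # master_repl_offset is printed after the slaveN lines, so compute at the end.
        END { for (i = 1; i <= count; i++) print ip[order[i]], master - off[order[i]] }
    '
}

# Usage:
#   redis-cli -h 192.168.1.10 -a YourStrongPassword123 INFO replication | replica_lag_bytes
```

A lag that stays at a few hundred bytes is normal under write load; a value that keeps growing means the replica is falling behind.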

Prometheus + Redis Exporter:

# Start exporter for each node
redis_exporter --redis.addr=192.168.1.10:6379 --redis.password=YourStrongPassword123 --web.listen-address=:9121 &
# Add to prometheus.yml
- job_name: 'redis'
  static_configs:
    - targets: ['192.168.1.10:9121','192.168.1.11:9121','192.168.1.12:9121']

Key alert rules (example):

# Redis instance down
- alert: RedisDown
  expr: up{job="redis"} == 0
  for: 1m
  labels:
    severity: critical
  annotations:
    summary: "Redis instance down ({{ $labels.instance }})"
    description: "Redis instance {{ $labels.instance }} has been down for more than 1 minute."
# Replication lag too high
- alert: RedisReplicationLag
  expr: (redis_master_repl_offset - redis_slave_repl_offset) > 1000
  for: 5m
  labels:
    severity: warning
  annotations:
    summary: "Redis replication lag ({{ $labels.instance }})"
    description: "Slave {{ $labels.instance }} lagging by {{ $value }} bytes."
# Memory usage > 90%
- alert: RedisMemoryHigh
  expr: (redis_memory_used_bytes / redis_memory_max_bytes * 100) > 90
  for: 5m
  labels:
    severity: warning
  annotations:
    summary: "Redis memory usage high ({{ $labels.instance }})"
    description: "Memory usage at {{ $value }}% on {{ $labels.instance }}."
# Cache hit rate < 80%
- alert: RedisCacheHitRateLow
  expr: rate(redis_keyspace_hits_total[5m]) / (rate(redis_keyspace_hits_total[5m]) + rate(redis_keyspace_misses_total[5m])) * 100 < 80
  for: 10m
  labels:
    severity: warning
  annotations:
    summary: "Redis cache hit rate low ({{ $labels.instance }})"
    description: "Hit rate {{ $value }}% on {{ $labels.instance }}."
# Too many client connections
- alert: RedisConnectionsHigh
  expr: redis_connected_clients > 8000
  for: 5m
  labels:
    severity: warning
  annotations:
    summary: "Redis connections high ({{ $labels.instance }})"
    description: "{{ $labels.instance }} has {{ $value }} connections."
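
For a spot check without Prometheus, the hit-rate formula used in the RedisCacheHitRateLow rule can be applied to the `keyspace_hits` and `keyspace_misses` counters from `INFO stats`. This is only a sketch with a made-up helper name, and unlike the 5-minute `rate()` in the alert it reflects the totals since the server started.

```shell
# Print the cache hit rate in percent with two decimals.
# $1 = keyspace_hits, $2 = keyspace_misses; prints "n/a" when both are zero.
hit_rate() {
    awk -v h="$1" -v m="$2" 'BEGIN {
        total = h + m
        if (total == 0) { print "n/a"; exit }
        printf "%.2f\n", h * 100 / total
    }'
}

# Usage:
#   stats=$(redis-cli -h 192.168.1.10 -a YourStrongPassword123 INFO stats | tr -d '\r')
#   hits=$(printf '%s\n' "$stats" | awk -F: '$1 == "keyspace_hits" { print $2 }')
#   misses=$(printf '%s\n' "$stats" | awk -F: '$1 == "keyspace_misses" { print $2 }')
#   hit_rate "$hits" "$misses"
```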

Common Faults & Troubleshooting

Replication broken – check network, password, master status with INFO replication.

Sentinel fails to failover – verify the quorum with SENTINEL CKQUORUM mymaster and make sure a majority of the Sentinels (at least two of three) are running and can reach each other.

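Data inconsistency – compare master_repl_offset on the master and the replicas and wait for them to converge. If a replica stays out of sync, restart it so that it reconnects and resynchronizes with the master. Do not use REPLICAOF NO ONE for this: it stops replication and turns the replica into a standalone master.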

Frequent failovers – increase down-after-milliseconds, reduce load spikes, improve network stability.

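Write errors after failover – READONLY errors mean the client is still connected to the old master, which is now a read-only replica; make sure clients discover the new master via Sentinel and reconnect. NOREPLICAS errors mean the new master has fewer healthy replicas than min-replicas-to-write requires.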

High replica lag – check the master's write rate, upgrade replica hardware or network, and increase repl-backlog-size so that brief disconnects can recover with a partial resync.

Change & Rollback Playbook

Gray‑scale upgrade – upgrade one replica, verify, then promote it, finally upgrade the original master.

Rollback conditions – replication outage >5 min, stability issues, or >30% performance drop.

Rollback steps – stop all services, restore previous binaries and config files, start services, verify versions and replication.
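
The two measurable rollback conditions can be turned into a simple gate for an upgrade script. The sketch below is illustrative only: `should_rollback` is a hypothetical helper, the QPS figures are whatever baseline and current throughput you measure, and "stability issues" still need a human decision.

```shell
# Succeed (exit 0) when a rollback is warranted.
# $1 = replication outage in seconds
# $2 = baseline throughput before the change (for example QPS, integer)
# $3 = current throughput after the change (integer)
should_rollback() {
    local outage_s="$1" baseline="$2" current="$3" drop
    # Condition 1: replication has been down for more than 5 minutes.
    if [ "$outage_s" -gt 300 ]; then
        return 0
    fi
    # Condition 2: throughput dropped by more than 30 percent (integer arithmetic).
    if [ "$baseline" -gt 0 ]; then
        drop=$(( (baseline - current) * 100 / baseline ))
        if [ "$drop" -gt 30 ]; then
            return 0
        fi
    fi
    return 1
}

# Usage:
#   should_rollback 0 12000 8000 && echo "performance dropped by more than 30%, roll back"
```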

Backup & Restore

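#!/bin/bash
# Run on the master (192.168.1.10), where the data directory is local
set -euo pipefail
REDIS_CLI="redis-cli -h 192.168.1.10 -a YourStrongPassword123 --no-auth-warning"
BACKUP_DIR="/backup/redis/$(date +%Y%m%d_%H%M%S)"
mkdir -p "$BACKUP_DIR"
# Record the time of the last successful save, then trigger a new RDB snapshot
LAST_SAVE=$($REDIS_CLI LASTSAVE)
$REDIS_CLI BGSAVE
# Wait until LASTSAVE changes, which means the new snapshot has been written
while [ "$($REDIS_CLI LASTSAVE)" -eq "$LAST_SAVE" ]; do sleep 1; done
# Copy files
cp /var/lib/redis/dump.rdb "$BACKUP_DIR/"
# Redis 7+ keeps AOF files in appendonlydir/; older versions use a single appendonly.aof
if [ -d /var/lib/redis/appendonlydir ]; then
    cp -r /var/lib/redis/appendonlydir "$BACKUP_DIR/"
elif [ -f /var/lib/redis/appendonly.aof ]; then
    cp /var/lib/redis/appendonly.aof "$BACKUP_DIR/"
fi
cp /etc/redis/redis.conf "$BACKUP_DIR/"
cp /etc/redis/sentinel.conf "$BACKUP_DIR/"
# Archive, then remove the uncompressed copy
tar -czf "$BACKUP_DIR.tar.gz" -C "$BACKUP_DIR" .
rm -rf "$BACKUP_DIR"
# Clean up archives older than 7 days
find /backup/redis -type f -name '*.tar.gz' -mtime +7 -delete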

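Restore by stopping Redis, extracting the archive to a temporary directory, copying dump.rdb (and the AOF files, if you want to restore from them) into /var/lib/redis, fixing ownership with chown redis:redis, and restarting the service. With appendonly yes, Redis loads the AOF rather than dump.rdb at startup, so make sure the AOF files in the data directory match the backup you intend to restore.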
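
The file-handling part of a restore can be sketched as follows. It covers only the RDB file and assumes an archive laid out like the one produced by the backup script; `restore_rdb` is a hypothetical helper, stopping and starting Redis is left to you, and how your Redis version treats existing AOF files at startup should be checked before relying on it.

```shell
# Restore dump.rdb from a backup archive into a Redis data directory.
# $1 = path to the .tar.gz archive, $2 = data directory (default /var/lib/redis).
# Stop Redis before calling this and start it again afterwards.
restore_rdb() {
    local archive="$1" datadir="${2:-/var/lib/redis}" tmp
    tmp=$(mktemp -d) || return 1
    # Extract to a scratch directory so config files in the archive
    # do not end up in the data directory.
    if ! tar -xzf "$archive" -C "$tmp"; then
        rm -rf "$tmp"
        return 1
    fi
    if [ ! -f "$tmp/dump.rdb" ]; then
        echo "dump.rdb not found in $archive" >&2
        rm -rf "$tmp"
        return 1
    fi
    cp "$tmp/dump.rdb" "$datadir/dump.rdb"
    # Fix ownership only when the redis user exists on this host.
    if id redis >/dev/null 2>&1; then
        chown redis:redis "$datadir/dump.rdb" 2>/dev/null ||
            echo "warning: could not chown $datadir/dump.rdb" >&2
    fi
    rm -rf "$tmp"
    return 0
}

# Usage (the archive name is a placeholder):
#   restore_rdb /backup/redis/<timestamp>.tar.gz /var/lib/redis
```

Extracting to a temporary directory first avoids dropping redis.conf and sentinel.conf into /var/lib/redis, which is what a direct extraction of the archive would do.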

Best Practices

Enable password authentication (requirepass & masterauth).

Use both AOF and RDB persistence.

Set min-replicas-to-write 1 and min-replicas-max-lag 10 to protect writes.

Deploy three Sentinels with quorum=2 (formula: floor(N/2)+1).

Monitor replication lag, cache hit rate, memory usage, and client connections.

Run quarterly failover drills (master kill, network partition, full replica loss).

Clients should discover the master via Sentinel-aware client libraries (e.g., the redis.sentinel module in redis-py or JedisSentinelPool in Jedis).

Avoid large keys (String < 10 MB, collections < 10 k elements) and use redis-cli --bigkeys to detect them.

Configure appropriate maxmemory-policy (allkeys‑lru for cache workloads).

Set TTLs on cache keys so that Redis can expire them, and monitor the expired_keys and evicted_keys metrics.
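
For shell scripts and cron jobs that cannot use a Sentinel-aware library, the same discovery idea can be approximated with redis-cli: ask each Sentinel in turn and use the first answer. A minimal sketch, with `query_sentinel` and `discover_master` as made-up helper names and the three node addresses taken from this guide:

```shell
# Ask one Sentinel for the master address; prints "ip port" or nothing.
# $1 = Sentinel host, $2 = Sentinel port, $3 = master name.
query_sentinel() {
    redis-cli -h "$1" -p "$2" SENTINEL get-master-addr-by-name "$3" 2>/dev/null |
        tr -d '\r' | paste -s -d ' ' -
}

# Try each Sentinel in turn and print the first non-empty answer.
# $1 = master name, remaining arguments = Sentinel addresses as host:port.
discover_master() {
    local name="$1" addr answer
    shift
    for addr in "$@"; do
        answer=$(query_sentinel "${addr%:*}" "${addr##*:}" "$name")
        if [ -n "$answer" ]; then
            echo "$answer"
            return 0
        fi
    done
    echo "no Sentinel returned a master for $name" >&2
    return 1
}

# Usage:
#   set -- $(discover_master mymaster 192.168.1.10:26379 192.168.1.11:26379 192.168.1.12:26379)
#   redis-cli -h "$1" -p "$2" -a YourStrongPassword123 PING
```

Listing all three Sentinels matters: after a failover the node that used to host the master, and its Sentinel, may be the one that is down.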

FAQ

Q: Difference between Sentinel and Redis Cluster? A: Sentinel provides HA for a single master‑replica set; Cluster adds sharding and HA for large datasets.

Q: Minimum number of Sentinels? A: Three – this gives a quorum of two and tolerates one Sentinel failure.

Q: Can failover cause data loss? A: Yes, because replication is asynchronous; up to a few seconds of writes may be lost.

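Q: How to reduce data loss? A: Use min-replicas-to-write 1 with min-replicas-max-lag 10 so that the master stops accepting writes when no replica is keeping up, and use the WAIT command for writes that need replica acknowledgment. AOF with appendfsync everysec limits loss when a node restarts, but it does not protect writes that had not yet reached the promoted replica.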

Q: Why does failover take 15‑30 s? A: 5 s for subjective down, 5‑10 s for quorum agreement, 5‑10 s for leader election and reconfiguration.

Q: Can Redis and Sentinel run on the same server? A: Technically yes, but not recommended for production because a single host failure removes both services.

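Q: Does the original master become a replica automatically? A: Yes. When it comes back online, Sentinel reconfigures it as a replica of the new master. To promote it back you must trigger a manual failover with SENTINEL FAILOVER mymaster.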

Q: How to monitor Sentinel health? A: Ping Sentinel (redis-cli -p 26379 PING), check the quorum with SENTINEL CKQUORUM mymaster, and scrape redis_sentinel_master_status via Prometheus.

Q: How do clients connect? A: Clients use Sentinel APIs (or library support) to discover the current master instead of hard‑coding an address.

Q: Can Sentinel monitor multiple masters? A: Yes – add multiple sentinel monitor entries, each representing an independent master‑replica set.
