Mastering Redis High Availability: Sentinel, VIP, and Cluster Strategies
This article explains why Redis high availability is essential, describes how Redis Sentinel works internally, compares several HA architectures (Sentinel with DNS or VIP, client-side Sentinel, Keepalived/HAProxy, Redis Cluster, Twemproxy, and Codis) along with their pros and cons, and shares best-practice recommendations for building reliable Redis deployments.
Introduction
On May 13, 2017, DBA Wen Guobing from 37 Interactive delivered a Redis technical talk at the APM conference in Guangzhou.
Redis is an open-source, networked, in-memory key-value store written in ANSI C, with optional persistence and client libraries for many languages. As internet data volumes grow rapidly, high performance and high availability become critical, and Redis can provide both.
Sentinel Principle
Before comparing Redis HA solutions, it helps to understand how Redis Sentinel works.
1. Sentinel reads the masters to monitor from its configuration file and periodically sends INFO to learn about their slaves.
2. Every two seconds, each Sentinel publishes a hello message to the __sentinel__:hello Pub/Sub channel of every monitored instance, announcing its IP, port, and run ID.
3. Each Sentinel subscribes to the same channel, which is how it discovers the other Sentinels monitoring the same master.
4. Sentinel pings every instance once per second; if no valid reply arrives within the configured down-after-milliseconds, it marks the instance as subjectively down (SDOWN).
5. When at least quorum Sentinels agree that a master is down, it is marked objectively down (ODOWN). A failover starts only after one Sentinel has been elected leader by a majority of the Sentinels.
6. The leader selects a slave (by priority, then replication offset, then run ID) and sends it SLAVEOF NO ONE to promote it.
7. The leader increments the configuration epoch and broadcasts the new master configuration to the other Sentinels.
Steps 1‑3 constitute automatic discovery, step 4 is health checking, steps 5‑6 handle failover, and step 7 updates configuration.
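The failover decisions in the steps above can be sketched in a few lines. This is a simplified model written for illustration, not Sentinel's actual code: the names Replica, is_objectively_down, leader_is_authorized, and pick_replica are hypothetical, and the real implementation also filters out slaves that have been disconnected for too long.

```python
from typing import NamedTuple, Optional, Sequence


class Replica(NamedTuple):
    run_id: str
    priority: int      # configured priority; 0 means "never promote"
    repl_offset: int   # how much of the master's replication stream was processed


def is_objectively_down(down_votes: int, quorum: int) -> bool:
    """ODOWN: at least `quorum` Sentinels report the master as down."""
    return down_votes >= quorum


def leader_is_authorized(votes: int, total_sentinels: int, quorum: int) -> bool:
    """The leader needs a majority of all Sentinels, and at least `quorum`
    votes when the quorum is configured higher than the majority."""
    majority = total_sentinels // 2 + 1
    return votes >= max(majority, quorum)


def pick_replica(replicas: Sequence[Replica]) -> Optional[Replica]:
    """Lowest non-zero priority wins, then the highest replication offset,
    then the lexicographically smallest run ID."""
    candidates = [r for r in replicas if r.priority != 0]
    if not candidates:
        return None
    return min(candidates, key=lambda r: (r.priority, -r.repl_offset, r.run_id))
```

Keeping the quorum check separate from the majority check reflects the distinction in step 5: the quorum only decides that the master is down, while the majority decides who may act on it.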
Redis High‑Availability Architectures
The talk presented the following common Redis HA architectures:
Redis Sentinel cluster + internal DNS + custom script
Redis Sentinel cluster + VIP + custom script
Client directly connects to Sentinel port (JedisSentinelPool for Java, custom wrapper for PHP)
Redis Sentinel cluster + Keepalived/HAProxy
Redis Master/Slave + Keepalived
Redis Cluster
Twemproxy
Codis
1. Sentinel + Internal DNS + Custom Script
Sentinel monitors masters; Web services resolve an internal DNS name that points to the current master. When the master fails, a custom script updates the DNS record to the new master.
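As a rough sketch of what such a custom script might do, the code below parses the arguments that Sentinel passes to a client-reconfig-script (documented in sentinel.conf as master-name, role, state, from-ip, from-port, to-ip, to-port) and builds the input for an nsupdate dynamic DNS update. The talk does not say which DNS server or update mechanism was used, so the nsupdate approach, the record name, the TTL, and the server address are all assumptions made for illustration.

```python
# Hypothetical values for illustration only; the zone is assumed to
# accept dynamic updates from the host running this script.
RECORD = "redis-master.internal.example."
TTL = 5
DNS_SERVER = "10.0.0.53"


def parse_reconfig_args(argv):
    """Parse: <master-name> <role> <state> <from-ip> <from-port> <to-ip> <to-port>."""
    if len(argv) != 7:
        raise ValueError("expected 7 arguments, got %d" % len(argv))
    name, role, state, from_ip, from_port, to_ip, to_port = argv
    return {
        "master_name": name,
        "role": role,
        "state": state,
        "from_ip": from_ip,
        "from_port": int(from_port),
        "to_ip": to_ip,
        "to_port": int(to_port),
    }


def build_nsupdate_input(new_ip, record=RECORD, ttl=TTL, server=DNS_SERVER):
    """Return the text to feed to `nsupdate` on stdin to repoint the A record."""
    return "\n".join([
        f"server {server}",
        f"update delete {record} A",
        f"update add {record} {ttl} A {new_ip}",
        "send",
        "",
    ])


def plan_dns_switch(argv):
    """Return the nsupdate input, or None when this Sentinel is not the leader."""
    event = parse_reconfig_args(argv)
    if event["role"] != "leader":
        return None
    return build_nsupdate_input(event["to_ip"])
```

A short TTL is chosen here because clients keep using a cached record until it expires, which is where the DNS-related latency listed under the cons comes from.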
Pros:
Failover completes in seconds (within 10 s)
Custom script keeps the switchover logic under the operator's control
Transparent to applications
Cons:
Higher maintenance cost; at least three Sentinel nodes recommended
Depends on DNS: resolution adds latency, and cached records can delay clients from reaching the new master
Brief service interruption during Sentinel failover
Not suitable for external access
2. Sentinel + VIP + Custom Script
Similar to the DNS solution, but a virtual IP (VIP) is moved to the new master by the custom script after a failover.
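One way to picture the VIP move is as an ordered list of commands: release the address on the old master first, then add it on the new master and announce it with a gratuitous ARP. The sketch below only builds that plan; how the commands get executed (for example over SSH) is left out. The VIP, the interface name, and the function names are hypothetical, and the arping flags assume the iputils version of the tool.

```python
# Hypothetical values for illustration only.
VIP = "10.0.0.100/24"
DEV = "eth0"


def release_vip_cmd(vip=VIP, dev=DEV):
    """Command that removes the VIP from an interface."""
    return ["ip", "addr", "del", vip, "dev", dev]


def acquire_vip_cmds(vip=VIP, dev=DEV):
    """Commands that add the VIP and announce it to the local network."""
    addr = vip.split("/")[0]
    return [
        ["ip", "addr", "add", vip, "dev", dev],
        # Unsolicited (gratuitous) ARP so neighbours refresh their ARP caches.
        ["arping", "-c", "3", "-U", "-I", dev, addr],
    ]


def plan_vip_move(old_host, new_host, vip=VIP, dev=DEV):
    """Return ordered (host, command) steps: release first, then acquire."""
    steps = [(old_host, release_vip_cmd(vip, dev))]
    steps.extend((new_host, cmd) for cmd in acquire_vip_cmds(vip, dev))
    return steps
```

Releasing before acquiring is meant to reduce the window in which two hosts hold the same address. The old master may be unreachable during a real failover, so a production script would have to tolerate a failed release step.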
Pros:
Failover completes in seconds (within 5 s)
Custom script keeps the switchover logic under the operator's control
Transparent to applications
Cons:
Higher maintenance cost; at least three Sentinel nodes recommended
VIP management adds complexity and risk of IP conflicts
Brief service interruption during Sentinel failover
3. Client Directly Connects to Sentinel Port
Clients first connect to a Sentinel port, obtain the current master address, and then communicate with the master. Java can use JedisSentinelPool; PHP can implement a wrapper on top of phpredis.
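To make the lookup step concrete, here is a minimal sketch that asks Sentinels for the current master using the SENTINEL get-master-addr-by-name command over a raw socket. It is an illustration of the protocol exchange only: the function names are hypothetical, authentication and retries are not handled, and real applications would normally rely on a Sentinel-aware client such as JedisSentinelPool instead.

```python
import socket


def encode_command(*args):
    """Encode a command as a RESP array of bulk strings."""
    out = [b"*%d\r\n" % len(args)]
    for arg in args:
        data = arg.encode() if isinstance(arg, str) else arg
        out.append(b"$%d\r\n%s\r\n" % (len(data), data))
    return b"".join(out)


def read_master_addr(stream):
    """Parse the reply to SENTINEL get-master-addr-by-name from a binary stream.

    Returns (host, port), or None when the reply is not a two-element array
    (for example a null array because the master name is unknown).
    """
    header = stream.readline().strip()
    if header != b"*2":
        return None
    parts = []
    for _ in range(2):
        length = int(stream.readline().strip()[1:])   # line looks like "$<len>"
        parts.append(stream.read(length + 2)[:-2])    # drop the trailing CRLF
    return parts[0].decode(), int(parts[1])


def discover_master(sentinels, master_name, timeout=0.5):
    """Ask each Sentinel in turn and return the first (host, port) answer."""
    request = encode_command("SENTINEL", "get-master-addr-by-name", master_name)
    for host, port in sentinels:
        try:
            with socket.create_connection((host, port), timeout=timeout) as sock:
                sock.sendall(request)
                with sock.makefile("rb") as stream:
                    addr = read_master_addr(stream)
            if addr:
                return addr
        except OSError:
            continue   # this Sentinel is unreachable; try the next one
    return None
```

Trying several Sentinels in turn is what lets the client keep working when one Sentinel is down, and it is also why both the Sentinel and Redis ports must be reachable from the application.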
Pros:
Fast fault detection
Low DBA maintenance cost
Cons:
Requires client support for Sentinel
Both Sentinel and Redis nodes must be reachable
Invasive to applications
4. Sentinel + Keepalived/HAProxy
Sentinel handles master failover while Keepalived moves the VIP. HAProxy can additionally provide load balancing.
Pros:
Failover completes in seconds
Transparent to applications
Cons:
Higher maintenance cost
Potential split‑brain scenarios
Brief service interruption during Sentinel failover
5. Master/Slave + Keepalived
This setup uses native Redis master/slave replication with a VIP managed by Keepalived. A custom script handles the master switch.
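A common building block for such a script is a role check: parse the output of INFO replication and report whether the local instance is currently a master. The sketch below shows only that parsing step; the function names are hypothetical, and wiring the exit code into a Keepalived check script is an assumption about how the pieces could fit together, not something the talk specifies.

```python
def parse_info(text):
    """Turn INFO output into a dict, skipping blank lines and '# Section' headers."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or ":" not in line:
            continue
        key, value = line.split(":", 1)
        fields[key] = value
    return fields


def is_master(info_text):
    """True when the instance reports role:master."""
    return parse_info(info_text).get("role") == "master"


def keepalived_exit_code(info_text):
    """0 when this node should hold the VIP, 1 otherwise."""
    return 0 if is_master(info_text) else 1
```

A check like this only looks at the local node. On its own it cannot rule out two nodes both believing they are master, which is the split-brain risk listed below.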
Pros:
Failover completes in seconds
Transparent to applications
Simple deployment, low maintenance cost
Cons:
Requires scripting for failover
Potential split‑brain issues
6. Redis Cluster
Redis 3.0 introduced a peer-to-peer cluster that shards keys across 16,384 hash slots. A client that contacts the wrong node is redirected, via a MOVED reply, to the node that owns the slot, and the nodes use a gossip protocol to discover each other.
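The slot for a key is CRC16(key) mod 16384, where the Redis Cluster specification uses the XMODEM variant of CRC16 and hashes only the hash tag (the text between the first { and the next }) when a non-empty one is present. A small sketch, with key_slot as a hypothetical helper name:

```python
import binascii

NUM_SLOTS = 16384


def key_slot(key: bytes) -> int:
    """Compute the cluster hash slot for a key, honouring hash tags."""
    start = key.find(b"{")
    if start != -1:
        end = key.find(b"}", start + 1)
        # Only a non-empty tag such as {user1000} replaces the full key.
        if end != -1 and end != start + 1:
            key = key[start + 1:end]
    # binascii.crc_hqx with an initial value of 0 is CRC16/XMODEM.
    return binascii.crc_hqx(key, 0) % NUM_SLOTS
```

Keys that share a hash tag, such as {user1000}.following and {user1000}.followers, land in the same slot, which is the usual way to work around the multi-key restriction listed under the cons.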
Pros:
All‑in‑one deployment saves resources
Better performance than proxy modes
Automatic failover and slot migration with data availability
Officially supported with regular updates
Cons:
Relatively new architecture with limited best‑practice guidance
Limited multi‑key operation support
Clients must cache routing tables
Node discovery and resharding are not fully automated
7. Twemproxy
Multiple identical Twemproxy instances forward client requests to the appropriate Redis node based on a hash algorithm.
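The sketch below shows the simplest form of hash-based routing, hash(key) mod N, and counts how many keys change nodes when a server is added. It is an illustration only: Twemproxy offers several hash functions and distribution modes (including ketama consistent hashing), and the MD5-plus-modulo scheme and the names route and moved_keys here are assumptions chosen to keep the example short.

```python
import hashlib


def route(key, nodes):
    """Pick a node by hashing the key and taking the result modulo the node count."""
    digest = hashlib.md5(key.encode()).digest()
    return nodes[int.from_bytes(digest[:4], "big") % len(nodes)]


def moved_keys(keys, old_nodes, new_nodes):
    """Count keys whose owning node differs between two node lists."""
    return sum(1 for k in keys if route(k, old_nodes) != route(k, new_nodes))


if __name__ == "__main__":
    keys = [f"key:{i}" for i in range(1000)]
    three = ["redis-a", "redis-b", "redis-c"]
    four = three + ["redis-d"]
    print("keys that change node:", moved_keys(keys, three, four))
```

With plain modulo routing, going from three to four nodes remaps most keys. Consistent hashing reduces that fraction, but data still has to be moved by hand, which is part of why Redis expansion behind Twemproxy is considered cumbersome.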
Pros:
Simple development, almost transparent to applications
Mature and widely used solution
Cons:
Proxy adds latency and can become a performance bottleneck
Scaling Twemproxy can be difficult
Redis expansion is cumbersome
Twitter has deprecated this approach internally
8. Codis
Codis, open‑sourced by Wandoujia, uses ZooKeeper for routing metadata, a web UI for management, and a stateless proxy compatible with the Redis protocol. It builds on Redis 2.8 with slot support.
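The slot idea can be sketched as a small routing table that maps each slot to a server group. The slot count of 1024 and the crc32(key) mod 1024 formula follow the Codis documentation as best recalled and should be treated as assumptions here; the RoutingTable class and its methods are hypothetical and stand in for the metadata that Codis keeps in ZooKeeper.

```python
import zlib

NUM_SLOTS = 1024   # assumed Codis default


def codis_slot(key: bytes) -> int:
    """Map a key to a slot using CRC32 modulo the slot count."""
    return zlib.crc32(key) % NUM_SLOTS


class RoutingTable:
    """Slot-to-group mapping, similar in spirit to what a proxy would look up."""

    def __init__(self, groups):
        # Spread slots over the groups in contiguous ranges.
        per_group = -(-NUM_SLOTS // len(groups))   # ceiling division
        self.slot_to_group = [groups[s // per_group] for s in range(NUM_SLOTS)]

    def group_for(self, key: bytes) -> str:
        return self.slot_to_group[codis_slot(key)]

    def migrate(self, slot: int, group: str) -> None:
        """Reassign one slot; only keys in that slot change owner."""
        self.slot_to_group[slot] = group
```

Because routing goes through the table rather than through a formula over the node count, scaling out means reassigning some slots to a new group while all other keys stay where they are.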
Pros:
Simple development, nearly transparent to applications
Better performance than Twemproxy
Graphical UI, easy scaling, convenient operations
Cons:
Proxy still impacts performance
Many components require more machines
Modified Redis code makes upstream sync difficult
The developers' future focus is shifting to RebornDB
Best Practices
Deploy Sentinel clusters with at least five nodes.
A large service can share one Sentinel cluster that monitors all of that service's Redis ports.
Allocate distinct Redis port ranges per business.
Implement custom scripts in Python for flexibility.
Scripts must check the current Sentinel role.
Pass parameters: .
Use paramiko for SSH to avoid repeated connections.
Set UseDNS no and GSSAPIAuthentication no in the SSH server configuration to speed up SSH logins.
Fork a separate process for WeChat or email alerts to avoid blocking the main process.
All automatic or manual failover actions should complete within 15 seconds.
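Several of these recommendations can be combined in one script skeleton: act only when the invoking Sentinel is the leader, send alerts from a separate process, and measure the switch against the 15-second budget. The sketch below reads "check the current Sentinel role" as testing the role argument that Sentinel passes to a client-reconfig-script (leader or observer), which is one plausible interpretation. The alert script path and the function names are hypothetical.

```python
import subprocess
import sys
import time

FAILOVER_BUDGET_SECONDS = 15.0                  # from the recommendation above
ALERT_SCRIPT = "/opt/redis-ha/send_alert.py"    # hypothetical path


def spawn_alert(message):
    """Start the alert sender as a separate process and return immediately."""
    return subprocess.Popen([sys.executable, ALERT_SCRIPT, message])


def handle_failover(argv, move_endpoint, alert=spawn_alert, clock=time.monotonic):
    """Run one switchover.

    argv: <master-name> <role> <state> <from-ip> <from-port> <to-ip> <to-port>
    move_endpoint: callable(to_ip, to_port) that repoints DNS or moves the VIP.
    """
    name, role, _state, from_ip, from_port, to_ip, to_port = argv
    if role != "leader":
        return "skipped"          # observers must not touch the endpoint
    started = clock()
    move_endpoint(to_ip, int(to_port))
    elapsed = clock() - started
    status = "ok" if elapsed <= FAILOVER_BUDGET_SECONDS else "slow"
    alert(f"{name}: {from_ip}:{from_port} -> {to_ip}:{to_port} ({status}, {elapsed:.1f}s)")
    return status
```

Passing move_endpoint and alert in as callables keeps the skeleton independent of whether the switch is done through DNS or a VIP, and makes it easy to test without touching real infrastructure.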
Conclusion
The talk highlighted the necessity of Redis high availability, explained Sentinel internals, compared several HA architectures, and shared practical best‑practice recommendations, providing a solid reference for building reliable Redis services.
21CTO
21CTO (21CTO.com) offers developers community, training, and services, making it your go‑to learning and service platform.