
Mastering Redis High Availability: Sentinel, VIP, and Cluster Strategies

This article explains why Redis high availability matters, describes how Redis Sentinel works internally, compares several HA architectures (Sentinel with DNS or VIP, client-side Sentinel, Keepalived/HAProxy, Redis Cluster, Twemproxy, and Codis) with their pros and cons, and shares practical best-practice recommendations for building reliable Redis deployments.

21CTO

Introduction

On May 13, 2017, DBA Wen Guobing from 37 Interactive delivered a Redis technical talk at the APM conference in Guangzhou.

Redis is an open-source, in-memory key-value store written in ANSI C. It is accessed over the network, offers optional persistence, and has client libraries for many languages. As internet data volumes grow, both high performance and high availability become critical; Redis delivers the first out of the box, and the architectures below address the second.

Sentinel Principle

Before comparing the Redis HA solutions, it helps to review how Redis Sentinel works:

1. Sentinel reads the masters to monitor from its configuration file, then discovers each master's slaves from the master's INFO output.

2. Every two seconds, each Sentinel publishes a hello message to the __sentinel__:hello channel of every monitored instance, announcing its IP, port, and run ID.

3. Each Sentinel subscribes to the same channel, which is how it discovers the other Sentinels monitoring the same master.

4. Sentinel pings each instance once per second; if no valid reply arrives within the configured down-after-milliseconds, it marks the instance as subjectively down (SDOWN).

5. When at least quorum Sentinels agree that the master is down, it is marked objectively down (ODOWN). One Sentinel must then be authorized by a majority of Sentinels before it can start the failover.

6. The elected Sentinel selects a slave (by priority, then replication offset, then run ID) and sends it SLAVEOF NO ONE to promote it.

7. The elected Sentinel increments the configuration epoch and broadcasts the new master configuration to the other Sentinels.

Steps 1‑3 constitute automatic discovery, step 4 is health checking, steps 5‑6 handle failover, and step 7 updates configuration.
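The quorum check and the slave selection in the failover steps can be sketched in a few lines of Python. This is a simplified illustration, not Sentinel's actual code: the Replica class and both function names are hypothetical, the final tie-breaker is taken to be the lexicographically smallest run ID as described in the Redis Sentinel documentation, and the real implementation also skips slaves that have been disconnected from the master for too long.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Replica:
    """Hypothetical view of a slave as Sentinel might see it via INFO."""
    run_id: str
    priority: int      # slave priority; 0 means "never promote"
    repl_offset: int   # how much of the master's stream was processed


def is_objectively_down(down_votes: int, quorum: int) -> bool:
    """The master is ODOWN once at least `quorum` Sentinels report it down."""
    return down_votes >= quorum


def pick_replica(replicas: Iterable[Replica]) -> Optional[Replica]:
    """Choose the slave to promote.

    Order of preference: lowest non-zero priority, then highest
    replication offset, then lexicographically smallest run ID.
    """
    candidates = [r for r in replicas if r.priority > 0]
    if not candidates:
        return None
    return min(candidates, key=lambda r: (r.priority, -r.repl_offset, r.run_id))
```

With two slaves of equal priority, the one with the larger replication offset wins because it has lost the least data relative to the failed master.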

Redis High‑Availability Architectures

The talk covered eight common Redis HA architectures:

Redis Sentinel cluster + internal DNS + custom script

Redis Sentinel cluster + VIP + custom script

Client directly connects to Sentinel port (JedisSentinelPool for Java, custom wrapper for PHP)

Redis Sentinel cluster + Keepalived/HAProxy

Redis Master/Slave + Keepalived

Redis Cluster

Twemproxy

Codis

1. Sentinel + Internal DNS + Custom Script

Sentinel monitors masters; Web services resolve an internal DNS name that points to the current master. When the master fails, a custom script updates the DNS record to the new master.

Pros:

Failover completes in seconds (within 10 s)

The custom script keeps the switchover logic under your own control

Transparent to applications

Cons:

Higher maintenance cost; at least three Sentinel nodes recommended

Depends on DNS: resolution adds latency, and cached records can keep pointing at the failed master until they expire

Brief service interruption during Sentinel failover

Not suitable for external access

2. Sentinel + VIP + Custom Script

Similar to the DNS solution, but a virtual IP (VIP) is moved to the new master by the custom script after a failover.
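As a rough sketch of what such a script might run, the Python below only builds the shell commands for releasing the VIP on the old master and claiming it on the new one; it does not execute them. It assumes Linux hosts with the iproute2 ip tool and the iputils arping tool. The function names, the address, and the interface name are hypothetical and not taken from the talk.

```python
from typing import List


def vip_release_commands(vip: str, prefix: int, iface: str) -> List[List[str]]:
    """Commands for the OLD master, so it stops answering for the VIP."""
    return [["ip", "addr", "del", f"{vip}/{prefix}", "dev", iface]]


def vip_acquire_commands(vip: str, prefix: int, iface: str,
                         arp_count: int = 3) -> List[List[str]]:
    """Commands for the NEW master: bind the VIP, then announce it.

    The unsolicited ARP (arping -U) asks neighbours to refresh their
    ARP caches so traffic reaches the new owner quickly.
    """
    return [
        ["ip", "addr", "add", f"{vip}/{prefix}", "dev", iface],
        ["arping", "-c", str(arp_count), "-U", "-I", iface, vip],
    ]


if __name__ == "__main__":
    # Hypothetical values; release first to avoid two hosts holding the VIP.
    for cmd in vip_release_commands("10.0.0.100", 24, "eth0"):
        print("old master:", " ".join(cmd))
    for cmd in vip_acquire_commands("10.0.0.100", 24, "eth0"):
        print("new master:", " ".join(cmd))
```

Releasing the address before acquiring it is one way to reduce the IP-conflict risk listed among the drawbacks of this design.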

Pros:

Failover completes in seconds (within 5 s)

The custom script keeps the switchover logic under your own control

Transparent to applications

Cons:

Higher maintenance cost; at least three Sentinel nodes recommended

VIP management adds complexity and risk of IP conflicts

Brief service interruption during Sentinel failover

3. Client Directly Connects to Sentinel Port

Clients first connect to a Sentinel port, obtain the current master address, and then communicate with the master. Java can use JedisSentinelPool; PHP can implement a wrapper on top of phpredis.
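To make the lookup concrete, the sketch below shows the bytes a client could send to Sentinel and how it might read the answer. It encodes the SENTINEL get-master-addr-by-name command in the Redis serialization protocol (RESP) and parses the two-element reply. The function names and the master name mymaster are illustrative, and the parser handles only this one reply shape. Sentinel-aware clients such as JedisSentinelPool do considerably more, for example retrying across Sentinels and reacting to master-switch notifications.

```python
from typing import Optional, Tuple


def encode_command(*args: str) -> bytes:
    """Encode a command as a RESP array of bulk strings."""
    out = [b"*%d\r\n" % len(args)]
    for arg in args:
        data = arg.encode()
        out.append(b"$%d\r\n%s\r\n" % (len(data), data))
    return b"".join(out)


def parse_master_addr(reply: bytes) -> Optional[Tuple[str, int]]:
    """Parse the reply to SENTINEL get-master-addr-by-name.

    A known master yields an array of two bulk strings (IP, port);
    a null reply is treated here as "master name unknown".
    """
    if reply.startswith(b"*-1") or reply.startswith(b"$-1"):
        return None
    lines = reply.split(b"\r\n")
    # Expected layout: *2, $<len>, <ip>, $<len>, <port>, ''
    if lines[0] != b"*2" or len(lines) < 5:
        raise ValueError("unexpected reply: %r" % reply)
    return lines[2].decode(), int(lines[4])


if __name__ == "__main__":
    request = encode_command("SENTINEL", "get-master-addr-by-name", "mymaster")
    print(request)
    print(parse_master_addr(b"*2\r\n$9\r\n127.0.0.1\r\n$4\r\n6379\r\n"))
```

The client would send the request to a Sentinel port, then open its normal connection to the returned address.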

Pros:

Fast fault detection

Low DBA maintenance cost

Cons:

Requires client support for Sentinel

Both Sentinel and Redis nodes must be reachable

Invasive to applications (connection code must change)

4. Sentinel + Keepalived/HAProxy

Sentinel handles master failover while Keepalived moves the VIP. HAProxy can provide load balancing.

Pros:

Failover completes in seconds

Transparent to applications

Cons:

Higher maintenance cost

Potential split‑brain scenarios

Brief service interruption during Sentinel failover

5. Master/Slave + Keepalived

Native Redis master/slave with VIP managed by Keepalived. A custom script handles the master switch.

Pros:

Failover completes in seconds

Transparent to applications

Simple deployment, low maintenance cost

Cons:

Requires scripting for failover

Potential split‑brain issues

6. Redis Cluster

Redis 3.0 introduced a peer-to-peer cluster that shards the key space into 16,384 hash slots. A client that contacts the wrong node is redirected to the node that owns the slot, and nodes discover each other through a gossip protocol.
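For reference, the mapping from key to slot can be reproduced in a few lines. The sketch below follows the rule in the Redis Cluster specification: the slot is CRC16 of the key modulo 16384, and if the key contains a non-empty {...} hash tag, only the tag is hashed. It relies on Python's binascii.crc_hqx, which uses the same CRC-CCITT polynomial; the function name key_slot is illustrative.

```python
import binascii

SLOT_COUNT = 16384


def key_slot(key: bytes) -> int:
    """Return the Redis Cluster hash slot for a key."""
    start = key.find(b"{")
    if start != -1:
        end = key.find(b"}", start + 1)
        # Only a non-empty tag between the first '{' and the next '}' counts.
        if end != -1 and end != start + 1:
            key = key[start + 1:end]
    return binascii.crc_hqx(key, 0) % SLOT_COUNT


if __name__ == "__main__":
    print(key_slot(b"foo"))
    # Keys sharing a hash tag land in the same slot, which is what
    # makes multi-key operations on them possible in a cluster.
    print(key_slot(b"{user1000}.following") == key_slot(b"{user1000}.followers"))
```

Hash tags are the usual workaround for the limited multi-key support: keys that must be used together are given the same tag so they share a slot.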

Pros:

All‑in‑one deployment saves resources

Better performance than proxy modes

Automatic failover, with data remaining available during slot migration

Officially supported with regular updates

Cons:

Relatively new architecture with limited best‑practice guidance

Limited multi‑key operation support

Clients must cache routing tables

Node discovery and resharding are not fully automated

7. Twemproxy

Multiple identical Twemproxy instances forward client requests to the appropriate Redis node based on a hash algorithm.
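The talk does not say which hash algorithm is used. Twemproxy offers several distribution modes, including a ketama-style consistent hash, and the general idea can be sketched as a minimal hash ring. This is not Twemproxy's actual implementation: the HashRing class, the choice of MD5, the number of virtual nodes, and the node names are all assumptions made for illustration.

```python
import bisect
import hashlib
from typing import List, Sequence, Tuple


class HashRing:
    """Minimal consistent-hash ring with virtual nodes."""

    def __init__(self, nodes: Sequence[str], vnodes: int = 100) -> None:
        if not nodes:
            raise ValueError("at least one node is required")
        ring: List[Tuple[int, str]] = []
        for node in nodes:
            # Each server is placed at many points to even out the load.
            for i in range(vnodes):
                ring.append((self._hash(f"{node}#{i}"), node))
        ring.sort()
        self._ring = ring
        self._points = [point for point, _ in ring]

    @staticmethod
    def _hash(value: str) -> int:
        digest = hashlib.md5(value.encode()).digest()
        return int.from_bytes(digest[:4], "big")

    def node_for(self, key: str) -> str:
        """Walk clockwise from the key's hash to the next server point."""
        idx = bisect.bisect(self._points, self._hash(key)) % len(self._points)
        return self._ring[idx][1]


if __name__ == "__main__":
    ring = HashRing(["redis-a:6379", "redis-b:6379", "redis-c:6379"])
    print(ring.node_for("user:42"))
```

With a ring like this, removing one server only remaps the keys that server owned, whereas plain modulo hashing reshuffles most keys. Even so, the data on the remaining servers is not moved automatically, which is one reason expansion behind a proxy is cumbersome.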

Pros:

Simple development, almost transparent to applications

Mature and widely used solution

Cons:

Proxy adds latency and can become a performance bottleneck

Scaling Twemproxy can be difficult

Redis expansion is cumbersome

Twitter has deprecated this approach internally

8. Codis

Codis, open-sourced by Wandoujia, uses ZooKeeper to store routing metadata, provides a web UI for management, and exposes a stateless proxy that speaks the Redis protocol. Its storage nodes run a modified Redis 2.8 with slot support.

Pros:

Simple development, nearly transparent to applications

Better performance than Twemproxy

Graphical UI, easy scaling, convenient operations

Cons:

Proxy still impacts performance

Many components require more machines

Modified Redis code makes upstream sync difficult

The project's future focus is shifting to RebornDB

Best Practices

Deploy Sentinel clusters with at least five nodes.

For a large service, share one Sentinel cluster that monitors all of the service's Redis ports.

Allocate distinct Redis port ranges per business.

Implement custom scripts in Python for flexibility.

Scripts must check the current Sentinel role.

Pass parameters: .

Use paramiko for SSH so that one connection can be reused instead of reconnecting for every command.

Set UseDNS no and GSSAPIAuthentication no in the SSH server configuration to speed up SSH logins.

Fork a separate process for WeChat or email alerts to avoid blocking the main process.

All automatic or manual failover actions should complete within 15 seconds.
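The script-related recommendations above can be sketched as follows. The parameter list did not survive in this write-up, so the example assumes the argument order documented in the stock sentinel.conf for client-reconfig-script: master name, role, state, from-ip, from-port, to-ip, to-port, where role is either leader or observer. The class and function names are hypothetical, and the actual DNS or VIP update is left as a placeholder.

```python
from typing import NamedTuple, Sequence


class ReconfigEvent(NamedTuple):
    master_name: str
    role: str       # "leader" or "observer" (assumed from sentinel.conf)
    state: str
    from_ip: str
    from_port: int
    to_ip: str
    to_port: int


def parse_reconfig_args(argv: Sequence[str]) -> ReconfigEvent:
    """Turn the seven positional arguments into a typed record."""
    if len(argv) != 7:
        raise ValueError(f"expected 7 arguments, got {len(argv)}")
    name, role, state, from_ip, from_port, to_ip, to_port = argv
    return ReconfigEvent(name, role, state, from_ip, int(from_port),
                         to_ip, int(to_port))


def should_act(event: ReconfigEvent) -> bool:
    """Only the Sentinel leading the failover should change DNS or the VIP."""
    return event.role == "leader"


def main(argv: Sequence[str]) -> str:
    event = parse_reconfig_args(argv)
    if not should_act(event):
        return "observer: nothing to do"
    # Placeholder: update the DNS record or move the VIP here, and send
    # alerts from a separate process so this script returns quickly.
    return f"{event.master_name}: switch to {event.to_ip}:{event.to_port}"
```

A real script would call main(sys.argv[1:]) at the bottom. Checking the role first prevents every Sentinel in the cluster from repeating the same DNS or VIP change.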

Conclusion

The talk highlighted the necessity of Redis high availability, explained Sentinel internals, compared several HA architectures, and shared practical best‑practice recommendations, providing a solid reference for building reliable Redis services.
