How I Built an Automated Redis Sentinel to Seamlessly Handle Failover

A sysadmin narrates how he monitors four Redis nodes, detects master failure with PING, promotes a slave using SLAVEOF, reconfigures the remaining replicas, and ultimately automates the entire process with a custom Sentinel program and a multi‑node Sentinel cluster for high availability.

Liangxu Linux

May 27, 2021

I was tasked with watching a Redis deployment consisting of one master and three slaves. I first connected to each node using redis-cli -h 10.232.0.X -p 6379 and began sending the PING command every second to verify responsiveness.
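The once-per-second PING check comes down to remembering when each node last answered. Here is a minimal sketch in C, assuming a hypothetical `node_health` struct and a caller-supplied timeout (neither is part of the original setup). It treats only the `+PONG` reply as a sign of life, which is a simplification.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical per-node bookkeeping for a PING-based health check. */
typedef struct {
    const char *addr;      /* e.g. "10.232.0.2:6379" */
    int64_t last_pong_ms;  /* time of the last valid reply, in milliseconds */
} node_health;

/* Record a raw reply. Only "+PONG" (the simple-string reply to PING)
 * refreshes the timestamp in this sketch; anything else is ignored. */
void record_reply(node_health *n, const char *reply, int64_t now_ms) {
    assert(n != NULL && reply != NULL);
    if (strncmp(reply, "+PONG", 5) == 0) n->last_pong_ms = now_ms;
}

/* A node counts as unresponsive once no valid reply has arrived
 * for longer than timeout_ms. */
bool node_unresponsive(const node_health *n, int64_t now_ms, int64_t timeout_ms) {
    assert(n != NULL);
    return now_ms - n->last_pong_ms > timeout_ms;
}
```

A monitoring loop would call `record_reply` after every PING and check `node_unresponsive` before deciding that a node needs attention.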

When the master stopped replying, I promoted a chosen slave (e.g., 10.232.0.3:6379) to master with slaveof no one, then confirmed the role change via info. Once the new master was confirmed, I pointed the remaining slaves to it using slaveof 10.232.0.3 6379. Finally, when the original master came back online, I turned it into a slave of the new master with the same command, slaveof 10.232.0.3 6379.
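The manual procedure uses only two command shapes: one to promote a slave, and one to point a node at the new master. Here is a small sketch in C that builds those command strings. The function names `format_promote` and `format_follow` are hypothetical, and the sketch only formats text. Sending it to a node is left to whatever client is in use.

```c
#include <assert.h>
#include <stdio.h>

/* Write the command that promotes a slave to master.
 * Returns the length snprintf reports (excluding the terminator). */
int format_promote(char *buf, size_t len) {
    assert(buf != NULL);
    return snprintf(buf, len, "SLAVEOF NO ONE");
}

/* Write the command that makes a node replicate from host:port.
 * The same command serves the remaining slaves and the old master
 * once it is reachable again. */
int format_follow(char *buf, size_t len, const char *host, int port) {
    assert(buf != NULL && host != NULL);
    return snprintf(buf, len, "SLAVEOF %s %d", host, port);
}
```

For the failover described above, `format_promote` would be sent to 10.232.0.3, and `format_follow(buf, sizeof buf, "10.232.0.3", 6379)` to every other node.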

To avoid repeating these manual steps, I wrote a "Sentinel" program that continuously monitors the four nodes, performs the above promotion logic automatically, and logs each action. An optimization I added was to query only the current master for slave information via info, which provides up‑to‑date replica status.
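Asking only the master works because its replication section lists each attached slave on its own line. The sketch below parses one such line, assuming the `slaveN:ip=...,port=...,state=...,offset=...,lag=...` layout. The `replica_info` struct and `parse_slave_line` function are hypothetical names used for illustration.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical record for one replica as reported by the master. */
typedef struct {
    char ip[46];        /* large enough for a textual IPv6 address */
    int port;
    char state[16];     /* e.g. "online" */
    long long offset;   /* replication offset reported for the replica */
    long long lag;      /* seconds since the replica last checked in */
} replica_info;

/* Parse one "slaveN:..." line from the master's replication info.
 * Returns 1 on success, 0 if the line does not match the assumed layout. */
int parse_slave_line(const char *line, replica_info *out) {
    int index; /* the N in slaveN; parsed but not stored */
    assert(line != NULL && out != NULL);
    int fields = sscanf(line,
        "slave%d:ip=%45[^,],port=%d,state=%15[^,],offset=%lld,lag=%lld",
        &index, out->ip, &out->port, out->state, &out->offset, &out->lag);
    return fields == 6;
}
```

Lines that do not start with `slave`, such as `role:master`, are rejected, so the function can be applied to every line of the reply.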

The selection algorithm for the new master works as follows:

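Discard any node whose status is DISCONNECTED or DOWN.

Discard nodes whose last successful ping was more than 5 seconds ago.

Discard nodes whose slave_priority is 0, which marks a slave that must never be promoted.

Among the remaining candidates, prefer the lowest slave_priority value. If priorities are equal, compare the replication offset: the node with the highest offset has received the most data from the old master.

If offsets are also equal, choose the node with the lexicographically smallest run ID (its unique identifier).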

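This logic is implemented in the C function sentinelSelectSlave(), shown here in abridged form together with its comparison function:

sentinelRedisInstance *sentinelSelectSlave() {
    // filter out bad nodes
    while((de = dictNext(di)) != NULL) {
        sentinelRedisInstance *slave = dictGetVal(de);
        // skip slaves flagged as down or disconnected
        // (bitwise | builds the flag mask; logical || would collapse it to 1)
        if (slave->flags & (DOWN|DISCONNECTED)) continue;
        // skip slaves whose last valid ping reply is older than 5 seconds
        if (mstime() - slave->last_avail_time > 5000) continue;
        // priority 0 means "never promote this slave"
        if (slave->slave_priority == 0) continue;
        // additional checks …
    }
    // sort remaining nodes
    qsort(..., compareSlavesForPromotion);
    // return best candidate
    return instance[0];
}

int compareSlavesForPromotion(const void *a, const void *b) {
    // qsort passes pointers to the array elements, which are themselves pointers
    sentinelRedisInstance **sa = (sentinelRedisInstance **)a,
                          **sb = (sentinelRedisInstance **)b;
    // a lower slave_priority value sorts first
    if ((*sa)->slave_priority != (*sb)->slave_priority)
        return (*sa)->slave_priority - (*sb)->slave_priority;
    // same priority: the larger replication offset sorts first
    if ((*sa)->slave_repl_offset > (*sb)->slave_repl_offset) return -1;
    if ((*sa)->slave_repl_offset < (*sb)->slave_repl_offset) return 1;
    // same offset: fall back to the lexicographically smaller run ID
    char *sa_runid = (*sa)->runid, *sb_runid = (*sb)->runid;
    return strcasecmp(sa_runid, sb_runid);
}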
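
The excerpt above relies on Redis internals, so it cannot be compiled on its own. The following self-contained sketch mirrors the same filter-then-sort idea with a simplified, hypothetical `candidate` struct. The flag names, field names, and the fixed 5000 ms window are assumptions made for illustration, not the Redis definitions.

```c
#include <assert.h>
#include <stdlib.h>
#include <strings.h>   /* strcasecmp (POSIX) */

#define FLAG_DOWN         (1 << 0)
#define FLAG_DISCONNECTED (1 << 1)

/* Simplified stand-in for sentinelRedisInstance: only the fields
 * the selection logic reads. */
typedef struct {
    const char *runid;
    int flags;
    int priority;             /* 0 = never promote */
    long long repl_offset;
    long long last_avail_ms;  /* time of the last valid ping reply */
} candidate;

/* Order: lower priority value, then higher offset, then smaller run ID. */
static int compare_candidates(const void *a, const void *b) {
    const candidate *ca = *(const candidate *const *)a;
    const candidate *cb = *(const candidate *const *)b;
    if (ca->priority != cb->priority) return ca->priority - cb->priority;
    if (ca->repl_offset > cb->repl_offset) return -1;
    if (ca->repl_offset < cb->repl_offset) return 1;
    return strcasecmp(ca->runid, cb->runid);
}

/* Returns the best candidate, or NULL if no node qualifies. */
const candidate *select_candidate(const candidate *nodes, size_t n, long long now_ms) {
    assert(nodes != NULL || n == 0);
    const candidate **ok = malloc((n ? n : 1) * sizeof *ok);
    if (ok == NULL) return NULL;
    size_t kept = 0;
    for (size_t i = 0; i < n; i++) {
        if (nodes[i].flags & (FLAG_DOWN | FLAG_DISCONNECTED)) continue;
        if (now_ms - nodes[i].last_avail_ms > 5000) continue;
        if (nodes[i].priority == 0) continue;
        ok[kept++] = &nodes[i];
    }
    const candidate *best = NULL;
    if (kept > 0) {
        qsort(ok, kept, sizeof *ok, compare_candidates);
        best = ok[0];
    }
    free(ok);
    return best;
}
```

Returning NULL when every node is filtered out matters in practice: indexing the first element of an empty candidate array would otherwise hand back garbage.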

To increase reliability, I deployed multiple Sentinel nodes forming a Sentinel cluster, so that monitoring no longer depends on a single Sentinel process. Each Sentinel first decides on its own that the master is unreachable, which marks the master as subjectively down. If two or more Sentinels (the quorum in this setup) report the master as down, it is considered objectively down, which triggers the promotion workflow.

When multiple Sentinels could act, a leader must be elected to perform the failover. This election follows a Raft-style leader election (details omitted), ensuring that only one Sentinel carries out the promotion.

The final system, called the Sentinel system or Sentinel cluster, automatically monitors node health, decides when the master is truly down, selects the best slave to promote, reconfigures the remaining replicas, and even brings the old master back as a slave, all without manual intervention.

All code examples target Redis 3.0.0. For deeper understanding, I recommend reading Huang Jian‑hong’s *Redis Design and Implementation* and exploring the Redis source code (starting with redis‑1.0.0 for networking basics, then redis‑3.0.0 for master‑slave, cluster, and Sentinel features).
