How a Redis Memory Upgrade Triggered Data Loss: Sentinel Failover Lessons
During a memory upgrade of a Redis deployment, a manual master‑slave switch left the newly promoted master with most of its data gone and unable to accept writes for several minutes. This write‑up traces the incident to Sentinel failover behavior, a stale maxmemory setting, and the replica‑ignore‑maxmemory default, and lists measures to prevent similar failures.
Background
A Redis service with a single‑master, single‑slave topology needed a memory upgrade. The upgrade required host restarts, and during a manual master‑slave switch the new master lost a large portion of data and entered a read‑only state for several minutes.
Redis High‑Availability Architecture
The deployment uses a group of 2n+1 Sentinel nodes to monitor the master and its replica. Each Sentinel polls the Redis instances and marks an unresponsive node as subjectively down. Once enough Sentinels agree (the configured quorum, typically a majority), the node is considered objectively down and one of the following actions is taken:
If the down node is the master, a healthy replica is promoted to master.
If the down node is a replica, it is removed from the replica set.
When an objectively down node recovers, a majority of Sentinels reintegrate it as a replica. Clients discover the current master by querying Sentinel instead of hard‑coding addresses.
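The subjective/objective distinction can be illustrated with a deliberately simplified model. The sketch below assumes each Sentinel contributes a single boolean vote and ignores leader election, timeouts, and epochs, all of which real Sentinel also handles; the function names are hypothetical and not part of any Redis API.

```python
def majority(sentinel_count: int) -> int:
    """Smallest number of Sentinels that forms a majority (n + 1 out of 2n + 1)."""
    return sentinel_count // 2 + 1


def is_objectively_down(sdown_votes: list, quorum: int) -> bool:
    """True when at least `quorum` Sentinels report the node as subjectively down."""
    return sum(1 for vote in sdown_votes if vote) >= quorum


# Three Sentinels with a quorum of two: one dissenting view does not block the decision.
votes = [True, True, False]
print(is_objectively_down(votes, quorum=majority(len(votes))))
```

With an odd number of Sentinels, a majority quorum means a single partitioned Sentinel can neither trigger nor prevent a failover on its own.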
Memory Expansion Procedure
1. Upgrade the memory of host B (initial replica) and restart the host.
2. Verify that the Redis instance on B resynchronizes with the master:
   - Check the Redis log for errors.
   - Run INFO KEYSPACE on both nodes and confirm identical key counts per DB.
   - Write a test key on the master and read it back from the replica.
3. Execute SENTINEL FAILOVER <master-name> to promote B to master and demote A to replica.
4. Upgrade the memory of host A (original master), restart it, and repeat the verification steps for the new master/replica pair.
If all checks succeed, the memory upgrade is considered complete.
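The key-count comparison from the verification step can be automated. The following is a minimal sketch that assumes the raw text output of INFO KEYSPACE has already been collected from both nodes (for example with redis-cli); the helper names are hypothetical.

```python
def parse_keyspace(info_text: str) -> dict:
    """Extract {db_name: key_count} from the text returned by INFO KEYSPACE."""
    counts = {}
    for line in info_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and the "# Keyspace" section header
        name, _, fields = line.partition(":")
        if not name.startswith("db"):
            continue
        for field in fields.split(","):
            key, _, value = field.partition("=")
            if key == "keys":
                counts[name] = int(value)
    return counts


def keyspace_mismatches(master_info: str, replica_info: str) -> dict:
    """Return {db_name: (master_count, replica_count)} for every DB whose counts differ."""
    master = parse_keyspace(master_info)
    replica = parse_keyspace(replica_info)
    return {
        db: (master.get(db, 0), replica.get(db, 0))
        for db in sorted(set(master) | set(replica))
        if master.get(db, 0) != replica.get(db, 0)
    }
```

An empty result means the counts match. Under write load or with expiring keys the counts can differ briefly, so a small mismatch may call for a retry rather than an abort. This check also says nothing about memory limits, which is the gap the incident exposed.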
Data Loss After Failover
Immediately after the failover in step 3, the key count dropped from several million to a few hundred thousand, and the new master stopped accepting writes (appeared read‑only) for about ten minutes. The root cause was an unexpectedly low maxmemory setting on the newly promoted master.
Replica B had come back up with maxmemory 3GB: an earlier CONFIG SET maxmemory change had never been persisted to redis.conf, so the restart reloaded the old value. While B operated as a replica, Redis ignored the limit (the default behavior of replica-ignore-maxmemory), which allowed it to hold more than 6 GB of data. After promotion the limit became active, triggering eviction and the read‑only state.
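A pre-failover check along the following lines would have flagged the problem. This is only a sketch: it assumes the caller has already read used_memory from INFO MEMORY and maxmemory from CONFIG GET maxmemory on the replica, both in bytes, and the function name is hypothetical.

```python
from typing import Optional

GB = 1024 ** 3


def promotion_risk(used_memory: int, maxmemory: int) -> Optional[str]:
    """Return a warning if the replica already holds more data than its own limit allows."""
    if maxmemory == 0:
        return None  # 0 means no memory limit is configured
    if used_memory > maxmemory:
        over = used_memory - maxmemory
        return (
            f"used_memory exceeds maxmemory by {over} bytes; "
            "promotion would make the limit active and trigger eviction"
        )
    return None


# The situation described above: more than 6 GB held under a 3 GB limit.
print(promotion_risk(used_memory=6 * GB, maxmemory=3 * GB))
```

Running such a check on the replica before SENTINEL FAILOVER turns a silent misconfiguration into an explicit stop condition.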
Root‑Cause Analysis
The runtime CONFIG SET maxmemory change was not written back to the static configuration file; after a restart the old 3 GB limit was reloaded.
Since Redis 5.0, the replica-ignore-maxmemory option defaults to yes, causing replicas to ignore maxmemory and rely on the master for eviction.
When the replica was promoted, replica-ignore-maxmemory no longer applied, so the 3 GB limit caused immediate eviction of keys, resulting in data loss.
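The eviction policy was volatile‑lru, which evicts only keys that have an expiration time. Once those keys were gone and memory was still above the limit, Redis had nothing left to evict; under that condition it rejects write commands with an out‑of‑memory error, which would explain why the new master appeared read‑only.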
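The combined effect of the last two points can be shown with a toy model. The sketch below is not how Redis implements eviction (Redis uses sampled, approximate LRU and accounts memory differently); it only assumes that keys with a TTL are evicted least-recently-used first and that writes are refused once no candidates remain. All names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Key:
    name: str
    size: int          # bytes occupied by the key and its value
    has_ttl: bool      # only keys with an expiration are eviction candidates
    last_access: int   # larger value means more recently used


def simulate_volatile_lru(keys, maxmemory):
    """Return (surviving_keys, evicted_names, writes_rejected) for a simplified volatile-lru."""
    if maxmemory == 0:
        return list(keys), [], False  # no limit configured
    used = sum(k.size for k in keys)
    candidates = sorted((k for k in keys if k.has_ttl), key=lambda k: k.last_access)
    evicted = []
    for k in candidates:
        if used <= maxmemory:
            break
        used -= k.size
        evicted.append(k.name)
    evicted_names = set(evicted)
    survivors = [k for k in keys if k.name not in evicted_names]
    # Still over the limit with nothing left to evict: writes would be refused.
    return survivors, evicted, used > maxmemory
```

In this model, a data set dominated by non-expiring keys loses every key that has a TTL and still ends up above the limit, which matches both symptoms observed after the failover.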
Recommendations
When changing maxmemory at runtime, also persist the change to the configuration file (e.g., run CONFIG REWRITE) to keep both configurations synchronized.
Consider setting maxmemory 0 (no limit) if the workload permits, or explicitly set replica-ignore-maxmemory no on replicas that may be promoted.
Verify that the eviction policy matches the desired data‑retention behavior; volatile‑lru never evicts keys that have no expiration time.
Understanding the distinction between volatile runtime settings and persisted configuration, as well as the effect of replica-ignore-maxmemory, is essential for safe Redis failover and memory upgrades.
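Drift between the running value and the file can be detected before a restart exposes it. The sketch below assumes the contents of redis.conf and the byte value returned by CONFIG GET maxmemory are supplied by the caller, that an absent directive means 0 (no limit), and that the last occurrence of a directive wins; the helper names are hypothetical.

```python
import re

# Unit suffixes as documented in the redis.conf header: 1k = 1000 bytes, 1kb = 1024 bytes, etc.
_UNITS = {
    "": 1,
    "k": 1000, "kb": 1024,
    "m": 1000 ** 2, "mb": 1024 ** 2,
    "g": 1000 ** 3, "gb": 1024 ** 3,
}


def parse_memory(value: str) -> int:
    """Convert a redis.conf memory value such as '3gb' into bytes."""
    match = re.fullmatch(r"(\d+)([a-z]*)", value.strip().lower())
    if match is None or match.group(2) not in _UNITS:
        raise ValueError(f"unrecognized memory value: {value!r}")
    return int(match.group(1)) * _UNITS[match.group(2)]


def configured_maxmemory(conf_text: str) -> int:
    """Return the maxmemory value from redis.conf text in bytes (0 if the directive is absent)."""
    result = 0
    for line in conf_text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[0].lower() == "maxmemory":
            result = parse_memory(parts[1])  # assumption: the last occurrence wins
    return result


def maxmemory_drift(conf_text: str, runtime_bytes: int) -> bool:
    """True when the running value differs from what a restart would load."""
    return configured_maxmemory(conf_text) != runtime_bytes
```

A result of True is a prompt to run CONFIG REWRITE or edit the file by hand before any planned restart.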
dbaplus Community