Hidden Redis Pitfalls That Can Crash Your System – How to Spot and Avoid Them

This article reveals the most common Redis pitfalls—including unexpected key expiration, command‑induced blocking, data‑persistence hazards, and replication inconsistencies—explains why they happen, and provides concrete steps, code snippets and configuration tips to prevent performance degradation or data loss.

ITPUB

Apr 12, 2021

Introduction

Redis is widely used as an in‑memory cache and data store, but many subtle behaviours can lead to severe performance problems or data loss. The article walks through the most frequent "gotchas" across three areas: command usage, data persistence, and master‑slave replication, and shows how to avoid them.

Common Command Pitfalls

1) Expiration disappears after a plain SET

When a key is set with an expiration (e.g., SET testkey val1 EX 60), Redis stores the TTL alongside the value. If the key is later overwritten without an expiration option (SET testkey val2), Redis discards the TTL and the key becomes permanent. Keys that were meant to expire then accumulate and bloat memory. To keep the existing TTL, pass the expiration again on every SET or, on Redis 6.0 and later, use the KEEPTTL option.

127.0.0.1:6379> SET testkey val1 EX 60
OK
127.0.0.1:6379> TTL testkey
(integer) 59

127.0.0.1:6379> SET testkey val2
OK
127.0.0.1:6379> TTL testkey
(integer) -1

The TTL of -1 confirms that the key no longer expires.
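One way to guard against this in application code is to route overwrites through a helper that always carries the TTL forward. The sketch below assumes a redis-py style client object (a set method accepting keepttl and px keyword arguments, plus pttl). The helper name overwrite_preserving_ttl and its server_supports_keepttl flag are hypothetical, and the fallback path for servers older than 6.0 is not atomic.

```python
def overwrite_preserving_ttl(client, key, value, server_supports_keepttl=True):
    """Overwrite `key` without dropping its remaining TTL.

    `client` is assumed to expose redis-py style `set` and `pttl` methods.
    """
    if server_supports_keepttl:
        # Redis 6.0+: SET key value KEEPTTL keeps the existing expiration.
        return client.set(key, value, keepttl=True)

    # Older servers: read the remaining TTL and re-apply it.
    # Not atomic: the key may expire or change between PTTL and SET.
    remaining_ms = client.pttl(key)
    if remaining_ms is not None and remaining_ms > 0:
        return client.set(key, value, px=remaining_ms)
    # -1 (no TTL) or -2 (key missing): there is no TTL to preserve.
    return client.set(key, value)
```

With redis-py this could be called as overwrite_preserving_ttl(redis.Redis(), "testkey", "val2").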

2) DEL can block the server

The time complexity of DEL depends on the key type: O(1) for a String, but O(N) in the number of elements for a List, Hash, Set or ZSet, because every element has to be freed. Deleting a very large collection can therefore take seconds and block the event loop. Check the element count first (LLEN, HLEN, SCARD, ZCARD) and delete large collections in batches: LRANGE / LPOP for a List, HSCAN / HDEL for a Hash, SSCAN / SREM for a Set, and ZSCAN / ZREM for a ZSet.

Query element count.

If small, delete directly; otherwise batch delete.

Use scan‑and‑pop commands to free memory gradually.
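The three steps above can be sketched for a Hash as follows. This is a minimal illustration, assuming a redis-py style client whose hscan returns a (cursor, dict) pair. The function name delete_hash_gradually and both thresholds are hypothetical and should be tuned to the workload.

```python
def delete_hash_gradually(client, key, batch_size=500, small_threshold=1000):
    """Delete a Hash, using HSCAN + HDEL batches when it is large.

    Returns the number of fields the Hash held when the call started
    (small path) or the number removed through HDEL (batched path).
    """
    total = client.hlen(key)
    if total <= small_threshold:
        # Small enough: a single DEL will not block for long.
        client.delete(key)
        return total

    removed = 0
    cursor = 0
    while True:
        # Fetch roughly `batch_size` fields and free them in one HDEL.
        cursor, chunk = client.hscan(key, cursor=cursor, count=batch_size)
        if chunk:
            removed += client.hdel(key, *chunk.keys())
        if cursor == 0:
            break
    # Remove whatever is left (normally the key is already gone).
    client.delete(key)
    return removed
```

The same pattern applies to Sets (SSCAN / SREM) and ZSets (ZSCAN / ZREM). Each batch is a short command, so other clients get served between batches.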

3) RANDOMKEY may block Redis

RANDOMKEY picks a key at random and then checks whether it has expired. If many keys have expired but have not yet been cleaned up, the master may loop many times, deleting one expired key per iteration, which can sharply increase latency. On a slave the problem is worse: a slave does not delete expired keys on its own, so when every sampled key is expired the loop can run forever.

Master picks a random key and checks expiration.

If expired, it is deleted and the loop continues.

When a non‑expired key is finally found, it is returned.

Redis 5.0 introduced a limit of 100 attempts on slaves to avoid the dead‑loop.
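The loop described above can be modelled in a few lines. This is a simplified simulation for illustration only, not Redis source code: the dictionaries db and expires, the function name random_key, and the choice to return the last sampled key once the cap is reached are all assumptions of the sketch.

```python
import random


def random_key(db, expires, now, is_slave, max_tries=100, rng=random):
    """Simplified model of the RANDOMKEY loop.

    db: dict of key -> value; expires: dict of key -> expiry timestamp.
    """
    tries = 0
    while db:
        key = rng.choice(list(db))
        expired = key in expires and expires[key] <= now
        if not expired:
            return key
        if is_slave:
            # A slave must not delete the key itself, so cap the retries
            # (the Redis 5.0 limit) and give up with the last sample.
            tries += 1
            if tries >= max_tries:
                return key
        else:
            # The master lazily deletes the expired key and samples again.
            del db[key]
            del expires[key]
    return None
```

Without the max_tries cap, the slave branch would never terminate when every key in db is expired, which is the dead loop the Redis 5.0 change avoids.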

4) SETBIT O(1) can cause OOM

SETBIT itself is O(1), but on a non-existent key with a very large offset Redis has to allocate a bitmap large enough to hold that offset, which may exhaust memory. The result is a big String key, and like any bigkey it is also slow to delete because all of that memory must be released.

127.0.0.1:6379> SETBIT testkey 10 1
(integer) 1
127.0.0.1:6379> GETBIT testkey 10
(integer) 1
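The offset of 10 above is harmless, but the allocation grows linearly with the offset. The arithmetic can be checked before sending the command. In this sketch the function names and the budget_bytes parameter are hypothetical; the 2^32 - 1 upper bound is the documented SETBIT offset limit, which caps a bitmap at 512 MB.

```python
MAX_SETBIT_OFFSET = 2**32 - 1  # largest offset SETBIT accepts (512 MB bitmap)


def bitmap_bytes(offset):
    """Bytes a String must hold so that bit `offset` exists."""
    if not 0 <= offset <= MAX_SETBIT_OFFSET:
        raise ValueError("offset out of range for SETBIT")
    return offset // 8 + 1


def check_setbit_offset(offset, budget_bytes):
    """Raise before sending SETBIT if the bitmap would exceed the budget."""
    needed = bitmap_bytes(offset)
    if needed > budget_bytes:
        raise MemoryError(f"SETBIT at offset {offset} needs about {needed} bytes")
    return needed
```

For example, bitmap_bytes(10) is 2 bytes, while bitmap_bytes(2**32 - 1) is 536870912 bytes (512 MB) for a single bit set on a fresh key.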

5) MONITOR can trigger OOM

When the server is under high QPS, the output buffer used by MONITOR grows continuously. If the instance does not have enough RAM, the buffer can cause an out‑of‑memory (OOM) crash. Use MONITOR only in low‑traffic environments.

Data Persistence Pitfalls

Redis supports RDB snapshots and AOF logs. Both mechanisms have hidden costs.

1) Master crash without persistence leads to total data loss

If a master runs without RDB or AOF and a supervisor restarts it after a crash, the new instance starts empty. The slave then resynchronizes from the empty master and clears its own data as well, causing a complete cache loss and a potential cache avalanche on the backend database.

Recommended steps:

Do not let a process manager automatically restart a Redis instance that has no persistence.

When a master fails, let Sentinel promote a slave before the master restarts.

After promotion, restart the old master as a slave.
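A deployment script can enforce the first rule by inspecting the instance configuration before enabling automatic restarts. The sketch below assumes the configuration is available as a dict of strings, such as the one redis-py returns from config_get(). The function names are hypothetical.

```python
def persistence_enabled(config):
    """True if the CONFIG GET result shows RDB or AOF turned on.

    `config` is assumed to map directive names to string values.
    """
    # An empty `save` directive means RDB snapshots are disabled.
    rdb_on = config.get("save", "").strip() != ""
    aof_on = config.get("appendonly", "no").lower() == "yes"
    return rdb_on or aof_on


def safe_to_auto_restart(config, is_master):
    """A master without persistence should not be restarted automatically."""
    return (not is_master) or persistence_enabled(config)
```

If safe_to_auto_restart returns False, leave the restart to an operator or to the failover procedure described in the steps above.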

2) AOF everysec does not guarantee only 1‑second data loss

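With appendfsync everysec, the main thread normally writes to the AOF page cache right away and a background thread calls fsync once per second. If the previous background fsync is still running because of heavy disk I/O, the main thread postpones its write to the page cache for up to 2 seconds before going ahead anyway (and possibly blocking). A crash during that window can therefore lose up to 2 seconds of data, not 1.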
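The postponement rule can be expressed as a small decision function. This is a simplified model of the behaviour described above rather than Redis code: the function name, the tuple it returns, and the use of None for "nothing postponed" are assumptions of the sketch.

```python
POSTPONE_LIMIT_SECONDS = 2


def everysec_write_decision(fsync_in_progress, postponed_since, now):
    """Return (action, postponed_since) for one AOF flush attempt.

    action is "write" or "postpone"; postponed_since is None when no
    postponement is pending.
    """
    if not fsync_in_progress:
        # Normal case: write to the page cache immediately.
        return "write", None
    if postponed_since is None:
        # First time the background fsync is found still busy.
        return "postpone", now
    if now - postponed_since < POSTPONE_LIMIT_SECONDS:
        return "postpone", postponed_since
    # Already waited 2 seconds: write anyway, even if it blocks.
    return "write", None
```

Everything buffered between the first "postpone" and the eventual "write" exists only in process memory, which is where the 2-second exposure comes from.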

3) RDB/AOF rewrite may cause OOM

During snapshot or AOF rewrite Redis forks a child process. The parent continues to serve writes using copy‑on‑write (COW). High write rates cause many memory pages to be duplicated, quickly exhausting RAM and triggering OOM. This is especially problematic for write‑heavy workloads.
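A back-of-the-envelope estimate helps decide how much headroom to leave for the fork. The sketch below is a rough model under loud assumptions: every write dirties exactly one 4 KB page that has not been copied yet, and the copied amount can never exceed the dataset size. The function name and the parameters are hypothetical, and real figures depend on key layout and the platform's page size.

```python
PAGE_SIZE = 4096  # bytes; a common default, check your platform


def estimate_cow_bytes(writes_per_sec, fork_seconds, dataset_bytes,
                       page_size=PAGE_SIZE):
    """Rough extra memory copied while the child process is alive.

    Assumes each write dirties one page not yet copied, so the result
    is capped by the dataset size.
    """
    dirtied = int(writes_per_sec * fork_seconds) * page_size
    return min(dirtied, dataset_bytes)
```

Under these assumptions, 1,000 writes per second during a 10-second snapshot copy about 40 MB, while 50,000 writes per second for 60 seconds against an 8 GB dataset hit the cap and can double memory use.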

Replication Pitfalls

Redis replication is asynchronous, which introduces several consistency issues.

1) Data loss on master failure

If the master crashes before pending writes are replicated, those writes are lost. For cache‑only use this may be acceptable, but for use‑as‑database or distributed locks it can cause serious errors.

2) Expired‑key visibility differences between master and slave

In Redis versions before 3.2, a slave returned the value of an expired key because it did not check expiration on reads. From 3.2 up to 4.0.11, reads of an expired key on a slave correctly returned nothing, but the EXISTS command still returned true. From 4.0.11 onward all commands treat expired keys as non-existent.

3) Clock skew between master and slave

Expiration is evaluated using each instance's local clock. If the slave's clock runs ahead of the master's, the slave considers keys expired earlier than the master does, which leads to inconsistent query results. After a failover, the promoted slave may also purge a large number of keys at once, which can cause a cache avalanche.

4) Maxmemory mismatch

If master and slave have different maxmemory limits, the slave may start evicting keys earlier, causing data divergence. The article advises adjusting maxmemory on the slave first when increasing the limit, and on the master first when decreasing it.

Redis 5.0 introduced replica-ignore-maxmemory (default yes) so that slaves no longer evict keys on their own, keeping them in sync with the master.
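For versions or setups where slaves still enforce their own maxmemory, the ordering advice above can be sketched as a helper. It assumes redis-py style client objects whose config_get("maxmemory") returns a dict of strings and whose config_set takes a name and a value. The function name resize_maxmemory is hypothetical.

```python
def resize_maxmemory(master, slaves, new_limit_bytes):
    """Apply a new maxmemory in the recommended order.

    Returns the list of instances in the order they were updated.
    """
    def effective(limit):
        # maxmemory 0 means "no limit", i.e. larger than any real value.
        return float("inf") if limit == 0 else limit

    current = int(master.config_get("maxmemory")["maxmemory"])
    if effective(new_limit_bytes) >= effective(current):
        order = list(slaves) + [master]   # increasing: slaves first
    else:
        order = [master] + list(slaves)   # decreasing: master first
    for node in order:
        node.config_set("maxmemory", new_limit_bytes)
    return order
```

Either ordering keeps the slave's limit at or above the master's during the change, so the slave never evicts keys the master still holds.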

5) Slave memory leak in writable replicas (Redis < 4.0)

When a client writes keys with an expiration directly to a writable slave, those keys are never freed after they expire, so the slave's memory leaks. The bug was fixed in Redis 4.0.

6) Replication storm (full‑sync loops)

A large dataset, a small client-output-buffer-limit for slaves, and high write traffic can cause the master's replication buffer to overflow while the slave is still loading the RDB file. The master then disconnects the slave, the slave retries the full sync, and the cycle repeats, wasting resources on both sides. Mitigations include keeping the dataset of each instance reasonably small and raising the replication buffer limit.
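Whether a given limit is large enough can be estimated from the write rate and the expected duration of a full sync. The sketch below is a rough check under stated assumptions: writes arrive at a constant rate, only the hard limit of client-output-buffer-limit is considered (the soft limit is ignored), and the function and parameter names are hypothetical.

```python
def full_sync_overflows(write_bytes_per_sec, sync_seconds, hard_limit_bytes):
    """True if writes buffered during a full sync exceed the hard limit.

    sync_seconds should cover RDB generation, transfer, and loading on
    the slave. A hard limit of 0 means "no limit".
    """
    if hard_limit_bytes == 0:
        return False
    return write_bytes_per_sec * sync_seconds > hard_limit_bytes
```

For instance, 5 MB/s of writes during a 120-second sync accumulates about 600 MB, which overflows a 256 MB hard limit and would trigger the loop described above.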

Conclusion

The article catalogues more than a dozen Redis "gotchas" across command usage, persistence, and replication. Understanding the underlying mechanisms (TTL handling, command complexity, lazy expiration, copy-on-write during forks, version-specific bugs, and configuration nuances) allows operators to pre-empt performance bottlenecks, avoid data loss, and keep master and slave instances consistent.
