Iterating and Cleaning Redis Dictionaries with the SCAN Command: Theory, Practice, and Pitfalls
This article explains how to use Redis's SCAN command to iterate over dictionary data and clean up stale keys, examines the command's behavior in several dictionary states (after expansion, after contraction, and mid-rehash), and shows how SCAN guarantees data integrity while the dictionary is changing, with practical code examples and a guide to common pitfalls.
Problem background: Modern logistics operations generate a massive amount of basic operational data (merchants, fleets, sites, sorting centers, customers, and so on) that directly supports the business flow. Basic CRUD capabilities are essential, and caching is widely used to improve read performance. This article showcases a cache-size reduction carried out on two systems (merchant basic data and C-backend), which achieved dramatic reductions in Redis memory usage.
Solution overview: The original cache used the @Cache annotation, which wrote values to both the local cache and JimDB without a default expiration time, leaving behind many zombie keys. The goal is to locate and delete these keys.
2.1 The KEYS command: Listing all keys with KEYS is O(N) and blocks the single Redis thread for the entire traversal, which is unacceptable for large datasets (tens of gigabytes).
2.2 The SCAN command: Introduced in Redis 2.8, SCAN offers non-blocking, incremental iteration. Advantages: (1) the overall time complexity is still O(N), but the work is split into small batches, so the server thread is never blocked for long; (2) a COUNT hint, loosely analogous to SQL's LIMIT, controls how much work each call performs. Disadvantages: (1) results may contain duplicates; (2) keys added or removed during the iteration may or may not be returned.
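Because SCAN may return the same key more than once, callers that need exactly-once processing can deduplicate on the client side with a seen-set. A minimal sketch of the idea in plain Java, using a hard-coded list of batches to simulate SCAN results (the class and method names here are illustrative, not part of any Redis client):

```java
import java.util.*;

public class ScanDedup {
    // Simulated SCAN batches: the same key can reappear in a later
    // batch when the underlying dictionary is rehashing.
    static final List<List<String>> BATCHES = List.of(
            List.of("key6", "key8", "key9"),
            List.of("key8", "key3"),    // "key8" returned again
            List.of("key5", "key6"));   // "key6" returned again

    public static List<String> iterateOnce() {
        Set<String> seen = new HashSet<>();      // client-side dedup
        List<String> processed = new ArrayList<>();
        for (List<String> batch : BATCHES) {
            for (String key : batch) {
                if (seen.add(key)) {             // true only on first sighting
                    processed.add(key);          // process each key exactly once
                }
            }
        }
        return processed;
    }

    public static void main(String[] args) {
        System.out.println(iterateOnce()); // [key6, key8, key9, key3, key5]
    }
}
```

The trade-off is memory: the seen-set grows with the number of distinct keys, so this only suits iterations whose key count fits comfortably in the client's heap.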
2.3 Basic syntax :
SCAN cursor [MATCH pattern] [COUNT count]
where cursor is the iteration cursor, pattern filters key names, and count is a hint for the amount of work done per call (default 10); the number of returned elements may differ from the hint.
2.4 Practice : Incremental iteration using cursor:
127.0.0.1:6379> scan 0 match * count 5
1) "14"
2) 1) "key6"
2) "key8"
3) "key9"
4) "key3"
5) "key5"
127.0.0.1:6379> scan 14 match * count 5
1) "15"
2) 1) "key4"
2) "key2"
3) "key1"
4) "key7"
5) "key10"
127.0.0.1:6379> scan 15 match * count 5
1) "0"
2) (empty list or set)
After matching a key prefix, the loop below deletes keys that have no expiration time:
// Parameter setup: match keys by prefix and fetch in large batches
ScanOptions options = ScanOptions.scanOptions().match(String.format("%s*", keyPrefix)).count(10000).build();
KeyScanResult<String> scanResult = jimClient.scan(null, options);
int succ = 0, fail = 0;
while (CollectionUtils.isNotEmpty(scanResult.getResult())) {
    for (String key : scanResult.getResult()) {
        try {
            Long ttl = jimClient.ttl(key);
            // A negative TTL means no expiration is set: a zombie key
            if (ttl < 0) {
                jimClient.del(key);
                logger.info("Cleaned redis keyPrefix={}, key={}, ttl={}, deletedCount={}", keyPrefix, key, ttl, succ++);
            }
        } catch (Exception e) {
            logger.error("Redis cleanup error: keyPrefix={}, key={}, failCount={}", keyPrefix, key, fail++, e);
        }
    }
    scanResult = jimClient.scan(scanResult.getCursor(), options);
}
2.5 Pitfall guide: A batch can come back empty even though the cursor has not finished, because SCAN walks hash-table buckets rather than a static key list. Unlike SQL pagination, a batch may contain no matching keys while more keys still exist further along, so the loop above terminates too early. The correct completion check is scanResult.isFinished(), not an empty result.
Modified loop example:
ScanOptions options = ScanOptions.scanOptions().match(String.format("%s*", keyPrefix)).count(10000).build();
KeyScanResult<String> scanResult = jimClient.scan(null, options);
int succ = 0, fail = 0;
// Loop on the cursor state, not on whether a batch happened to be empty
while (!scanResult.isFinished()) {
    if (CollectionUtils.isNotEmpty(scanResult.getResult())) {
        for (String key : scanResult.getResult()) {
            try {
                Long ttl = jimClient.ttl(key);
                // A negative TTL means no expiration is set: a zombie key
                if (ttl < 0) {
                    jimClient.del(key);
                    logger.info("Cleaned redis keyPrefix={}, key={}, ttl={}, deletedCount={}", keyPrefix, key, ttl, succ++);
                }
            } catch (Exception e) {
                logger.error("Redis cleanup error: keyPrefix={}, key={}, failCount={}", keyPrefix, key, fail++, e);
            }
        }
    }
    scanResult = jimClient.scan(scanResult.getCursor(), options);
}
3.1 Duplicate data: SCAN may return duplicate keys while the dictionary is expanding or shrinking. During rehashing, buckets are moved between the two hash tables, and the high-order-bit (reverse binary) iteration can revisit buckets that were already scanned, producing duplicates.
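The cursor update that makes this resize-safe is Redis's "reverse binary iteration" (implemented in dictScan): instead of incrementing the cursor at the low-order end, it masks, bit-reverses, adds one, and reverses back, so the carry propagates from the high-order side. A minimal standalone sketch for a 16-bucket table (the helper name is ours; the bit manipulation mirrors the published dictScan increment):

```java
public class ReverseCursor {
    // One step of the reverse-binary cursor increment for a hash
    // table whose size is a power of two (mask = size - 1).
    public static long nextCursor(long cursor, long mask) {
        cursor |= ~mask;                 // set the bits above the table mask
        cursor = Long.reverse(cursor);   // mirror the bits
        cursor++;                        // carry now propagates from the high end
        return Long.reverse(cursor);     // mirror back
    }

    public static void main(String[] args) {
        long mask = 15;                  // table of size 16
        long c = 0;
        StringBuilder order = new StringBuilder();
        do {
            order.append(c).append(' ');
            c = nextCursor(c, mask);
        } while (c != 0);                // cursor returns to 0 when done
        // Buckets are visited high-bit-first:
        // 0 8 4 12 2 10 6 14 1 9 5 13 3 11 7 15
        System.out.println(order.toString().trim());
    }
}
```

Because the high bits change first, all buckets sharing the same low-order bits are visited consecutively, which is exactly what lets an iteration survive the table doubling or halving without revisiting buckets it has fully covered.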
3.2 Conclusion: When a dictionary shrinks, high-order buckets fold into low-order ones (e.g., when a 16-bucket table shrinks to 8 buckets, buckets 6 and 14 merge into bucket 6). To avoid missing data, SCAN rescans the merged bucket, which can return some entries twice. This trade-off, tolerating duplicates to guarantee no loss, is inherent to the algorithm.
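The merge in the shrink case is just index masking: with power-of-two table sizes, old bucket i lands in bucket i & (newSize - 1) of the smaller table. A tiny illustration (standalone Java; the class and method names are ours):

```java
public class BucketMerge {
    // After shrinking a 16-bucket table to 8 buckets, old buckets
    // 6 (0110) and 14 (1110) share the low three bits, so both
    // mask down to bucket 6 (110).
    public static int newBucket(int oldBucket, int newSize) {
        return oldBucket & (newSize - 1);   // newSize must be a power of two
    }

    public static void main(String[] args) {
        System.out.println(newBucket(6, 8));   // prints 6
        System.out.println(newBucket(14, 8));  // prints 6
    }
}
```

The expansion case is the mirror image: bucket 6 of an 8-bucket table splits into buckets 6 and 14 of a 16-bucket table, differing only in the new high bit.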
Through this Redis cache-size reduction practice, significant resource savings were achieved, and the deep dive into SCAN's design provided valuable insights into Redis internals.
JD Tech
Official JD technology sharing platform. All the cutting‑edge JD tech, innovative insights, and open‑source solutions you’re looking for, all in one place.