
How to Count 100 Million Redis Keys Efficiently Without Crashing the Cluster

This article explains why the KEYS * command is dangerous for large Redis deployments and presents several practical alternatives—including SCAN, multithreaded SCAN, cluster‑wide parallel scans, built‑in counters, and real‑time incremental counting—along with code samples, performance comparisons, and guidance on choosing the right solution.

IT Services Circle

Introduction

Many developers have faced a situation where a manager asks for the total number of keys in Redis and the naive use of KEYS * blocks the entire cluster, causing severe service outages.

Why KEYS * Is Not Recommended

Redis executes commands on a single thread, so KEYS * has to walk the entire keyspace (O(N)) in one uninterrupted call. No other command is processed until it returns, which causes long pauses and can lead to out-of-memory conditions when the result set is huge.

Three fatal drawbacks:

Time complexity: At an optimistic 0.1 µs per key, walking 100 million keys blocks the server for about 10 seconds (10^8 × 0.1 µs = 10 s), and longer if the per-key cost is higher.

Memory storm: Returning millions of keys may exhaust client memory.

Cluster failure: In Cluster mode the command only sees keys on the local node.

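Example error reported when the server is out of memory (Redis issues this message for commands it rejects once used memory exceeds maxmemory, so the exact failure seen with KEYS * may differ):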

127.0.0.1:6379> KEYS *
(error) OOM command not allowed when used memory > 'maxmemory'

Solution 1: SCAN Command

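The SCAN command iterates with a cursor and returns a small batch of keys per call, so each call blocks the server only briefly. COUNT is a hint rather than an exact batch size, and a key may be returned more than once, so the total is approximate if the keyspace changes during the scan.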

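import redis.clients.jedis.Jedis;
import redis.clients.jedis.ScanParams; // Jedis 3.x; in Jedis 4+ this is redis.clients.jedis.params.ScanParams
import redis.clients.jedis.ScanResult; // Jedis 3.x; in Jedis 4+ this is redis.clients.jedis.resps.ScanResult

public long safeCount(Jedis jedis) {
    long total = 0;
    String cursor = "0"; // "0" starts a new iteration
    ScanParams params = new ScanParams().count(500); // COUNT is a hint, not a hard batch size
    do {
        ScanResult<String> rs = jedis.scan(cursor, params);
        cursor = rs.getCursor();
        total += rs.getResult().size(); // a key may be returned more than once
    } while (!"0".equals(cursor)); // the server returns "0" when the iteration is complete
    return total;
}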
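Because SCAN may hand back the same key in more than one batch, a count that simply adds batch sizes can overshoot. The following is a minimal sketch of the cursor contract with duplicates filtered out. It assumes Java 16 or later; the `Page` record, the `countDistinct` helper, and the canned replies are all hypothetical stand-ins for a real client, chosen so the loop can run without a server.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;

final class CursorLoopSketch {
    /** One SCAN reply: the cursor for the next call plus a batch of keys. */
    record Page(String nextCursor, List<String> keys) {}

    /** Follows cursors from "0" until the reply cursor is "0" again, counting distinct keys. */
    static long countDistinct(Function<String, Page> scan) {
        Set<String> seen = new HashSet<>(); // memory grows with the number of keys
        String cursor = "0";
        do {
            Page page = scan.apply(cursor);
            seen.addAll(page.keys());
            cursor = page.nextCursor();
        } while (!"0".equals(cursor));
        return seen.size();
    }

    public static void main(String[] args) {
        // Hypothetical replies standing in for a server; "user:2" appears twice.
        Map<String, Page> fake = Map.of(
            "0", new Page("17", List.of("user:1", "user:2")),
            "17", new Page("5", List.of("user:2", "user:3")),
            "5", new Page("0", List.of()));
        System.out.println(countDistinct(fake::get)); // 3
    }
}
```

Holding every key in a set costs memory proportional to the keyspace, so this trade-off is probably only worthwhile for small keyspaces or when the count has to be free of duplicates.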

Assuming each SCAN call takes ~3 ms and returns 500 keys, counting 100 million keys requires 200 000 calls, roughly 600 seconds (10 minutes).
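The estimate above is plain arithmetic and can be re-run for other batch sizes or latencies. The sketch below assumes every call returns roughly the requested batch and takes a fixed time, which real servers do not guarantee; the class and method names are illustrative.

```java
final class ScanEstimate {
    /** Number of SCAN round trips, assuming each call returns about batchSize keys. */
    static long calls(long totalKeys, int batchSize) {
        return (totalKeys + batchSize - 1) / batchSize; // ceiling division
    }

    /** Total time in seconds at a fixed per-call latency in milliseconds. */
    static double seconds(long totalKeys, int batchSize, double millisPerCall) {
        return calls(totalKeys, batchSize) * millisPerCall / 1000.0;
    }

    public static void main(String[] args) {
        long n = 100_000_000L;
        System.out.println(calls(n, 500));        // 200000
        System.out.println(seconds(n, 500, 3.0)); // 600.0
    }
}
```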

Solution 2: Multithreaded Concurrent SCAN

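A single Redis instance executes commands on one thread, and a SCAN cursor is an opaque position that cannot be used to split one keyspace into ranges, so extra client threads pointed at the same instance only repeat work and inflate the count. A thread pool helps when the keys are spread over several independent instances (for example, client-side shards): each thread runs a full SCAN from cursor 0 on its own instance, and the partial counts are summed.

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// One SCAN loop per independent Redis instance; pools holds one pool per instance.
public long parallelCount(List<JedisPool> pools) throws Exception {
    ExecutorService executor = Executors.newFixedThreadPool(pools.size());
    try {
        List<Future<Long>> parts = new ArrayList<>();
        for (JedisPool pool : pools) {
            parts.add(executor.submit(() -> {
                long count = 0;
                try (Jedis jedis = pool.getResource()) {
                    String cur = "0"; // every iteration must start at cursor "0"
                    ScanParams params = new ScanParams().count(500);
                    do {
                        ScanResult<String> rs = jedis.scan(cur, params);
                        cur = rs.getCursor();
                        count += rs.getResult().size();
                    } while (!"0".equals(cur));
                }
                return count;
            }));
        }
        long total = 0;
        for (Future<Long> part : parts) {
            total += part.get(); // rethrows a worker failure instead of hanging
        }
        return total;
    } finally {
        executor.shutdown();
    }
}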

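Performance figures reported for a 32-core machine and 100 million keys (a near-linear speedup like this requires the keys to be spread over instances that can be scanned independently, since one Redis instance executes commands on a single thread):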

Single‑thread SCAN: 580 s, CPU 5%.

32‑thread SCAN: 18 s, CPU 800%.

Solution 3: Distributed Divide‑and‑Conquer (Redis Cluster)

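In a Redis Cluster each master node holds a disjoint set of hash slots, so scanning every master once and summing the results yields the global count. Replicas are skipped because they hold copies of their master's keys.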

public long clusterCount(JedisCluster cluster) {
    // Jedis 3.x API; Jedis 4+ returns Map<String, ConnectionPool> here.
    Map<String, JedisPool> nodes = cluster.getClusterNodes();
    AtomicLong total = new AtomicLong(0);
    // One SCAN loop per node, run in parallel on the common fork-join pool.
    nodes.values().parallelStream().forEach(pool -> {
        try (Jedis jedis = pool.getResource()) {
            // Skip replicas so that each key is counted only on its master.
            if (jedis.info("replication").contains("role:slave")) return;
            ScanParams params = new ScanParams().count(500);
            String cursor = "0";
            do {
                ScanResult<String> rs = jedis.scan(cursor, params);
                total.addAndGet(rs.getResult().size());
                cursor = rs.getCursor();
            } while (!"0".equals(cursor));
        }
    });
    return total.get();
}

Solution 4: Millisecond‑Level Counting

Option 1 – Built‑in Counter

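Use INFO keyspace to read the per-database key count in O(1). It is fast, but the figure can include keys that have expired and not yet been removed, it cannot be filtered by pattern, and in Cluster mode it covers only the node that was queried.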

127.0.0.1:6379> INFO keyspace
# Keyspace
db0:keys=100000000,expires=20000,avg_ttl=3600
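The reply is plain text, so a client has to extract the keys field itself. A minimal sketch of that parsing follows, assuming the dbN:keys=...,expires=... layout shown above; the `KeyspaceInfo` class and `totalKeys` method are illustrative names, not part of any client library.

```java
final class KeyspaceInfo {
    /** Sums the keys=<n> field over every dbN line of an INFO keyspace reply. */
    static long totalKeys(String info) {
        long total = 0;
        for (String line : info.split("\\r?\\n")) {
            line = line.trim();
            if (!line.startsWith("db")) continue; // skips "# Keyspace" and blank lines
            int colon = line.indexOf(':');
            if (colon < 0) continue;
            for (String field : line.substring(colon + 1).split(",")) {
                if (field.startsWith("keys=")) {
                    total += Long.parseLong(field.substring("keys=".length()));
                }
            }
        }
        return total;
    }

    public static void main(String[] args) {
        String reply = "# Keyspace\r\ndb0:keys=100000000,expires=20000,avg_ttl=3600\r\n";
        System.out.println(totalKeys(reply)); // 100000000
    }
}
```

With Jedis, the input string would typically come from `jedis.info("keyspace")`. In Cluster mode the call has to be repeated on every master and the results summed.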

Option 2 – Real‑Time Incremental Counting

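Subscribe to keyspace notifications and maintain a counter that increments on SET and decrements on DEL. Notifications are disabled by default and must be enabled with the notify-keyspace-events setting. A set event also fires when an existing key is overwritten, and keys removed by expiry or eviction emit expired or evicted events rather than del, so the counter drifts unless those cases are handled.

import java.nio.charset.StandardCharsets;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.data.redis.connection.RedisConnectionFactory;
import org.springframework.data.redis.core.StringRedisTemplate;
import org.springframework.data.redis.listener.PatternTopic;
import org.springframework.data.redis.listener.RedisMessageListenerContainer;

@Configuration
public class KeyCounterConfig {
    // Server side, for example: CONFIG SET notify-keyspace-events E$g
    @Bean
    public RedisMessageListenerContainer container(RedisConnectionFactory factory,
                                                   StringRedisTemplate redisTemplate) {
        RedisMessageListenerContainer container = new RedisMessageListenerContainer();
        container.setConnectionFactory(factory);
        container.addMessageListener((message, pattern) -> {
            // For keyevent notifications the channel names the event; the body is the key name.
            String channel = new String(message.getChannel(), StandardCharsets.UTF_8);
            if (channel.equals("__keyevent@0__:set")) {
                // Fires on every SET, including overwrites of an existing key.
                redisTemplate.opsForValue().increment("total_keys", 1);
            } else if (channel.equals("__keyevent@0__:del")) {
                redisTemplate.opsForValue().decrement("total_keys", 1);
            }
        }, new PatternTopic("__keyevent@0__:*"));
        return container;
    }
}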
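The mapping from event to counter change is where drift usually creeps in, so it helps to isolate it in a pure function. The sketch below assumes Redis 7.0 or later, where the documentation describes a `new` key event (notification flag `n`) that fires only when a key is created; the `KeyEventDelta` class is a hypothetical helper, and renames and other edge cases are deliberately left out.

```java
import java.util.List;

final class KeyEventDelta {
    /**
     * Maps a keyevent channel such as "__keyevent@0__:del" to a counter delta.
     * "set" is ignored because it also fires when an existing key is overwritten.
     */
    static int delta(String channel) {
        String event = channel.substring(channel.lastIndexOf(':') + 1);
        switch (event) {
            case "new":      // key created (assumes Redis 7.0+ with flag n enabled)
                return 1;
            case "del":      // explicit deletion
            case "expired":  // removed because its TTL elapsed
            case "evicted":  // removed by the maxmemory policy
                return -1;
            default:
                return 0;
        }
    }

    public static void main(String[] args) {
        long counter = 0;
        for (String channel : List.of(
                "__keyevent@0__:new", "__keyevent@0__:set",
                "__keyevent@0__:new", "__keyevent@0__:expired")) {
            counter += delta(channel);
        }
        System.out.println(counter); // 1
    }
}
```

Pub/Sub delivery is fire-and-forget, so even with this mapping a periodic reconciliation against a SCAN-based or INFO-based count is probably still needed.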

Cost analysis:

Memory overhead: extra counter key.

CPU overhead: +5‑10% for processing notifications.

Network overhead: in Cluster mode keyspace notifications are node-local, so the listener must subscribe to every node and the per-node counts must be combined.

Choosing the Right Approach

A decision flowchart (image not reproduced here) helps select a method based on accuracy, latency, and resource constraints.

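Complexity and accuracy summary:

KEYS: O(N) time and space, exact, but blocks the server.

SCAN: O(N) time, O(1) space per call; exact only on a keyspace that does not change during the scan, since keys can be returned more than once and keys added or removed mid-iteration may or may not be returned.

Built-in counter: O(1) time and space, inexact.

Incremental counting: O(1) time and space per query; exact only if every key creation and removal is captured, and Pub/Sub notifications are not redelivered after a disconnect.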

Hardware guidelines:

CPU‑bound: threads = CPU cores × 1.5.

IO‑bound: threads = CPU cores × 3.

Memory limit: tune COUNT batch size.
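The two thread-count rules of thumb above can be written as a small helper. This is a sketch only: the multipliers are the article's, while rounding the CPU-bound figure up and the class name `ScanThreads` are assumptions made here.

```java
final class ScanThreads {
    /** CPU-bound rule of thumb: cores x 1.5, rounded up (rounding is an assumption). */
    static int cpuBound(int cores) {
        return (int) Math.ceil(cores * 1.5);
    }

    /** IO-bound rule of thumb: cores x 3. */
    static int ioBound(int cores) {
        return cores * 3;
    }

    public static void main(String[] args) {
        int cores = Runtime.getRuntime().availableProcessors();
        System.out.println("CPU-bound threads: " + cpuBound(cores));
        System.out.println("IO-bound threads: " + ioBound(cores));
    }
}
```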

Typical business scenarios:

E‑commerce real‑time dashboards – incremental counter + RedisTimeSeries.

Offline analytics – export SCAN results to Spark.

Security auditing – parallel SCAN across nodes.

Final Takeaways

✅ Use divide-and-conquer for precise large-scale counts.

✅ Use incremental counters for real-time queries.

✅ Use sampling for trend analysis.

❌ Avoid brute-force KEYS * scans, which are self-destructive.
