
How to Count 100 Million Redis Keys Efficiently Without Crashing the Cluster

This article explains why the KEYS * command is dangerous for large Redis deployments and presents several practical alternatives—including SCAN, multithreaded SCAN, cluster‑wide parallel scans, built‑in counters, and real‑time incremental counting—along with code samples, performance comparisons, and guidance on choosing the right solution.


Introduction

Many developers have faced the situation where a manager asks for the total number of keys in Redis, and a naive <code>KEYS *</code> blocks the entire cluster, causing a severe service outage.

Why KEYS * Is Not Recommended

Redis processes commands on a single‑threaded event loop, so <code>KEYS *</code> must scan the whole keyspace (O(N)). While it scans, no other commands are processed, leading to long pauses and possible OOM errors when the result set is huge.

Three fatal drawbacks:

Time complexity: even at an optimistic 0.1 µs per key, scanning 100 million keys already takes 10 seconds of pure blocking.

Memory storm: Returning millions of keys may exhaust client memory.

Wrong in Cluster mode: the command only sees keys on the node that receives it, so the result is incomplete on top of being dangerous.

Example error when the command runs out of memory:

<code>127.0.0.1:6379> KEYS *
(error) OOM command not allowed when used memory > 'maxmemory'</code>

Solution 1: SCAN Command

The <code>SCAN</code> command iterates with a cursor, returning a small batch of keys per call, so it never blocks the server for long.

<code>public long safeCount(Jedis jedis) {
    long total = 0;
    String cursor = "0";
    ScanParams params = new ScanParams().count(500); // batch size per call
    do {
        ScanResult<String> rs = jedis.scan(cursor, params);
        cursor = rs.getCursor();
        total += rs.getResult().size();
    } while (!"0".equals(cursor)); // cursor "0" marks the end of iteration
    // Note: SCAN may return a key twice while the table is rehashing,
    // so the total can slightly over-count.
    return total;
}</code>

Assuming each <code>SCAN</code> call takes ~3 ms and returns 500 keys, counting 100 million keys requires 200,000 calls, roughly 600 seconds (10 minutes).
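A quick sanity check of that estimate can be encoded directly (the class and method below are illustrative helpers, not part of Jedis):

```java
public class ScanCostEstimate {
    // Estimated wall-clock seconds to count totalKeys keys with SCAN,
    // given a batch size and a per-call round-trip time in milliseconds.
    public static long estimateSeconds(long totalKeys, int batchSize, double callMillis) {
        long calls = (totalKeys + batchSize - 1) / batchSize; // ceiling division
        return Math.round(calls * callMillis / 1000.0);
    }
}
```

With 100 million keys, a batch of 500, and 3 ms per call, this reproduces the 600-second figure above.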

Solution 2: Multithreaded Concurrent SCAN

On multi‑core servers, a thread pool can run several <code>SCAN</code> iterations in parallel, as long as each thread scans a disjoint slice of the keyspace.

<code>public long parallelCount(JedisPool pool) throws Exception {
    // SCAN cursors cannot be partitioned by seeding arbitrary start values:
    // every cursor resumes the same global iteration, so threads would
    // overlap and over-count. Instead, give each thread a disjoint MATCH
    // pattern (here: hex prefixes, assuming keys are spread across them).
    String[] prefixes = "0123456789abcdef".split("");
    ExecutorService executor = Executors.newFixedThreadPool(prefixes.length);
    AtomicLong total = new AtomicLong(0);
    CountDownLatch latch = new CountDownLatch(prefixes.length);
    for (String prefix : prefixes) {
        executor.execute(() -> {
            try (Jedis jedis = pool.getResource()) {
                String cursor = "0";
                ScanParams params = new ScanParams().match(prefix + "*").count(500);
                do {
                    ScanResult<String> rs = jedis.scan(cursor, params);
                    cursor = rs.getCursor();
                    total.addAndGet(rs.getResult().size());
                } while (!"0".equals(cursor));
            } finally {
                latch.countDown(); // always release, even on failure
            }
        });
    }
    latch.await();
    executor.shutdown();
    return total.get();
}</code>

Performance test on a 32‑core CPU with 100 million keys:

Single‑thread <code>SCAN</code>: 580 s, CPU 5%.

32‑thread <code>SCAN</code>: 18 s, CPU 800%.

Solution 3: Distributed Divide‑and‑Conquer (Redis Cluster)

In a Redis Cluster each master node scans its own slot range. Results are aggregated to obtain the global count.

<code>public long clusterCount(JedisCluster cluster) {
    Map<String, JedisPool> nodes = cluster.getClusterNodes();
    AtomicLong total = new AtomicLong(0);
    nodes.values().parallelStream().forEach(pool -> {
        try (Jedis jedis = pool.getResource()) {
            // Skip replicas so each key is counted exactly once.
            if (jedis.info("replication").contains("role:slave")) return;
            String cursor = "0";
            do {
                ScanResult<String> rs = jedis.scan(cursor, new ScanParams().count(500));
                cursor = rs.getCursor();
                total.addAndGet(rs.getResult().size());
            } while (!"0".equals(cursor));
        }
    });
    return total.get();
}</code>

Solution 4: Millisecond‑Level Counting

Option 1 – Built‑in Counter

Use <code>INFO keyspace</code> (or the equivalent O(1) command <code>DBSIZE</code>) to read the total key count. It is instantaneous, but the figure can still include keys that have logically expired, and it cannot be filtered by pattern.

<code>127.0.0.1:6379> INFO keyspace
# Keyspace
db0:keys=100000000,expires=20000,avg_ttl=3600</code>
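If the count has to be consumed programmatically, the <code>keys=</code> field can be pulled out of that line with a little string handling (a sketch; the class name is ours, not a Jedis API):

```java
public class KeyspaceInfo {
    // Extracts the keys= value from one "dbN:keys=...,expires=...,avg_ttl=..."
    // line of INFO keyspace output; returns 0 if the field is absent.
    public static long parseKeys(String infoLine) {
        String fields = infoLine.substring(infoLine.indexOf(':') + 1);
        for (String field : fields.split(",")) {
            String[] kv = field.split("=", 2);
            if (kv.length == 2 && kv[0].equals("keys")) {
                return Long.parseLong(kv[1]);
            }
        }
        return 0L;
    }
}
```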

Option 2 – Real‑Time Incremental Counting

Subscribe to keyspace notifications and maintain a counter that increments on <code>SET</code> and decrements on <code>DEL</code>.

<code>@Configuration
public class KeyCounterConfig {
    @Bean
    public RedisMessageListenerContainer container(RedisConnectionFactory factory,
                                                   StringRedisTemplate redisTemplate) {
        RedisMessageListenerContainer container = new RedisMessageListenerContainer();
        container.setConnectionFactory(factory);
        container.addMessageListener((message, pattern) -> {
            // For key-event notifications the channel names the event
            // (e.g. __keyevent@0__:set) and the message body carries the key.
            String channel = new String(message.getChannel());
            if (channel.endsWith(":set")) {
                // Caveat: SET on an existing key also fires this event, so
                // reconcile the counter against DBSIZE periodically.
                redisTemplate.opsForValue().increment("total_keys");
            } else if (channel.endsWith(":del") || channel.endsWith(":expired")) {
                redisTemplate.opsForValue().decrement("total_keys");
            }
        }, new PatternTopic("__keyevent@0__:*"));
        return container;
    }
}</code>
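One operational note: the listener receives nothing until keyspace notifications are enabled, and they are off by default. A typical setting ("K" enables keyspace channels, "E" keyevent channels, "A" all event classes):

```shell
# Enable key-event notifications at runtime (not persisted across restarts)
redis-cli CONFIG SET notify-keyspace-events "KEA"
# To persist, set the same value in redis.conf:
#   notify-keyspace-events "KEA"
```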

Cost analysis:

Memory overhead: extra counter key.

CPU overhead: +5‑10% for processing notifications.

Network overhead: cross‑node synchronization in cluster mode.

Choosing the Right Approach

A decision flowchart (image) helps select a method based on accuracy, latency, and resource constraints.


Complexity and accuracy summary:

<code>KEYS</code>: O(N) time & space, exact.

<code>SCAN</code>: O(N) time, O(1) space per call, near‑exact (a key can occasionally be returned twice during rehashing).

Built‑in counter: O(1) time & space, inexact.

Incremental counting: O(1) time & space, exact.

Hardware guidelines:

CPU‑bound: threads = CPU cores × 1.5.

IO‑bound: threads = CPU cores × 3.

Memory limit: tune the <code>SCAN</code> batch size via <code>COUNT</code>.
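Those sizing rules are easy to encode (an illustrative helper, not a library API):

```java
public class PoolSizing {
    // Thread-count guidelines from the text: CPU-bound work gets
    // cores * 1.5 threads; IO-bound work (like SCAN) gets cores * 3.
    public static int cpuBoundThreads(int cores) {
        return (int) Math.ceil(cores * 1.5);
    }
    public static int ioBoundThreads(int cores) {
        return cores * 3;
    }
}
```

On the 32‑core machine from the benchmark above, that suggests 48 threads for CPU‑bound work and 96 for IO‑bound work.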

Typical business scenarios:

E‑commerce real‑time dashboards – incremental counter + RedisTimeSeries.

Offline analytics – export <code>SCAN</code> results to Spark.

Security auditing – parallel <code>SCAN</code> across nodes.

Final Takeaways

✅ Use divide‑and‑conquer for precise large‑scale counts. ✅ Use incremental counters for real‑time queries. ✅ Use sampling for trend analysis. ❌ Avoid brute‑force <code>KEYS *</code> scans: they are self‑destructive.

Tags: Performance, Redis, Cluster, Multithreading, SCAN, Key Counting
Written by

IT Services Circle

Delivering cutting-edge internet insights and practical learning resources. We're a passionate and principled IT media platform.
