How to Optimize a Redis Big Key Online Without Disrupting Existing Services (Interview Answer)

The article explains what a Redis big key is, why it harms performance, typical scenarios that generate big keys, and provides a step‑by‑step online optimization plan—including key sharding, dual‑write synchronization, progressive migration with HSCAN, gray‑scale traffic switch, non‑blocking deletion using UNLINK, and monitoring with rollback procedures—to answer the interview question confidently.


What Is a Big Key?

A “Big Key” refers to a Redis key whose value occupies a disproportionately large amount of memory, essentially a large‑value problem. Large values increase serialization/deserialization time, can block Redis's single‑threaded command loop, and can exhaust available memory.

Typical Scenarios That Produce Big Keys

Storing a 5 MB string in a single key.

Using a ZSET with 10 000 members.

A hash with 1 000 fields whose total value size reaches 100 MB.

Improper data‑structure choice, such as keeping massive binary files in a string.

Neglecting garbage‑data cleanup, causing hash fields to grow indefinitely.

Business mis‑estimation leading to excessive members in a key.

Lists that store fan lists of celebrities or comment lists of hot news, where the element count can be huge.

Harms Caused by Big Keys

1. Request Blocking

Reading or writing a big key takes a long time, and because Redis executes commands sequentially on a single thread, every subsequent request is blocked behind it.

2. Memory Growth

Large values inflate used memory; once memory reaches the maxmemory limit, Redis either evicts keys (possibly important ones) or rejects writes, depending on the configured eviction policy.

3. Network Blocking

Transferring a big value consumes significant network bandwidth, potentially slowing other services on the same server.

4. Master‑Slave Synchronization Issues

Deleting a big key blocks the master for a long time, which can interrupt replication or trigger an unwanted failover.

Production‑Level Big‑Key Scanning

Combine SCAN (or HSCAN for hashes) with a scheduled task during low‑traffic periods to discover big keys and trigger alerts via email or DingTalk.
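As a concrete illustration, such a scheduled scanner can walk the keyspace with SCAN and measure each key with MEMORY USAGE (available since Redis 4.0). A minimal Python sketch using redis-py; the function name `find_big_keys` and the 1 MB threshold are illustrative choices:

```python
def find_big_keys(r, size_threshold=1_000_000, count=500):
    """Walk the keyspace with SCAN (never KEYS) and report every key
    whose serialized size, per MEMORY USAGE, exceeds the threshold."""
    big_keys = []
    for key in r.scan_iter(count=count):
        size = r.memory_usage(key) or 0  # None if the key vanished mid-scan
        if size >= size_threshold:
            big_keys.append((key, size))
    return big_keys
```

In production this would run from a scheduled job during a low‑traffic window, with the result fed into the alerting channel (email or DingTalk) mentioned above.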

Online Optimization Steps

5.1 Step 1 – Split the Big Key

Assume the big key is user:info:all (a hash storing 1 M user profiles). Split it into 100 shards, e.g., user:info:0 through user:info:99, each holding roughly 10 k users.

// Original write that funnels every user into one big hash
redis.hset("user:info:all", uid.toString(), JSON.toJSONString(info));

Shard ID calculation (the hash must be stable and identical in every writer and reader; for numeric user IDs a plain modulo is enough):

shard_id = uid % 100
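The shard function must return the same result in every process and language that touches the data: the dual‑write code, the migration job, and the read path. Python's built-in hash() is randomized per process, so a sketch using zlib.crc32 instead (the helper name `shard_key` is illustrative, useful when IDs are not numeric):

```python
import zlib

SHARD_COUNT = 100

def shard_key(uid):
    """Deterministically map a user ID to its shard key.
    crc32 is stable across processes, unlike Python's built-in hash()."""
    shard_id = zlib.crc32(str(uid).encode("utf-8")) % SHARD_COUNT
    return f"user:info:{shard_id}"
```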

5.2 Step 2 – Dual‑Write Synchronization

In the application layer, write to the new shard and the old key simultaneously to keep data consistent during migration.

// Pseudo‑code for dual‑write
public void updateUserInfo(Long uid, UserInfo info) {
    // The shard function must match the migration job and the read path exactly
    int shardId = (int) (uid % 100);
    String newKey = "user:info:" + shardId;
    redis.hset(newKey, uid.toString(), JSON.toJSONString(info));
    // Compatibility write to the old key until migration completes
    String oldKey = "user:info:all";
    redis.hset(oldKey, uid.toString(), JSON.toJSONString(info));
}

5.3 Step 3 – Progressive Migration

Use a background task that scans the old hash with HSCAN (batch size 1 000) and writes fields to the appropriate shard using pipelines to reduce round‑trips.

import time

import redis

r = redis.Redis(host='localhost', port=6379, db=0)
old_key = "user:info:all"
shard_count = 100
cursor = 0
while True:
    # HSCAN returns a new cursor plus a batch of fields; the scan is
    # finished only when the returned cursor is 0 again (an individual
    # batch may legitimately be empty mid-scan).
    cursor, fields = r.hscan(old_key, cursor=cursor, count=1000)
    shard_data = {}
    for uid, info in fields.items():
        # Must use the same stable shard function as the dual-write code;
        # Python's built-in hash() is randomized per process and unsafe here.
        shard_id = int(uid) % shard_count
        shard_key = f"user:info:{shard_id}"
        shard_data.setdefault(shard_key, {})[uid] = info
    with r.pipeline() as pipe:
        for shard_key, mapping in shard_data.items():
            pipe.hset(shard_key, mapping=mapping)
        pipe.execute()
    if cursor == 0:
        break
    time.sleep(0.1)  # throttle to limit load on Redis

Key points:

Replace HGETALL with HSCAN to avoid loading the entire hash into memory.

Do not delete fields from the old hash during migration to prevent write‑read conflicts.
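Before moving on to the traffic switch, migration completeness can be checked cheaply by comparing field counts. A sketch using HLEN via redis-py (`migration_complete` is an illustrative name; with dual‑write running, the shard total should be at least the old hash's count once the backfill finishes):

```python
def migration_complete(r, old_key="user:info:all", shard_count=100):
    """Compare the old hash's field count against the total across all
    shards; each uid lives in exactly one shard, so the counts should
    converge once the backfill finishes."""
    old_count = r.hlen(old_key)
    new_count = sum(r.hlen(f"user:info:{i}") for i in range(shard_count))
    return new_count >= old_count, old_count, new_count
```

For stronger guarantees, additionally sample individual uids and diff their values, as described in Step 4.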

5.4 Step 4 – Gray‑Scale Traffic Switch

After the new shards contain complete data (validated by comparing field counts), gradually shift read traffic to the new keys.

Sample 1 % of user IDs and verify that reads from new and old keys return identical results.

Initially route 10 % of reads to new shards via a configuration center; monitor INFO stats metrics such as keyspace_hits and latency as well as business error rates.

Increase the traffic proportion stepwise to 100 % once no anomalies are observed.

public UserInfo getUserInfo(Long uid) {
    // Ratio comes from the config center; defaulting to 0 keeps reads on
    // the old key until the switch is explicitly enabled.
    int switchRatio = config.getInteger("user.info.switch.ratio", 0);
    if (ThreadLocalRandom.current().nextInt(100) < switchRatio) {
        int shardId = (int) (uid % 100);  // same shard function as the writes
        String info = redis.hget("user:info:" + shardId, uid.toString());
        if (info != null) return JSON.parseObject(info, UserInfo.class);
        // Shard miss: fall through to the old key
    }
    // Fallback to old key
    String info = redis.hget("user:info:all", uid.toString());
    return info != null ? JSON.parseObject(info, UserInfo.class) : null;
}

Fallback hierarchy:

If the new shard returns null, query the old key.

If the old key also fails, read from the database and asynchronously write the result back to the new shard.

Apply rate‑limiting (e.g., max 100 DB queries per second) to protect the database from a sudden surge.
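The database rate limit mentioned above can be a simple token bucket in the application layer. A minimal Python sketch (the class name and capacity are illustrative; production code would more often reuse an existing limiter such as Guava's RateLimiter):

```python
import time

class TokenBucket:
    """Allow at most `rate` operations per second, with short bursts up
    to `capacity`; use this to cap DB fallback queries on cache misses."""

    def __init__(self, rate, capacity=None):
        self.rate = float(rate)
        self.capacity = float(capacity if capacity is not None else rate)
        self.tokens = self.capacity
        self.last = time.monotonic()

    def allow(self):
        now = time.monotonic()
        # Refill proportionally to the time elapsed since the last call
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False
```

A fallback read is then guarded by `if limiter.allow():` (query the database and asynchronously backfill the shard), with an `else` branch that degrades gracefully, for example by returning a default profile.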

5.5 Step 5 – Non‑Blocking Deletion of the Old Key

Once all traffic has switched, delete the old key using UNLINK (asynchronous) instead of DEL to avoid blocking the main thread.

# Asynchronous deletion
127.0.0.1:6379> UNLINK user:info:all
(integer) 1

If the old key is extremely large (e.g., >1 GB), delete its fields in batches with HSCAN + HDEL before finally calling UNLINK:

cursor = 0
while True:
    cursor, fields = r.hscan(old_key, cursor=cursor, count=1000)
    if fields:
        r.hdel(old_key, *fields.keys())
    if cursor == 0:   # a cursor of 0 means the scan is complete
        break
    time.sleep(0.1)   # throttle deletions to avoid latency spikes
# Finally remove the (now empty) key without blocking the main thread
r.unlink(old_key)

5.6 Monitoring and Rollback Plan

Redis metrics: latency, used_memory, expired_keys.

Business metrics: API response time, error rate (4xx/5xx), DB query volume.

If anomalies appear, immediately switch the read ratio back to 0 % via the configuration center and pause the migration task.

The core of big‑key optimization is “shard the key + dual‑write + non‑blocking progressive migration + gray‑scale rollout + non‑blocking deletion + rollback capability”. By performing the migration in the background and adding multi‑layer caching with graceful degradation, the impact on online traffic is minimized.

Tags: backend, data migration, performance, Redis, online optimization, big key
Written by Tech Freedom Circle

Crazy Maker Circle (Tech Freedom Architecture Circle): a community of tech enthusiasts, experts, and high‑performance fans. Many top‑level masters, architects, and hobbyists have achieved tech freedom; another wave of go‑getters are hustling hard toward tech freedom.