Cache Big‑Key and Hot‑Key Issues: Case Study, Root‑Cause Analysis, and Mitigation Strategies
This article examines a real‑world incident where oversized and frequently accessed Redis cache keys caused cache penetration and network bandwidth saturation during a high‑traffic promotion, analyzes the underlying reasons, and presents concrete solutions and preventive measures for backend systems.
In modern software architectures, caching is essential for performance, but misuse can lead to severe incidents, especially when dealing with large (big‑key) or frequently accessed (hot‑key) cache entries.
During a Double‑11 sales event, a system experienced a critical outage: a massive promotional activity generated an oversized cache entry, causing Redis call latency to spike, overall availability to drop from 100% to 20%, and a cascade of failures across core services.
The root causes were twofold. First, cache penetration: whenever the local JVM cache was empty, a flood of concurrent requests all fell through and queried Redis for the newly created key at the same time. Second, a network bandwidth bottleneck: each hot-key value was about 1.5 MB, so the stampede quickly saturated the per-shard bandwidth limit (200 Mbps, roughly 133 concurrent accesses), blocking Redis threads and triggering a cache avalanche.
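The penetration mechanism can be reproduced in miniature. The sketch below is illustrative only (class and key names are hypothetical, and a barrier stands in for real traffic timing): every thread that observes a local-cache miss before any thread has written the value back issues its own fetch against the backing store, so one cold key turns into N remote reads.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CyclicBarrier;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class StampedeDemo {
    static final Map<String, String> localCache = new ConcurrentHashMap<>();
    static final AtomicInteger redisReads = new AtomicInteger();

    // Stand-in for a Redis read; counts how often the remote store is hit.
    static String fetchFromRedis(String key) {
        redisReads.incrementAndGet();
        return "activity-payload";
    }

    public static void main(String[] args) throws Exception {
        int threads = 8;
        // The barrier forces every thread to observe the local-cache miss
        // before any thread populates the cache: the worst-case interleaving.
        CyclicBarrier allMissed = new CyclicBarrier(threads);
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int i = 0; i < threads; i++) {
            pool.submit(() -> {
                if (localCache.get("hotKey") == null) {        // every thread misses
                    try {
                        allMissed.await();
                    } catch (Exception e) {
                        throw new RuntimeException(e);
                    }
                    localCache.put("hotKey", fetchFromRedis("hotKey")); // every thread fetches
                }
                return localCache.get("hotKey");
            });
        }
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.SECONDS);
        System.out.println("redis reads for one key: " + redisReads.get()); // prints 8
    }
}
```

With a 1.5 MB value, each of those redundant reads costs the full payload in shard bandwidth, which is why the miss path, not the hit path, is what saturated the network.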
Original cache‑lookup pseudocode:

```java
// Check the local (JVM) cache first.
ActivityCache present = activityLocalCache.getIfPresent(activityDetailCacheKey);
if (present != null) {
    // Return a defensive copy so callers cannot mutate the cached instance.
    ActivityCache activityCache = incentiveActivityPOConvert.copyActivityCache(present);
    return activityCache;
}
// Local miss: there is no lock here, so every concurrent caller that
// misses falls through to Redis for the same key at the same time.
ActivityCache remoteCache = getCacheFromRedis(activityDetailCacheKey);
activityLocalCache.put(activityDetailCacheKey, remoteCache);
return remoteCache;
```

To address the issue, the team implemented several measures:
Big‑key governance: switched serialization from JSON to Protostuff, reducing object size from 1.5 MB to 0.5 MB.
Compression: applied gzip compression with a threshold, shrinking a 500 KB payload to 17 KB.
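`ByteCompressionUtil` is the team's internal helper and its implementation is not shown; the following is a stdlib sketch of the same idea under stated assumptions: the 1 KB threshold and the one-byte raw/gzip marker are choices made here for illustration, not details from the incident.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.Arrays;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class GzipSketch {
    // Only compress payloads above this size; tiny values can grow under gzip.
    static final int THRESHOLD_BYTES = 1024;

    static byte[] compress(byte[] input) throws IOException {
        if (input.length < THRESHOLD_BYTES) {
            byte[] out = new byte[input.length + 1];
            out[0] = 0; // marker byte: stored raw
            System.arraycopy(input, 0, out, 1, input.length);
            return out;
        }
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        bos.write(1); // marker byte: gzip-compressed
        try (GZIPOutputStream gzip = new GZIPOutputStream(bos)) {
            gzip.write(input);
        }
        return bos.toByteArray();
    }

    static byte[] decompress(byte[] packed) throws IOException {
        if (packed[0] == 0) {
            return Arrays.copyOfRange(packed, 1, packed.length); // was stored raw
        }
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPInputStream gzip = new GZIPInputStream(
                new ByteArrayInputStream(packed, 1, packed.length - 1))) {
            byte[] buf = new byte[8192];
            int n;
            while ((n = gzip.read(buf)) > 0) {
                bos.write(buf, 0, n);
            }
        }
        return bos.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        // Highly repetitive payloads (like serialized activity config) compress well.
        byte[] payload = new byte[500 * 1024];
        for (int i = 0; i < payload.length; i++) {
            payload[i] = (byte) ('a' + (i % 7));
        }
        byte[] packed = compress(payload);
        System.out.println("500 KB payload packed to " + packed.length + " bytes");
    }
}
```

The marker byte matters: without it, the reader cannot tell a small raw value from a compressed one, and a threshold-gated writer would corrupt reads of sub-threshold entries.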
Cache back‑source optimization: added a thread lock when the local cache missed, limiting concurrent Redis fetches.
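The back-source lock can be sketched with stdlib primitives (the class and key names below are hypothetical, not the team's code): `ConcurrentHashMap.computeIfAbsent` runs the loader at most once per absent key while other callers for that key block and reuse the result, which is the "limit concurrent Redis fetches" idea in miniature.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class SingleFlightCache {
    static final ConcurrentHashMap<String, String> localCache = new ConcurrentHashMap<>();
    static final AtomicInteger redisReads = new AtomicInteger();

    // Stand-in for a Redis read; counts how often the remote store is hit.
    static String fetchFromRedis(String key) {
        redisReads.incrementAndGet();
        return "activity-payload";
    }

    // computeIfAbsent invokes the loader at most once per absent key;
    // concurrent callers for the same key wait for that one load instead
    // of stampeding the backing store.
    static String get(String key) {
        return localCache.computeIfAbsent(key, SingleFlightCache::fetchFromRedis);
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(8);
        for (int i = 0; i < 100; i++) {
            pool.submit(() -> get("hotKey"));
        }
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.SECONDS);
        System.out.println("redis reads for one key: " + redisReads.get()); // prints 1
    }
}
```

One caveat: `computeIfAbsent` cannot cache a `null` result, so a loader that may legitimately find nothing needs explicit per-key locks or a sentinel value instead.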
Monitoring and Redis configuration tuning: regularly observed network traffic and adjusted rate‑limit settings to keep Redis stable.
After remediation, the cache‑lookup logic became:

```java
// The cache loads through a mapping function on a miss, so only one
// thread per key performs the Redis fetch; concurrent callers for the
// same key wait for and reuse that result instead of stampeding Redis.
ActivityCache present = activityLocalCache.get(activityDetailCacheKey, key -> getCacheFromRedis(key));
if (present != null) {
    return present;
}
```

Additional binary‑cache handling code was introduced:
```java
/**
 * Query the binary (compressed Protostuff) cache.
 */
private ActivityCache getBinCacheFromJimdb(String activityDetailCacheBinKey) {
    // Read both hash fields in one round trip: the compressed activity
    // payload and the available stock. (The first field name was lost in
    // the source text; "activity" is assumed here.)
    List<byte[]> activityByteList = slaveCluster.hMget(activityDetailCacheBinKey.getBytes(),
            "activity".getBytes(), "stock".getBytes());
    if (activityByteList.get(0) != null && activityByteList.get(0).length > 0) {
        // Decompress first, then deserialize with Protostuff.
        byte[] decompress = ByteCompressionUtil.decompress(activityByteList.get(0));
        ActivityCache activityCache = ProtostuffUtil.deserialize(decompress, ActivityCache.class);
        if (activityCache != null) {
            // Stock is stored as a separate, uncompressed field so it can
            // be updated without rewriting the whole payload.
            if (activityByteList.get(1) != null && activityByteList.get(1).length > 0) {
                activityCache.setAvailableStock(Integer.valueOf(new String(activityByteList.get(1))));
            }
            return activityCache;
        }
    }
    return null;
}
```

Preventive measures were also defined: consider cache strategy during design, conduct thorough performance and stress testing, and regularly optimize and upgrade the system to incorporate newer technologies.
In conclusion, big‑key and hot‑key pitfalls can trigger serious production incidents; proper cache design, size control, and monitoring are vital to maintain system stability and performance.
JD Tech
Official JD technology sharing platform. All the cutting‑edge JD tech, innovative insights, and open‑source solutions you’re looking for, all in one place.