Why RedisTemplate’s Millisecond Expiry Triggers Heavy Cluster Commands and How to Fix It

During a high-traffic load test of an e-commerce product detail page, a flood of Redis CLUSTER and PSETEX commands caused CPU spikes. The investigation showed that millisecond-level expiration in Spring Data Redis triggers frequent cluster node lookups, and that switching to second-level expiration resolves the problem.

Yanxuan Tech Team

1. Background

During major shopping festivals such as 618 and Double‑11, load testing is used to verify system capacity and expose risks early. In a recent full‑link load test, the product detail page response time remained high, leading to overall system load concerns that needed urgent resolution.

2. Service Information

Spring Data Redis version: 1.8.4.RELEASE

3. Analysis Process

3.1 Problem Analysis

The high response time of the product detail page was traced to slow Redis operations. Monitoring showed that one slave node in the Redis cluster had an unusually large number of connections, high CPU usage, and frequent master-slave disconnections. DBA logs showed a large number of CLUSTER commands, each taking over 30 ms, concentrated on a single node.

3.2 Monitoring Observation

Cluster command volume and latency were visualized, confirming excessive CLUSTER and PSETEX commands. Both command types showed similar trends, suggesting a strong correlation.

The CLUSTER commands were all directed to a single node (xx.xxx.xx.xxx:16379).

3.3 Code Analysis

Investigation of the application code showed that developers called redisTemplate.opsForValue().set() with a timeout. Internally, this method chooses between SETEX and PSETEX based on the time unit of the expiration. When the timeout is specified in milliseconds, PSETEX is used, and in this version that path is the one that triggers the frequent cluster node lookups.
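The dispatch described above can be modeled in a few lines. This is a simplified sketch of the behavior as the article describes it, not the Spring Data Redis source; the class and method names (ExpiryCommandChooser, commandFor, argumentFor) are hypothetical.

```java
import java.util.concurrent.TimeUnit;

/** Simplified model of the dispatch described in the text; not the library source. */
final class ExpiryCommandChooser {

    private ExpiryCommandChooser() {}

    /** Returns the Redis command the described logic would pick for a time unit. */
    static String commandFor(TimeUnit unit) {
        // Only an explicit MILLISECONDS unit takes the PSETEX path.
        return unit == TimeUnit.MILLISECONDS ? "PSETEX" : "SETEX";
    }

    /** Returns the numeric expiry argument that would accompany the command. */
    static long argumentFor(long timeout, TimeUnit unit) {
        return unit == TimeUnit.MILLISECONDS
                ? timeout                   // PSETEX takes milliseconds
                : unit.toSeconds(timeout);  // SETEX takes whole seconds (truncated in this model)
    }
}
```

Under this model, a 30-second lifetime written as 30000 with TimeUnit.MILLISECONDS maps to PSETEX, while the same lifetime written as 30 with TimeUnit.SECONDS maps to SETEX, even though the cached entry lives equally long in both cases.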

The client caches the cluster topology with a hard-coded 100 ms expiration. Each time the cache expires, the next PSETEX call makes the client walk its node list and send CLUSTER NODES to refresh the topology, which explains the observed correlation between PSETEX and CLUSTER traffic.
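To get a feel for how often a 100 ms cache forces a refresh under steady traffic, the following toy model counts refreshes over a simulated time window. It is a sketch under stated assumptions: the class TopologyCacheModel, its injected clock, and the one-call-per-millisecond load are all illustrative and not taken from the library.

```java
import java.util.function.LongSupplier;

/** Toy model of a time-boxed topology cache; names and structure are illustrative. */
final class TopologyCacheModel {

    private final long ttlMillis;
    private final LongSupplier clock;  // injected so the model runs deterministically
    private long lastRefreshAt;
    private boolean loaded;
    private int refreshCount;          // stands in for CLUSTER NODES round trips

    TopologyCacheModel(long ttlMillis, LongSupplier clock) {
        this.ttlMillis = ttlMillis;
        this.clock = clock;
    }

    /** Called by every command that needs the topology (PSETEX in the article's case). */
    void getTopology() {
        long now = clock.getAsLong();
        if (!loaded || now - lastRefreshAt >= ttlMillis) {
            refreshCount++;            // a real client would send CLUSTER NODES here
            lastRefreshAt = now;
            loaded = true;
        }
    }

    int refreshCount() {
        return refreshCount;
    }

    /** Simulates one topology lookup per millisecond and counts the refreshes. */
    static int refreshesDuring(long durationMillis, long ttlMillis) {
        long[] now = {0};
        TopologyCacheModel cache = new TopologyCacheModel(ttlMillis, () -> now[0]);
        for (long t = 0; t < durationMillis; t++) {
            now[0] = t;
            cache.getTopology();
        }
        return cache.refreshCount();
    }
}
```

In this model, refreshesDuring(1000, 100) yields 10 refreshes per simulated second for a single client. With many application instances under load, each holding its own cache, the refresh traffic would multiply accordingly.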

4. Solution and Verification

4.1 Solution

Replace all cache‑set operations that use millisecond expiration with second‑level expiration, thereby forcing the use of SETEX instead of PSETEX and eliminating the frequent CLUSTER NODES calls.
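A call-site migration along these lines might look like the sketch below. CacheTtl and toWholeSeconds are hypothetical helpers, and rounding up (so that entries never expire earlier than before) is an assumption about what the caller wants; the article does not say how sub-second remainders were handled.

```java
/** Hypothetical helper for migrating millisecond TTLs to whole seconds. */
final class CacheTtl {

    private CacheTtl() {}

    /**
     * Converts a millisecond TTL to seconds, rounding up so that an entry never
     * expires earlier than before and a small positive TTL never becomes 0.
     */
    static long toWholeSeconds(long ttlMillis) {
        if (ttlMillis <= 0) {
            throw new IllegalArgumentException("TTL must be positive: " + ttlMillis);
        }
        return (ttlMillis - 1) / 1000 + 1;
    }
}

// Call-site change, assuming an injected RedisTemplate<String, String> named redisTemplate:
//   before: redisTemplate.opsForValue().set(key, value, ttlMillis, TimeUnit.MILLISECONDS);
//   after:  redisTemplate.opsForValue().set(key, value, CacheTtl.toWholeSeconds(ttlMillis), TimeUnit.SECONDS);
```

Where existing TTLs are already whole multiples of 1000 ms, the conversion is exact and the helper only guards against accidentally passing 0.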

4.2 Verification

After deploying the change, monitoring showed a clear drop in CPU usage and command latency. In the subsequent load test, the overall MRT of the product detail page fell from over 100 ms to around 66 ms.

5. Summary and Reflection

5.1 Summary

The root cause was the use of millisecond‑level expiration, which made Spring Data Redis invoke PSETEX and trigger frequent cluster topology refreshes, dramatically impacting performance under load. Switching to second‑level expiration resolved the issue.

5.2 Reflection

Although PSETEX and SETEX differ only in time‑unit precision, Spring’s hidden logic creates two distinct execution paths, one of which incurs heavy CLUSTER traffic. API design should avoid such opaque behavior; clearer method separation would help developers prevent similar pitfalls.

Future improvements: define version‑specific best‑practice guides for Spring Data Redis and enforce rigorous code‑review processes to catch hidden performance traps.

6. Extras

6.1 Why do cluster commands always hit the same node?

The loop that iterates over all nodes starts from the first entry and returns on the first successful response, so the same node is repeatedly targeted. This bug was fixed in Spring Data Redis 2.1.3 (DATAREDIS‑890).
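The effect of a "first success wins" loop can be illustrated with a small model. The names below (NodeSelectionModel, pickFixedOrder, pickShuffled, isReachable) are hypothetical, and the shuffled variant is only one possible way to spread the load; it is not presented as what the DATAREDIS-890 fix actually does.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

/** Illustrative model of "try nodes in order, return on first success". */
final class NodeSelectionModel {

    private NodeSelectionModel() {}

    /** Fixed-order loop: while all nodes are healthy, the first entry answers every time. */
    static String pickFixedOrder(List<String> nodes) {
        for (String node : nodes) {
            if (isReachable(node)) {
                return node;
            }
        }
        throw new IllegalStateException("no reachable node");
    }

    /** One possible mitigation: shuffle a copy so refreshes spread across nodes. */
    static String pickShuffled(List<String> nodes, Random random) {
        List<String> copy = new ArrayList<>(nodes);
        Collections.shuffle(copy, random);
        return pickFixedOrder(copy);
    }

    /** Placeholder health check; a real client would attempt the command here. */
    private static boolean isReachable(String node) {
        return true;
    }
}
```

With every node healthy, pickFixedOrder returns the first list entry on every call, which matches the single hot node seen in monitoring.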

6.2 Why are CLUSTER NODES commands slow?

To build the reply, Redis must iterate over all 16,384 hash slots for each node. The CPU work therefore grows with the number of master nodes, making the command increasingly expensive as the cluster scales.

6.3 When to use PSETEX?

PSETEX is appropriate when millisecond‑level expiration precision is required, but in most caching scenarios second‑level precision is sufficient.

As the Redis documentation puts it, PSETEX works exactly like SETEX, with the sole difference that the expire time is specified in milliseconds instead of seconds.