How We Fixed Dubbo Thread‑Pool Exhaustion by Tuning Redis Connection Pools

When a high‑traffic Dubbo interface began throwing thread‑pool exhaustion errors, the team traced the problem to Redis request spikes, identified misconfigured connection‑pool parameters and a version‑specific behaviour change in commons-pool2, and applied quick fixes, pool‑size adjustments, and client‑side optimizations that dramatically reduced latency and error rates.


Background

A Dubbo service handling 1.8 billion daily requests started to experience short‑lived circuit‑breaker trips, reporting that the provider's Dubbo thread pool was exhausted. The error volume reached 940 k requests per day, prompting an urgent performance investigation.

Rapid Emergency Response

Initial system monitoring (CPU, JVM memory, GC, threads) showed no anomalies aligned with the error timestamps. Traffic analysis revealed a sharp, periodic surge that coincided exactly with the error spikes.

Identifying the Bottleneck

Tracing showed the request flow: incoming request → downstream service call via Hystrix (500 ms timeout, sketched below) → cache lookup (local cache, then Redis) → async DB fallback. The downstream service's P99 latency spiked above 1 s during traffic peaks, but its average latency stayed under 10 ms, so the timeout settings were not the root cause.
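
For context, here is a minimal sketch of the Hystrix leg of that path. It is an illustration under assumptions, not the team's actual code: the command name, group key, and DownstreamClient stub are hypothetical; only the 500 ms timeout comes from the article.

import com.netflix.hystrix.HystrixCommand;
import com.netflix.hystrix.HystrixCommandGroupKey;
import com.netflix.hystrix.HystrixCommandProperties;

public class DownstreamCommand extends HystrixCommand<String> {

    // Hypothetical stand-in for the real downstream client.
    static class DownstreamClient {
        static String get(String key) { return "value-for-" + key; }
    }

    private final String key;

    public DownstreamCommand(String key) {
        super(Setter.withGroupKey(HystrixCommandGroupKey.Factory.asKey("Downstream"))
                .andCommandPropertiesDefaults(HystrixCommandProperties.Setter()
                        .withExecutionTimeoutInMilliseconds(500))); // 500 ms timeout
        this.key = key;
    }

    @Override
    protected String run() {
        return DownstreamClient.get(key); // remote call guarded by the timeout
    }

    @Override
    protected String getFallback() {
        // On timeout the flow falls through to the cache path
        // (local cache, then Redis) described above.
        return null;
    }
}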

Redis traffic was running at twice the overall request volume, indicating that the service was bypassing the local cache and hitting Redis directly. This misbehaviour caused Redis response‑time spikes that lined up exactly with the Dubbo P99 spikes.

Further analysis of Redis connection metrics showed that during peak periods the number of active connections rose sharply, exhausting the connection pool.

Solution

1. Fix the cache‑lookup bug – ensure the code reads from the local cache before falling back to Redis (a minimal sketch follows). This reduced the unnecessary Redis traffic.
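
A minimal sketch of the corrected lookup order, assuming a Guava cache in front of Jedis; the class name, cache size, and TTL are illustrative, not the team's actual values:

import java.util.concurrent.TimeUnit;
import com.google.common.cache.Cache;
import com.google.common.cache.CacheBuilder;
import redis.clients.jedis.Jedis;
import redis.clients.jedis.JedisPool;

public class CachedLookup {

    // Local in-memory cache consulted before Redis; sizing is illustrative.
    private final Cache<String, String> local = CacheBuilder.newBuilder()
            .maximumSize(100_000)
            .expireAfterWrite(60, TimeUnit.SECONDS)
            .build();

    private final JedisPool jedisPool;

    public CachedLookup(JedisPool jedisPool) {
        this.jedisPool = jedisPool;
    }

    public String get(String key) {
        // 1. Local cache first -- the original bug skipped this step.
        String value = local.getIfPresent(key);
        if (value != null) {
            return value;
        }
        // 2. Only on a local miss do we borrow a pooled Redis connection.
        try (Jedis jedis = jedisPool.getResource()) {
            value = jedis.get(key);
        }
        if (value != null) {
            local.put(key, value); // repopulate the local cache
        }
        return value;
    }
}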

2. Redis scaling – the team expanded the Redis cluster from 6 to 8 masters, but latency improvements were limited because the client‑side pool remained a bottleneck.

3. Client‑side connection‑pool tuning

Reviewed the commons-pool2 version: the project used 2.6.2 while the middleware docs referenced 2.4.2, and the newer version no longer performed the pool pre‑heat call present in the older one.

Adjusted pool parameters: maxWaitMillis=200, minIdle, minEvictableIdleTimeMillis, and timeBetweenEvictionRunsMillis to ensure idle connections are evicted and recreated appropriately.

Enabled pool pre‑heat by configuring the eviction timer: with a positive timeBetweenEvictionRunsMillis, the background evictor tops the pool up to minIdle, creating idle connections ahead of traffic. A configuration sketch follows.
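
A configuration sketch along these lines, using JedisPoolConfig (which extends commons-pool2's GenericObjectPoolConfig). Only maxWaitMillis=200 comes from the article; the other values are placeholders to show which knobs are involved:

import redis.clients.jedis.JedisPool;
import redis.clients.jedis.JedisPoolConfig;

public class RedisPoolFactory {

    public static JedisPool build(String host, int port) {
        JedisPoolConfig config = new JedisPoolConfig();
        // Fail fast: wait at most 200 ms for a free connection (value from the article).
        config.setMaxWaitMillis(200);
        // Keep a floor of warm connections; these sizes are illustrative.
        config.setMinIdle(50);
        config.setMaxTotal(500);
        // Evict connections that have been idle longer than this...
        config.setMinEvictableIdleTimeMillis(60_000);
        // ...and run the evictor periodically. A positive interval also starts the
        // background evictor thread, whose ensureMinIdle step recreates idle
        // connections up to minIdle -- the pre-heat behaviour described above.
        config.setTimeBetweenEvictionRunsMillis(30_000);
        return new JedisPool(config, host, port);
    }
}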

The relevant Jedis and commons-pool2 entry points were examined and adjusted accordingly:

// Jedis: each SETEX call borrows a connection from the underlying pool.
public String setex(final byte[] key, final int seconds, final byte[] value) { ... }
// commons-pool2: waits up to borrowMaxWaitMillis for an idle connection before failing.
public T borrowObject(final long borrowMaxWaitMillis) throws Exception { ... }

After applying these changes, the Redis request volume halved, the worst‑case Redis response‑time spikes were mitigated, and the Dubbo P99 latency returned to normal levels.

Conclusion

When facing online performance incidents, prioritize rapid business recovery through throttling, circuit‑breaking, and degradation strategies. Effective use of monitoring platforms accelerates root‑cause analysis. For Redis‑related latency, examine server load, code paths, and client‑side pool configuration. Properly tuned commons-pool2 parameters and pre‑heating the pool are essential for handling large traffic bursts.

Tags: backend, Dubbo, commons-pool2, performance-tuning, connection-pool
Written by ITPUB, the official ITPUB account sharing technical insights, community news, and exciting events.
