
Investigation of JSF Thread‑Pool Exhaustion During R2M Redis Upgrade

During a Redis version upgrade of the internal R2M cache service, a Biz thread‑pool exhaustion error was traced to JSF threads blocked on a read lock held by a topology‑updater task, revealing a deadlock caused by shared ForkJoinPool usage and missing lock timeouts.

JD Tech Talk

Background: While upgrading the Redis version of R2M (a high-performance, highly available distributed cache service used internally at JD), upstream services reported an RpcException: Biz thread pool of provider has been exhausted. Monitoring showed the issue on only one or two nodes, which were taken offline via the Jingdong Service Framework (JSF) to preserve the incident scene for investigation.

Log excerpt:

2024-03-13 02:21:20.188 [JSF-SEV-WORKER-57-T-5] ERROR BaseServerHandler - handlerRequest error msg:[JSF-23003] Biz thread pool of provider has been exhausted, the server port is 22003
2024-03-13 02:21:20.658 [JSF-SEV-WORKER-57-T-5] WARN BusinessPool - [JSF-23002] Task:com.alibaba.ttl.TtlRunnable - com.jd.jsf.gd.server.JSFTask@0 has been reject for ThreadPool exhausted! pool:80, active:80, queue:300, taskcnt: 1067777

Investigation steps: The problem was hypothesized to stem from a fixed‑size JSF thread pool; when all threads are busy and new traffic arrives, requests cannot be processed. Stack traces were captured using SGM and analyzed with the online ThreadDump Analyzer.
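The same kind of stack-trace capture can be done in-process with the standard ThreadMXBean API. The sketch below is illustrative (the class and method names are mine, not from SGM or the ThreadDump Analyzer); it surfaces threads that are blocked or waiting together with the lock they wait on, which is essentially the signal the analysis above relied on.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.List;

public class BlockedThreadScan {
    // Collect threads that are BLOCKED or WAITING, plus the lock each one
    // waits on, similar to what a jstack snapshot or dump analyzer shows.
    public static List<String> blockedOrWaiting() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        List<String> out = new ArrayList<>();
        for (ThreadInfo info : mx.dumpAllThreads(true, true)) {
            Thread.State s = info.getThreadState();
            if (s == Thread.State.BLOCKED || s == Thread.State.WAITING
                    || s == Thread.State.TIMED_WAITING) {
                out.add(info.getThreadName() + " (" + s + ") on " + info.getLockName());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        blockedOrWaiting().forEach(System.out::println);
    }
}
```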

Thread analysis revealed that most JSF threads were stuck in JedisClusterInfoCache#getSlaveOfSlotFromDc, which acquires a read lock at method entry.
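A minimal sketch of that pattern: the read lock is taken unconditionally at method entry, so every caller blocks for as long as a writer holds the lock. The field names and host strings below are illustrative only, not the actual Jedis internals.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Simplified sketch of the lock pattern in JedisClusterInfoCache:
// a read lock guards the slot-to-node table on every lookup.
public class SlotCacheSketch {
    private final ReentrantReadWriteLock rwl = new ReentrantReadWriteLock();
    private final ReentrantReadWriteLock.ReadLock r = rwl.readLock();
    private final String[] slotSlaves = {"redis-a:6379", "redis-b:6379"};

    public String getSlaveOfSlot(int slot) {
        r.lock(); // blocks indefinitely if a writer never releases the write lock
        try {
            return slotSlaves[slot % slotSlaves.length];
        } finally {
            r.unlock();
        }
    }
}
```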

Further inspection showed that the read-lock acquisition has no timeout, and a scheduled topology-update task acquires the write lock without guaranteed release. While that task holds the write lock, every other thread blocks on read-lock acquisition.
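Both defects have standard mitigations: readers can fail fast with a bounded tryLock instead of blocking forever, and writers must release in a finally block. The class and timeout values below are a hypothetical sketch of those two fixes, not the actual R2M patch.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class TopologyCache {
    private final ReentrantReadWriteLock rw = new ReentrantReadWriteLock();
    private volatile String topology = "initial";

    // Reader: bounded wait, so a stuck writer surfaces as an error
    // instead of silently pinning every JSF thread.
    public String read() throws InterruptedException {
        if (!rw.readLock().tryLock(500, TimeUnit.MILLISECONDS)) {
            throw new IllegalStateException(
                    "read lock timed out; topology update may be stuck");
        }
        try {
            return topology;
        } finally {
            rw.readLock().unlock();
        }
    }

    // Writer: the write lock is always released, even on exception.
    public void update(String newTopology) {
        rw.writeLock().lock();
        try {
            topology = newTopology;
        } finally {
            rw.writeLock().unlock();
        }
    }
}
```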

Additional findings indicated that parallelStream().forEach and Caffeine's asynchronous refresh both default to ForkJoinPool.commonPool(). Because the topology updater also runs on this common pool, threads competing for the write lock can starve, leading to deadlock.
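One way to avoid this coupling is to keep such work off the common pool entirely. Submitting a parallel stream's terminal operation from inside a private ForkJoinPool runs the stream on that pool instead (a well-known, if undocumented-by-contract, trick); Caffeine similarly accepts a dedicated pool via its executor(...) builder option. The sketch below shows only the parallel-stream half, with illustrative pool sizes.

```java
import java.util.List;
import java.util.concurrent.ForkJoinPool;

public class PrivatePoolStream {
    // Run the stream's terminal operation inside the given pool, so its
    // worker threads (not commonPool's) execute the parallel stages.
    static int sumOfSquaresOn(ForkJoinPool pool, List<Integer> xs) throws Exception {
        return pool.submit(() ->
                xs.parallelStream().mapToInt(i -> i * i).sum()
        ).get();
    }

    public static void main(String[] args) throws Exception {
        ForkJoinPool privatePool = new ForkJoinPool(4);
        try {
            System.out.println(sumOfSquaresOn(privatePool, List.of(1, 2, 3, 4))); // 30
        } finally {
            privatePool.shutdown();
        }
    }
}
```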

Verification: Three ForkJoinPool.commonPool‑worker threads were observed waiting on the read lock, while the local Caffeine cache lacked a custom thread pool, confirming the contention.

Root cause: The combination of a write‑locked topology updater and the shared common pool caused JSF threads to exhaust, resulting in the observed RpcException.

Takeaways: The issue appears only under specific load patterns. As a remediation, the asynchronous topology update was changed to synchronous. More broadly, developers should be cautious when using shared thread pools, ensure lock acquisitions have timeouts and releases are guaranteed, and maintain robust monitoring for early detection and remediation.

Tags: Backend, Java, Redis, ThreadPool, JSF
Written by JD Tech Talk, the official JD Tech public account delivering best practices and technology innovation.