Investigation and Resolution of JSF Asynchronous Call Timeout Causing Interface Availability Degradation
This article documents the investigation of a JSF asynchronous call timeout that reduced interface availability, explains the async call mechanisms, analyzes the root cause in the callback thread pool, and presents short‑term and long‑term solutions to restore 100% availability.
This article records an incident in which a JSF asynchronous call timeout reduced interface availability. It walks through the troubleshooting approach and the JSF async call flow, with source‑code analysis based on JSF version 1.7.5‑HOTFIX‑T6.
Background: The advertising delivery system is an I/O‑bound service that makes many external calls. Synchronous calls led to long latency and thread‑pool contention, which prompted a switch to JSF’s asynchronous mode.
JSF supports three asynchronous invocation methods:
(1) ResponseFuture via RpcContext
asyncHelloService.sayHello("The ResponseFuture One");
ResponseFuture<Object> future1 = RpcContext.getContext().getFuture();
asyncHelloService.sayNoting("The ResponseFuture Two");
ResponseFuture<Object> future2 = RpcContext.getContext().getFuture();
try {
future1.get();
future2.get();
} catch (Throwable e) {
LOGGER.error("catch " + e.getClass().getCanonicalName() + " " + e.getMessage(), e);
}
(2) CompletableFuture via RpcContext (supported from 1.7.5)
asyncHelloService.sayHello("The CompletableFuture One");
CompletableFuture<String> cf1 = RpcContext.getContext().getCompletableFuture();
asyncHelloService.sayNoting("The CompletableFuture Two");
CompletableFuture<String> cf2 = RpcContext.getContext().getCompletableFuture();
CompletableFuture<String> cf3 = RpcContext.getContext().asyncCall(() -> {
asyncHelloService.sayHello("The CompletableFuture Three");
});
try {
cf1.get();
cf2.get();
cf3.get();
} catch (Throwable e) {
LOGGER.error("catch " + e.getClass().getCanonicalName() + " " + e.getMessage(), e);
}
(3) Interface returning CompletableFuture (supported from 1.7.5)
CompletableFuture<String> cf4 = asyncHelloService.sayHelloAsync("The CompletableFuture Four");
cf4.whenComplete((res, err) -> {
if (err != null) {
LOGGER.error("interface async cf4 now complete error " + err.getClass().getCanonicalName() + " " + err.getMessage(), err);
} else {
LOGGER.info("interface async cf4 now complete : {}", res);
}
});
CompletableFuture<Void> cf5 = asyncHelloService.sayNotingAsync("The CompletableFuture Five");
try {
LOGGER.info("interface async cf4 now is : {}", cf4.get());
LOGGER.info("interface async cf5 now is : {}", cf5.get());
} catch (Throwable e) {
LOGGER.error("catch " + e.getClass().getCanonicalName() + " " + e.getMessage(), e);
}
Problem Phenomenon: After converting to async, most interfaces showed reduced latency, but occasional availability drops were observed. Monitoring revealed sporadic timeouts despite the provider returning within milliseconds, suggesting an issue in the JSF async mechanism.
Investigation: Reading the JSF source traced the async flow as follows. The client creates a JSFCompletableFuture, caches a MsgFuture (msgId → MsgFuture), and sends the request via Netty. The server processes the request and returns a response. The client matches the response to the cached MsgFuture by msgId, then submits a task to the callback thread pool, and that task invokes complete or completeExceptionally on the CompletableFuture.
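The flow above can be sketched in plain Java. This is a minimal illustration and not JSF's actual code: the class AsyncFlowSketch, the pending map, and the send/onResponse methods are hypothetical stand-ins for the msgId → MsgFuture cache and the callback thread pool, and the Netty transport is omitted entirely.

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

class AsyncFlowSketch {
    // Hypothetical stand-in for the msgId -> MsgFuture cache.
    private final Map<Integer, CompletableFuture<String>> pending = new ConcurrentHashMap<>();
    // Hypothetical stand-in for the callback thread pool.
    private final ExecutorService callbackPool;

    AsyncFlowSketch(ExecutorService callbackPool) {
        this.callbackPool = callbackPool;
    }

    /** Client side: cache a future under msgId before the request goes out. */
    CompletableFuture<String> send(int msgId) {
        CompletableFuture<String> future = new CompletableFuture<>();
        pending.put(msgId, future);
        return future;
    }

    /** I/O side: match the response by msgId, then complete on the callback pool. */
    void onResponse(int msgId, String body, Throwable error) {
        CompletableFuture<String> future = pending.remove(msgId);
        if (future == null) {
            return; // unknown msgId, or the entry was already removed
        }
        // The future is only completed once a callback thread picks up this task.
        callbackPool.execute(() -> {
            if (error != null) {
                future.completeExceptionally(error);
            } else {
                future.complete(body);
            }
        });
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        AsyncFlowSketch client = new AsyncFlowSketch(pool);
        CompletableFuture<String> f = client.send(1);
        client.onResponse(1, "hello", null);
        System.out.println(f.get(1, TimeUnit.SECONDS)); // prints "hello"
        pool.shutdown();
    }
}
```

The point of the sketch is the last hop: a response that has already arrived still waits for a free callback thread before the caller's future completes.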
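The callback thread pool has a core size of 20, a queue length of 256, and a maximum size of 200. Logs showed that during timeout events all 20 core threads were busy and 71 tasks were waiting in the queue. Responses that had already arrived could not complete their futures until a callback thread became free, so callers timed out even though the provider had answered within milliseconds. This confirmed that callback thread‑pool exhaustion caused the observed timeouts.
Solution: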
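The sketch below shows why a maximum size of 200 did not help here, assuming the callback pool follows standard java.util.concurrent.ThreadPoolExecutor semantics (the article does not show how JSF constructs the pool, so this is an assumption). A standard executor with a bounded queue only creates threads beyond the core size once the queue is full. The class CallbackPoolDemo and its saturate method are hypothetical names used only for this illustration.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

class CallbackPoolDemo {
    /** Submits `tasks` blocking tasks and returns {poolSize, queuedTasks}. */
    static int[] saturate(int core, int max, int queueCapacity, int tasks) {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                core, max, 60, TimeUnit.SECONDS, new LinkedBlockingQueue<>(queueCapacity));
        CountDownLatch release = new CountDownLatch(1);
        try {
            for (int i = 0; i < tasks; i++) {
                // Each task blocks, simulating a slow callback that holds its thread.
                pool.execute(() -> {
                    try {
                        release.await();
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                });
            }
            return new int[] { pool.getPoolSize(), pool.getQueue().size() };
        } finally {
            release.countDown(); // let the blocked tasks finish
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        // Same sizes as in the article: core 20, max 200, queue 256.
        int[] s = saturate(20, 200, 256, 91);
        System.out.println("threads=" + s[0] + " queued=" + s[1]); // threads=20 queued=71
    }
}
```

Under that assumption, 91 slow callbacks leave 20 threads busy and 71 tasks queued, and a 21st thread would only appear after all 256 queue slots were taken.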
Short‑term: increase the callback thread pool core size from 20 to 200.
Long‑term: refactor code to avoid heavy operations inside thenApply and eliminate nested async calls within callbacks.
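One way to approach the long‑term fix is to hand heavy stages to an application‑owned executor, so the thread that completes the future is released immediately. The sketch below uses the standard CompletableFuture.thenApplyAsync(fn, executor) overload; OffloadSketch, BUSINESS_POOL, heavyParse, and the thread names are hypothetical, and the RPC future is simulated rather than obtained from JSF.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

class OffloadSketch {
    // Hypothetical application-owned pool for heavy post-processing.
    static final ExecutorService BUSINESS_POOL = Executors.newFixedThreadPool(4, r -> {
        Thread t = new Thread(r, "business-pool");
        t.setDaemon(true); // do not keep the JVM alive for this demo
        return t;
    });

    /** Placeholder for expensive work; tags the result with the executing thread. */
    static String heavyParse(String raw) {
        return raw.toUpperCase() + "@" + Thread.currentThread().getName();
    }

    /** Runs the heavy stage on BUSINESS_POOL instead of the completing thread. */
    static CompletableFuture<String> handle(CompletableFuture<String> rpcFuture) {
        return rpcFuture.thenApplyAsync(OffloadSketch::heavyParse, BUSINESS_POOL);
    }

    public static void main(String[] args) throws Exception {
        CompletableFuture<String> rpcFuture = new CompletableFuture<>();
        CompletableFuture<String> result = handle(rpcFuture);
        // Simulate a callback thread completing the RPC future.
        Thread callback = new Thread(() -> rpcFuture.complete("ok"), "callback-sim");
        callback.start();
        System.out.println(result.get(1, TimeUnit.SECONDS)); // OK@business-pool
    }
}
```

For dependent async calls, chaining with thenCompose rather than blocking on get() inside a callback is a common way to avoid holding a callback thread while a second call is in flight. Whether that matches the refactoring actually applied here is not stated in the article.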
After applying the changes, monitoring indicated that interface availability remained at 100%.
JD Retail Technology
Official platform of JD Retail Technology, delivering insightful R&D news and a deep look into the lives and work of technologists.