Doubling QPS and Fixing Hystrix Bottlenecks in a Java Service

This article describes how a Java backend service suffering from high CPU load and unstable Hystrix circuit breaking was diagnosed and optimized—using jtop for JVM profiling, refactoring JSON/Bean handling, switching Hystrix isolation modes, improving logging, and tweaking circuit‑breaker settings—to double its QPS capacity and achieve rapid recovery after traffic spikes.

Programmer DD

Background

Our service hit performance bottlenecks after a rapid rollout of features. Even under low QPS, server load reached 10‑20, CPU usage stayed above 60%, and during traffic peaks the interfaces frequently errored. Although we used Hystrix for circuit breaking, the service struggled to recover after a break, making deployments risky.

When demand finally eased, leadership gave us two weeks to eliminate the performance issues. During that period we identified several bottlenecks, revised the circuit‑breaker strategy, and ultimately doubled the service's QPS handling while ensuring stable circuit breaking under 3‑4× load and fast recovery when pressure subsided.

High CPU and Load on the Server

The service fetches a batch of data from storage or remote calls, then performs many transformations before returning. The long transformation pipeline caused CPU usage to stay above 50% even under normal load.

To inspect JVM thread resource usage we tried JMC, but found it cumbersome. Instead we used jtop, a lightweight JAR that prints JVM statistics for a given PID:

java -jar jtop.jar [options] <pid>

By default jtop prints the top 5 CPU‑consuming thread stacks, which corresponds to its -stack n option. Sample output:

Heap Memory: INIT=134217728  USED=230791968  COMMITED=450363392  MAX=1908932608
NonHeap Memory: INIT=2555904  USED=24834632  COMMITED=26411008  MAX=-1
...
Total threads: 608  CPU=2454 (106.88%)  USER=2142 (93.30%)
main  TID=1  STATE=RUNNABLE  CPU_TIME=2039 (88.79%)  USER_TIME=1970 (85.79%)
    com.google.common.util.concurrent.RateLimiter.tryAcquire(RateLimiter.java:337)
    io.zhenbianshu.TestFuturePool.main(TestFuturePool.java:23)
...
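The article does not say how jtop collects these numbers. As a rough sketch only, similar per‑thread CPU figures can be read from inside a JVM through the JDK's standard ThreadMXBean. The class name TopCpuThreads and its Row type are hypothetical and exist only for this illustration.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: lists the threads of the current JVM that have used the most CPU time. */
class TopCpuThreads {

    /** One report row: thread name, state and accumulated CPU time in nanoseconds. */
    static final class Row {
        final String name;
        final Thread.State state;
        final long cpuNanos;

        Row(String name, Thread.State state, long cpuNanos) {
            this.name = name;
            this.state = state;
            this.cpuNanos = cpuNanos;
        }
    }

    /** Returns at most n rows, sorted by CPU time in descending order. */
    static List<Row> top(int n) {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        List<Row> rows = new ArrayList<>();
        if (!mx.isThreadCpuTimeSupported()) {
            return rows; // this JVM cannot measure per-thread CPU time
        }
        for (long id : mx.getAllThreadIds()) {
            long cpu = mx.getThreadCpuTime(id);     // -1 if the thread died or measurement is disabled
            ThreadInfo info = mx.getThreadInfo(id); // null if the thread is no longer alive
            if (cpu < 0 || info == null) {
                continue;
            }
            rows.add(new Row(info.getThreadName(), info.getThreadState(), cpu));
        }
        rows.sort((a, b) -> Long.compare(b.cpuNanos, a.cpuNanos));
        return new ArrayList<>(rows.subList(0, Math.min(n, rows.size())));
    }

    public static void main(String[] args) {
        for (Row r : top(5)) {
            System.out.printf("%-30s STATE=%-13s CPU_TIME=%d ms%n",
                    r.name, r.state, r.cpuNanos / 1_000_000);
        }
    }
}
```

Unlike jtop, this only sees the JVM it runs in and reports cumulative CPU time rather than a percentage over a sampling interval.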

Analyzing the thread stacks revealed many CPU‑intensive JSON serialization/deserialization and Bean copying operations. We optimized the code by increasing Bean reuse and replacing JSON with Protocol Buffers, which dramatically reduced CPU pressure.

Circuit‑Breaker Framework Optimization

We use Hystrix for circuit breaking. It is no longer actively maintained, but we kept it because it fit our stack. Both the controller and the inner RPC calls were annotated with Hystrix commands using thread‑pool isolation: a 1000 ms timeout and up to 2000 threads for the outer layer, and a 200 ms timeout and up to 500 threads for the inner RPC calls.

Abnormal Response Times

Access logs showed requests taking 1200 ms or more, sometimes over 2000 ms. That should not happen: with thread‑pool isolation, Hystrix runs the business logic on a separate thread, and the calling thread returns as soon as the 1000 ms timeout fires. The extra latency therefore had to come from outside the business logic, that is, from Hystrix itself, Spring, or the system.

We captured thread stacks with jstack and generated flame graphs. The flame graph revealed many threads parked in LockSupport.park underneath HystrixTimer.addTimerListener. Hystrix registers one of these timer listeners for each command execution.

Our RPC results were cached locally, but the Hystrix annotation sat on the method that reads the cache, so the same Hystrix command ran 3‑5 times per request even on cache hits. At 1000 QPS this meant 3000‑5000 Hystrix invocations per second, creating a flood of TimerListeners.

@HystrixCommand(
    fallbackMethod = "fallBackGetXXXConfig",
    commandProperties = {
        @HystrixProperty(name = "execution.isolation.thread.timeoutInMilliseconds", value = "200"),
        @HystrixProperty(name = "circuitBreaker.errorThresholdPercentage", value = "50")},
    threadPoolProperties = {
        @HystrixProperty(name = "coreSize", value = "200"),
        @HystrixProperty(name = "maximumSize", value = "500"),
        @HystrixProperty(name = "allowMaximumSizeToDivergeFromCoreSize", value = "true")})
public XXXConfig getXXXConfig(Long uid) {
    try {
        return XXXConfigCache.get(uid);
    } catch (Exception e) {
        return EMPTY_XXX_CONFIG;
    }
}

We moved the Hystrix annotation to the cache loader method and switched isolation mode to semaphore, eliminating the thread‑pool overhead and reducing response times. However, semaphore isolation cannot abort already‑running methods, so excessive concurrency can still cause timeouts.
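The trade‑off of semaphore isolation can be seen in a small framework‑free sketch. This is not Hystrix's implementation, and SemaphoreGuard is a hypothetical name. It only illustrates the idea: a bounded number of permits, the work running on the caller's own thread, and a fallback when no permit is free. In Hystrix itself the mode is selected with the execution.isolation.strategy property set to SEMAPHORE.

```java
import java.util.concurrent.Semaphore;
import java.util.function.Supplier;

/** Hypothetical stand-in for semaphore isolation; not Hystrix's actual code. */
class SemaphoreGuard<T> {
    private final Semaphore permits;
    private final Supplier<T> fallback;

    SemaphoreGuard(int maxConcurrent, Supplier<T> fallback) {
        this.permits = new Semaphore(maxConcurrent);
        this.fallback = fallback;
    }

    /** Runs the action on the calling thread if a permit is free, otherwise falls back at once. */
    T call(Supplier<T> action) {
        if (!permits.tryAcquire()) {
            return fallback.get(); // too many concurrent calls: reject without queueing
        }
        try {
            // No extra thread and no timer: the guard cannot abort the action once it
            // has started, so a slow action keeps its permit (and its caller) busy.
            return action.get();
        } catch (RuntimeException e) {
            return fallback.get(); // failed calls degrade to the fallback
        } finally {
            permits.release();
        }
    }
}
```

For example, a guard created with new SemaphoreGuard<>(100, () -> EMPTY) would let at most 100 callers run the wrapped action at once and hand every additional caller the fallback value, where EMPTY is a placeholder for whatever default the caller chooses.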

Service Isolation and Degradation

Hystrix’s default visual monitoring was insufficient. By adding hystrix-metrics-event-stream to the service and running hystrix-dashboard on the client, we obtained a clear view of circuit‑breaker status (see screenshot).

With the optimizations, the maximum response time became controllable. We sized the semaphore from the expected concurrency, QPS × target latency (ms) ÷ 1000, which for 2000 QPS and a 50 ms target gives 2000 × 50 / 1000 = 100 concurrent requests. If the error rate turned out too high, we added some headroom on top of that.
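The sizing rule is a one‑line calculation, shown here as a minimal sketch. SemaphoreSizing and the 20% headroom used in main are assumptions for illustration, since the article does not say how much redundancy was added. In Hystrix the resulting number would go into execution.isolation.semaphore.maxConcurrentRequests.

```java
/** Hypothetical helper for the sizing rule: concurrency = QPS x latency in seconds. */
class SemaphoreSizing {

    /** Expected concurrent requests for a given rate and target latency, rounded up. */
    static int permits(int qps, int targetLatencyMillis) {
        long product = (long) qps * targetLatencyMillis;
        return (int) ((product + 999) / 1000); // integer ceiling of product / 1000
    }

    /** Adds a percentage of headroom on top of the base permit count, rounded up. */
    static int withHeadroom(int permits, int headroomPercent) {
        long scaled = (long) permits * (100 + headroomPercent);
        return (int) ((scaled + 99) / 100);
    }

    public static void main(String[] args) {
        int base = permits(2000, 50);                // the article's numbers
        System.out.println(base);                    // 100
        System.out.println(withHeadroom(base, 20));  // 120, with an assumed 20% margin
    }
}
```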

High Load During Circuit Breaking Prevents Recovery

Under extreme load, internal monitoring tools became unreliable because they added overhead. Excessive error logging also increased I/O pressure. We reduced log volume by suppressing stack traces in error logs and customizing Spring’s ExceptionHandler, allowing the service to recover quickly once traffic subsided.
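One way to keep error logs small is to log a single‑line summary instead of the full stack trace. The helper below is a sketch of that idea only. ErrorSummary is a hypothetical name, and the article does not show the team's actual logging code or ExceptionHandler.

```java
/** Hypothetical helper that condenses an exception into one log line without a stack trace. */
class ErrorSummary {

    /** Returns "type: message", plus the root cause when it differs from the exception itself. */
    static String oneLine(Throwable t) {
        Throwable root = t;
        // Walk the cause chain; the depth limit guards against unusually long or cyclic chains.
        for (int depth = 0; depth < 16 && root.getCause() != null && root.getCause() != root; depth++) {
            root = root.getCause();
        }
        String summary = t.getClass().getName() + ": " + t.getMessage();
        if (root != t) {
            summary += " (root cause: " + root.getClass().getName() + ": " + root.getMessage() + ")";
        }
        return summary;
    }
}
```

A logger call such as log.error(ErrorSummary.oneLine(e)) then writes one line per failure, which matters most while the circuit is open and nearly every request is failing. Here log stands for whichever logging API the service already uses.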

Spring Data Binding Exception

While inspecting jstack we noticed many threads stuck in Spring’s data‑binding code without any visible logs. Spring silently catches NotWritablePropertyException during parameter binding, which caused a performance hit because our controller received dozens of parameters.

at java.lang.Throwable.fillInStackTrace(Native Method)
...
at org.springframework.beans.AbstractNestablePropertyAccessor.processLocalProperty(AbstractNestablePropertyAccessor.java:426)
at org.springframework.beans.AbstractNestablePropertyAccessor.setPropertyValue(AbstractNestablePropertyAccessor.java:278)
...
at org.springframework.web.servlet.DispatcherServlet.doDispatch(DispatcherServlet.java:991)

The controller method looked like:

@RequestMapping("test.json")
public Map testApi(@RequestParam(name = "id") String id, ApiContext apiContext) { }

Without a custom HandlerMethodArgumentResolver, Spring creates an empty ApiContext and attempts to set each incoming parameter via reflection, catching any exceptions and continuing. This “try‑set” loop caused significant overhead. Implementing a dedicated argument resolver reduced the interface latency by roughly ten percent.
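The core of such a resolver is to read only the parameters the context object actually has, rather than trying every incoming parameter. The sketch below shows that logic without any Spring dependency. The fields of ApiContext (uid, platform, version) are invented for illustration because the article does not list them. In Spring MVC the same code would sit in HandlerMethodArgumentResolver.resolveArgument and be registered through WebMvcConfigurer.addArgumentResolvers.

```java
import java.util.Map;

/** Hypothetical request context; the real ApiContext's fields are not shown in the article. */
class ApiContext {
    String uid;
    String platform;
    String version;
}

/** Framework-free sketch of the explicit binding a dedicated argument resolver would perform. */
class ApiContextResolver {

    /** Copies only the known fields; unrelated request parameters are never touched. */
    static ApiContext resolve(Map<String, String> requestParams) {
        ApiContext ctx = new ApiContext();
        ctx.uid = requestParams.get("uid");
        ctx.platform = requestParams.get("platform");
        ctx.version = requestParams.get("version");
        return ctx;
    }
}
```

Because no property lookup is attempted for unknown parameter names, no NotWritablePropertyException is created and swallowed for them.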

Conclusion

Performance optimization is an ongoing effort; piling up technical debt only makes future fixes harder. Regular code reviews, awareness of hidden costs in third‑party tools, and continuous performance testing are essential to keep a Java backend stable and scalable.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.

Written by

Programmer DD

A tinkering programmer and author of "Spring Cloud Microservices in Action"
