How We Doubled QPS and Fixed High CPU Load in a Java Backend Service
This article details a two‑week effort to diagnose and resolve high CPU usage, server load, and circuit‑breaker issues in a Java backend, covering JVM thread analysis with jtop, Hystrix optimization, Spring data‑binding fixes, and the resulting performance gains.
Background
Our service recently hit a performance bottleneck. Early on, urgent feature requests took priority over optimization, and the accumulated technical debt became painful to pay down.
Even under low QPS the server load reached 10-20 and CPU usage exceeded 60%, and during traffic spikes the interface returned many errors. Although we used Hystrix for circuit breaking, the service did not recover quickly after tripping, which made deployments risky.
Once feature demand slowed, our team lead set a two-week goal to resolve the performance issues. We identified and fixed several bottlenecks and revised the circuit-breaker strategy. In the end the service's QPS capacity doubled, and it now breaks the circuit reliably under 3-4× higher load and recovers quickly.
Server High CPU and Load
The first problem was the overall service causing high server load and CPU usage.
Our service fetches a batch of data from storage or remote calls, then performs many transformations before returning. The long transformation pipeline keeps CPU usage above 50% even under normal load.
To inspect per-thread resource usage inside the JVM we used jtop, a small jar that prints JVM statistics; for example, its -stack n option shows the stacks of the top CPU-consuming threads. Sample output:
Heap Memory: INIT=134217728 USED=230791968 COMMITED=450363392 MAX=1908932608
NonHeap Memory: INIT=2555904 USED=24834632 COMMITED=26411008 MAX=-1
GC PS Scavenge VALID [PS Eden Space, PS Survivor Space] GC=161 GCT=440
GC PS MarkSweep VALID [PS Eden Space, PS Survivor Space, PS Old Gen] GC=2 GCT=532
ClassLoading LOADED=3118 TOTAL_LOADED=3118 UNLOADED=0
Total threads: 608 CPU=2454 (106.88%) USER=2142 (93.30%)
NEW=0 RUNNABLE=6 BLOCKED=0 WAITING=2 TIMED_WAITING=600 TERMINATED=0
main TID=1 STATE=RUNNABLE CPU_TIME=2039 (88.79%) USER_TIME=1970 (85.79%) Allocted: 640318696
com.google.common.util.concurrent.RateLimiter.tryAcquire(RateLimiter.java:337)
io.zhenbianshu.TestFuturePool.main(TestFuturePool.java:23)
RMI TCP Connection(2)-127.0.0.1 TID=2555 STATE=RUNNABLE CPU_TIME=89 (3.89%) USER_TIME=85 (3.70%) Allocted: 7943616
sun.management.ThreadImpl.dumpThreads0(Native Method)
sun.management.ThreadImpl.dumpAllThreads(ThreadImpl.java:454)
me.hatter.tools.jtop.rmi.RmiServer.listThreadInfos(RmiServer.java:59)
me.hatter.tools.jtop.management.JTopImpl.listThreadInfos(JTopImpl.java:48)
sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
... ...
By examining the thread stacks we located code points for optimization. We found many JSON serialization/deserialization and bean copying operations consuming CPU. Refactoring to reuse beans and replace JSON with protobuf dramatically reduced CPU pressure.
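When jtop is not at hand, per-thread CPU figures like those in the sample output can be approximated in-process with the standard java.lang.management API. The sketch below is an illustration only and not how jtop itself is implemented; the class name TopThreads and its Row type are made up for this example.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class TopThreads {
    /** One report row: thread name, state, and accumulated CPU time in milliseconds. */
    static final class Row {
        final String name;
        final Thread.State state;
        final long cpuMillis;

        Row(String name, Thread.State state, long cpuMillis) {
            this.name = name;
            this.state = state;
            this.cpuMillis = cpuMillis;
        }
    }

    /** Returns up to n live threads ordered by accumulated CPU time, highest first. */
    static List<Row> top(int n) {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        List<Row> rows = new ArrayList<>();
        if (!mx.isThreadCpuTimeSupported()) {
            return rows; // this JVM cannot report per-thread CPU time
        }
        for (long id : mx.getAllThreadIds()) {
            long cpuNanos = mx.getThreadCpuTime(id); // -1 if the thread died or measurement is disabled
            ThreadInfo info = mx.getThreadInfo(id);  // null if the thread has already exited
            if (cpuNanos < 0 || info == null) {
                continue;
            }
            rows.add(new Row(info.getThreadName(), info.getThreadState(), cpuNanos / 1_000_000));
        }
        rows.sort(Comparator.comparingLong((Row r) -> r.cpuMillis).reversed());
        return new ArrayList<>(rows.subList(0, Math.min(n, rows.size())));
    }

    public static void main(String[] args) {
        for (Row r : top(5)) {
            System.out.printf("%-40s %-14s CPU_TIME=%d ms%n", r.name, r.state, r.cpuMillis);
        }
    }
}
```

Unlike jtop, this only sees the JVM it runs in and reports CPU time accumulated since each thread started, not usage over a sampling interval.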
Circuit‑Breaker Framework Optimization
We used Hystrix from the start and, although it is no longer maintained, kept it because it fit our stack. Both the controller interface and the inner RPC calls carried Hystrix annotations with thread-pool isolation: the interface used a 1000 ms timeout and up to 2000 threads, and the RPC calls used a 200 ms timeout and up to 500 threads.
Abnormal Response Times
Some requests took 1200‑2000 ms, exceeding the timeout. The issue could be in Hystrix, Spring, or the system layer. We generated flame graphs from jstack output and saw many threads blocked in LockSupport.park caused by HystrixTimer.addTimerListener.
Because the same RPC result was fetched 3‑5 times per request, we added a LocalCache and inadvertently placed the Hystrix annotation on the cache’s get method, causing 3000‑5000 Hystrix calls per request and a flood of timer listeners.
@HystrixCommand(
    fallbackMethod = "fallBackGetXXXConfig",
    commandProperties = {
        @HystrixProperty(name = "execution.isolation.thread.timeoutInMilliseconds", value = "200"),
        @HystrixProperty(name = "circuitBreaker.errorThresholdPercentage", value = "50")
    },
    threadPoolProperties = {
        @HystrixProperty(name = "coreSize", value = "200"),
        @HystrixProperty(name = "maximumSize", value = "500"),
        @HystrixProperty(name = "allowMaximumSizeToDivergeFromCoreSize", value = "true")
    })
public XXXConfig getXXXConfig(Long uid) {
    try {
        return XXXConfigCache.get(uid);
    } catch (Exception e) {
        return EMPTY_XXX_CONFIG;
    }
}
Moving the Hystrix annotation to the cache’s load method and switching isolation to semaphore mode eliminated the timer‑listener bottleneck and reduced response times. However, semaphore isolation cannot interrupt already‑running methods, so excessive timeouts may still occupy permits.
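Why the placement of the annotation matters can be shown without Hystrix at all. In the sketch below, guarded() is a hypothetical stand-in for a Hystrix command execution that only counts how often it is entered; the class, the cache map, and remoteLoad are likewise invented for the illustration.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

public class GuardPlacement {
    /** Counts entries into the guard; stands in for the per-call cost of a Hystrix command. */
    static final AtomicInteger guardCalls = new AtomicInteger();

    static final Map<Long, String> cache = new ConcurrentHashMap<>();

    /** Hypothetical wrapper: a real circuit breaker would set up timers and metrics here. */
    static <T> T guarded(Supplier<T> body) {
        guardCalls.incrementAndGet();
        return body.get();
    }

    /** Pretend remote call that the cache is meant to avoid repeating. */
    static String remoteLoad(Long uid) {
        return "config-" + uid;
    }

    /** Guard around the cache read: the cost is paid on every lookup, hit or miss. */
    static String getGuardOnRead(Long uid) {
        return guarded(() -> cache.computeIfAbsent(uid, GuardPlacement::remoteLoad));
    }

    /** Guard around the loader: the cost is paid only when the cache misses. */
    static String getGuardOnLoad(Long uid) {
        return cache.computeIfAbsent(uid, id -> guarded(() -> remoteLoad(id)));
    }

    /** Performs the given number of lookups for one key and returns how often the guard ran. */
    static int run(boolean guardOnRead, int lookups) {
        cache.clear();
        guardCalls.set(0);
        for (int i = 0; i < lookups; i++) {
            if (guardOnRead) {
                getGuardOnRead(42L);
            } else {
                getGuardOnLoad(42L);
            }
        }
        return guardCalls.get();
    }

    public static void main(String[] args) {
        System.out.println("guard on read: " + run(true, 5));  // prints "guard on read: 5"
        System.out.println("guard on load: " + run(false, 5)); // prints "guard on load: 1"
    }
}
```

With the guard on the read path, every cache hit still pays the command overhead; with the guard on the loader, only the first lookup for a key does.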
Service Isolation and Degradation
We also improved Hystrix monitoring by adding hystrix-metrics-event-stream and the Hystrix dashboard, which gave a clear view of circuit‑breaker status.
With these optimizations in place we could size the semaphore. The number of permits needed is roughly the target QPS multiplied by the average latency in seconds, so a capacity of 2000 QPS at a 50 ms average latency gives 2000 × 50 / 1000 = 100 permits.
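The sizing rule and the behaviour of semaphore isolation can be sketched with java.util.concurrent.Semaphore. This is a simplified illustration under stated assumptions, not Hystrix's implementation: SemaphoreGuard, permitsFor, and call are hypothetical names, and there is no circuit state or metrics.

```java
import java.util.concurrent.Semaphore;
import java.util.function.Supplier;

public class SemaphoreGuard {
    private final Semaphore permits;

    SemaphoreGuard(int maxConcurrent) {
        this.permits = new Semaphore(maxConcurrent);
    }

    /** Estimated concurrent calls: QPS times average latency in seconds, rounded up. */
    static int permitsFor(int targetQps, int avgLatencyMillis) {
        return (int) (((long) targetQps * avgLatencyMillis + 999) / 1000);
    }

    /** Runs body on the caller's thread if a permit is free; otherwise returns the fallback at once. */
    <T> T call(Supplier<T> body, Supplier<T> fallback) {
        if (!permits.tryAcquire()) {
            return fallback.get(); // rejected: the limit of in-flight calls is reached
        }
        try {
            return body.get();
        } finally {
            permits.release(); // the permit is held for as long as body runs
        }
    }

    public static void main(String[] args) {
        System.out.println("permits = " + permitsFor(2000, 50)); // prints "permits = 100"

        SemaphoreGuard guard = new SemaphoreGuard(1);
        // The inner call runs while the outer call still holds the only permit.
        Supplier<String> inner = () -> guard.call(() -> "inner ran", () -> "inner rejected");
        String result = guard.call(inner, () -> "outer rejected");
        System.out.println(result); // prints "inner rejected"
    }
}
```

Because body runs on the caller's thread, nothing can cut a slow call short; it simply keeps its permit until it returns, which is the limitation of semaphore isolation noted earlier.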
Spring Data Binding Exception
During jstack analysis we observed threads stuck in Spring’s exception handling without any logs or visible errors. Spring silently catches exceptions during data binding.
at java.lang.Throwable.fillInStackTrace(Native Method)
...
org.springframework.beans.AbstractNestablePropertyAccessor.setPropertyValue(...)
The controller method received many parameters (30‑40) and relied on Spring’s default binding, which attempts to set each property on an empty ApiContext instance, catching failures and continuing. This "try‑bind" loop caused significant performance loss.
@RequestMapping("test.json")
public Map testApi(@RequestParam(name = "id") String id, ApiContext apiContext) { ... }
Implementing a custom HandlerMethodArgumentResolver for ApiContext eliminated the costly binding attempts and improved interface performance by roughly ten percent.
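The core of such a resolver is plain code that reads only the parameters ApiContext actually needs. The sketch below shows that logic on its own, with no Spring dependency. The ApiContext fields clientVersion and locale are hypothetical, since the source does not show the real class, and the parameter map has the shape returned by ServletRequest.getParameterMap().

```java
import java.util.HashMap;
import java.util.Map;

public class ApiContextBinding {
    /** Hypothetical stand-in for the article's ApiContext; the real fields are not shown. */
    static final class ApiContext {
        String clientVersion;
        String locale;
    }

    /**
     * Builds an ApiContext by looking up only the parameters it has fields for.
     * Unrelated request parameters are never touched, so no exception is raised
     * and swallowed for them.
     */
    static ApiContext resolve(Map<String, String[]> requestParams) {
        ApiContext ctx = new ApiContext();
        ctx.clientVersion = first(requestParams, "clientVersion");
        ctx.locale = first(requestParams, "locale");
        return ctx;
    }

    /** Returns the first value of a parameter, or null when it is absent. */
    private static String first(Map<String, String[]> params, String name) {
        String[] values = params.get(name);
        return (values == null || values.length == 0) ? null : values[0];
    }

    public static void main(String[] args) {
        Map<String, String[]> params = new HashMap<>();
        params.put("id", new String[] {"7"});
        params.put("clientVersion", new String[] {"1.2.0"});
        params.put("unrelated", new String[] {"x"});

        ApiContext ctx = resolve(params);
        System.out.println(ctx.clientVersion + " " + ctx.locale); // prints "1.2.0 null"
    }
}
```

In a Spring MVC application this logic would sit in the resolver's resolveArgument method, with supportsParameter returning true only for parameters of type ApiContext, and the resolver would be registered through the MVC configuration.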
Conclusion
Performance optimization is an ongoing effort; postponing technical debt leads to painful fixes. Regular code reviews, awareness of hidden costs of third‑party tools, and continuous performance testing help keep services stable and efficient.
