How We Doubled QPS and Fixed High CPU Load in a Java Backend Service

This article details a two‑week effort to diagnose and resolve high CPU usage, server load, and circuit‑breaker issues in a Java backend, covering JVM thread analysis with jtop, Hystrix optimization, Spring data‑binding fixes, and the resulting performance gains.

Java High-Performance Architecture

Background

Our service recently hit a performance bottleneck. Early requirements were urgent and shipped without attention to optimization, and paying down that technical debt later proved painful.

Even at low QPS, server load reached 10-20 and CPU usage exceeded 60%. During traffic spikes the interface returned many errors, and although Hystrix provided circuit breaking, the service was slow to recover afterwards, which made every deployment risky.

Once feature demand slowed, our team lead set a two-week goal to resolve the performance issues. We identified and fixed several bottlenecks and revised the circuit-breaker strategy. As a result, the service's QPS capacity doubled, and under 3-4x higher load the circuit breaker now trips reliably and the service recovers quickly.

Server High CPU and Load

The first problem was that the service as a whole drove server load and CPU usage too high.

Our service fetches a batch of data from storage or remote calls, then performs many transformations before returning. The long transformation pipeline keeps CPU usage above 50% even under normal load.

To see which JVM threads were consuming resources we used jtop, a small jar that prints JVM statistics; for example, -stack n shows the stacks of the top CPU-consuming threads. Sample output:

Heap Memory: INIT=134217728  USED=230791968  COMMITED=450363392  MAX=1908932608
NonHeap Memory: INIT=2555904  USED=24834632  COMMITED=26411008  MAX=-1
GC PS Scavenge  VALID  [PS Eden Space, PS Survivor Space]  GC=161  GCT=440
GC PS MarkSweep  VALID  [PS Eden Space, PS Survivor Space, PS Old Gen]  GC=2  GCT=532
ClassLoading LOADED=3118  TOTAL_LOADED=3118  UNLOADED=0
Total threads: 608  CPU=2454 (106.88%)  USER=2142 (93.30%)
NEW=0  RUNNABLE=6  BLOCKED=0  WAITING=2  TIMED_WAITING=600  TERMINATED=0
main  TID=1  STATE=RUNNABLE  CPU_TIME=2039 (88.79%)  USER_TIME=1970 (85.79%) Allocted: 640318696
    com.google.common.util.concurrent.RateLimiter.tryAcquire(RateLimiter.java:337)
    io.zhenbianshu.TestFuturePool.main(TestFuturePool.java:23)
RMI TCP Connection(2)-127.0.0.1  TID=2555  STATE=RUNNABLE  CPU_TIME=89 (3.89%)  USER_TIME=85 (3.70%) Allocted: 7943616
    sun.management.ThreadImpl.dumpThreads0(Native Method)
    sun.management.ThreadImpl.dumpAllThreads(ThreadImpl.java:454)
    me.hatter.tools.jtop.rmi.RmiServer.listThreadInfos(RmiServer.java:59)
    me.hatter.tools.jtop.management.JTopImpl.listThreadInfos(JTopImpl.java:48)
    sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    ... ...

Reading the thread stacks pointed us to the code worth optimizing. Much of the CPU time went to JSON serialization and deserialization and to bean copying. Reusing beans instead of copying them and replacing JSON with protobuf substantially reduced CPU pressure.
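The per-thread CPU figures that jtop reports can also be read in-process through the standard java.lang.management API. The following is a minimal sketch of that idea, not jtop's implementation; the class name TopCpuThreads, the Sample holder, and the 8-frame stack limit are assumptions made for illustration.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.List;

final class TopCpuThreads {

    /** One sampled thread: its name, accumulated CPU time, and current stack. */
    static final class Sample {
        final String name;
        final long cpuNanos;
        final StackTraceElement[] stack;

        Sample(String name, long cpuNanos, StackTraceElement[] stack) {
            this.name = name;
            this.cpuNanos = cpuNanos;
            this.stack = stack;
        }
    }

    /** Returns up to n live threads, ordered by accumulated CPU time (highest first). */
    static List<Sample> top(int n) {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        List<Sample> samples = new ArrayList<>();
        if (!mx.isThreadCpuTimeSupported() || !mx.isThreadCpuTimeEnabled()) {
            return samples; // this JVM cannot report per-thread CPU time
        }
        for (long id : mx.getAllThreadIds()) {
            long cpu = mx.getThreadCpuTime(id);        // -1 if the thread is no longer alive
            ThreadInfo info = mx.getThreadInfo(id, 8); // null if the thread is no longer alive
            if (cpu < 0 || info == null) {
                continue;
            }
            samples.add(new Sample(info.getThreadName(), cpu, info.getStackTrace()));
        }
        samples.sort((a, b) -> Long.compare(b.cpuNanos, a.cpuNanos));
        return new ArrayList<>(samples.subList(0, Math.min(n, samples.size())));
    }

    public static void main(String[] args) {
        for (Sample s : top(5)) {
            System.out.printf("%s  CPU_TIME=%d ms%n", s.name, s.cpuNanos / 1_000_000);
            for (StackTraceElement frame : s.stack) {
                System.out.println("    " + frame);
            }
        }
    }
}
```

The figures are cumulative since each thread started, so a profiler in this style would sample twice and compare the two readings to find what is busy right now.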

Circuit‑Breaker Framework Optimization

We use Hystrix for circuit breaking. It is no longer under active development, but we kept it because it fit our stack. Both the controller methods and the inner RPC calls were annotated with Hystrix commands using thread-pool isolation: the controller level had a 1000 ms timeout and up to 2000 threads, and the RPC calls had a 200 ms timeout and up to 500 threads.

Abnormal Response Times

Some requests took 1200-2000 ms, well beyond the 1000 ms timeout that should have cut them off. The cause could have been in Hystrix, in Spring, or in the system layer. We generated flame graphs from jstack output and saw many threads parked in LockSupport.park, reached through HystrixTimer.addTimerListener.
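Before building a full flame graph, a quick count of how many threads in a dump have a given method on their stack can confirm a suspicion like this one. The sketch below is a hypothetical helper (JstackFrameCounter is not part of any tool mentioned here), and the dump in main is a hand-written sample in jstack's layout, not output captured from our service.

```java
final class JstackFrameCounter {

    /** Counts the threads in a jstack dump whose stack contains the given method fragment. */
    static int threadsWithFrame(String dump, String methodFragment) {
        int matches = 0;
        boolean inThread = false; // currently inside a thread's block
        boolean counted = false;  // the current thread has already matched
        for (String raw : dump.split("\\R")) {
            String line = raw.trim();
            if (line.startsWith("\"")) {           // header line: "thread-name" #id ...
                inThread = true;
                counted = false;
            } else if (line.isEmpty()) {           // a blank line ends the thread's block
                inThread = false;
            } else if (inThread && !counted
                    && line.startsWith("at ") && line.contains(methodFragment)) {
                matches++;
                counted = true;                    // count each thread at most once
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        // Hand-written sample dump; frames and line numbers are illustrative only.
        String dump = String.join("\n",
            "\"worker-1\" #21 prio=5 waiting on condition",
            "   java.lang.Thread.State: WAITING (parking)",
            "\tat sun.misc.Unsafe.park(Native Method)",
            "\tat java.util.concurrent.locks.LockSupport.park(LockSupport.java:1)",
            "\tat example.Handler.handle(Handler.java:1)",
            "",
            "\"worker-2\" #22 prio=5 runnable",
            "   java.lang.Thread.State: RUNNABLE",
            "\tat example.Handler.handle(Handler.java:1)",
            "");
        System.out.println(threadsWithFrame(dump, "LockSupport.park"));
    }
}
```

Running the same count for each suspect method across several dumps taken a few seconds apart gives a rough picture of where threads spend their time.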

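Within a single request the same RPC result was fetched 3-5 times, so we had added a LocalCache in front of the RPC. The Hystrix annotation, however, had inadvertently been placed on the method that reads from the cache, so every read, cache hit or not, ran as a Hystrix command. At 3-5 reads per request that is 3000-5000 Hystrix calls for every 1000 requests served, which produced a flood of timer listeners.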

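// Problematic placement: the command wraps the cache read itself,
// so even a cache hit runs as a full Hystrix command.
@HystrixCommand(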
    fallbackMethod = "fallBackGetXXXConfig",
    commandProperties = {
        @HystrixProperty(name = "execution.isolation.thread.timeoutInMilliseconds", value = "200"),
        @HystrixProperty(name = "circuitBreaker.errorThresholdPercentage", value = "50")
    },
    threadPoolProperties = {
        @HystrixProperty(name = "coreSize", value = "200"),
        @HystrixProperty(name = "maximumSize", value = "500"),
        @HystrixProperty(name = "allowMaximumSizeToDivergeFromCoreSize", value = "true")
    })
public XXXConfig getXXXConfig(Long uid) {
    try {
        return XXXConfigCache.get(uid);
    } catch (Exception e) {
        return EMPTY_XXX_CONFIG;
    }
}

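Moving the Hystrix annotation to the cache's load method, so that only cache misses go through Hystrix, and switching isolation from thread pool to semaphore eliminated the timer-listener bottleneck and reduced response times. The trade-off is that semaphore isolation runs the call on the caller's thread and cannot interrupt a method that is already running, so a call that exceeds its timeout keeps holding its permit until it returns.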
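The shape of that fix can be illustrated without Hystrix. The sketch below is a plain-Java stand-in, assuming a hypothetical GuardedCache class: cache hits return immediately, and only the load path is limited by a java.util.concurrent.Semaphore. It omits what a real LocalCache and Hystrix provide, such as expiry, timeouts, and circuit-breaker state.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Semaphore;
import java.util.function.Function;

final class GuardedCache<K, V> {
    private final Map<K, V> cache = new ConcurrentHashMap<>();
    private final Semaphore permits;     // bounds concurrent loads only
    private final Function<K, V> loader; // the remote call being protected
    private final V fallback;            // returned when a load is rejected or fails

    GuardedCache(int maxConcurrentLoads, Function<K, V> loader, V fallback) {
        this.permits = new Semaphore(maxConcurrentLoads);
        this.loader = loader;
        this.fallback = fallback;
    }

    V get(K key) {
        V hit = cache.get(key);
        if (hit != null) {
            return hit;                  // fast path: no isolation overhead on a hit
        }
        if (!permits.tryAcquire()) {
            return fallback;             // too many loads in flight: degrade immediately
        }
        try {
            V loaded = loader.apply(key);
            if (loaded == null) {
                return fallback;
            }
            cache.put(key, loaded);
            return loaded;
        } catch (RuntimeException e) {
            return fallback;             // a failed load degrades instead of propagating
        } finally {
            permits.release();           // held for the full call, even a slow one
        }
    }

    public static void main(String[] args) {
        GuardedCache<Long, String> configs =
            new GuardedCache<Long, String>(100, uid -> "config-" + uid, "EMPTY");
        System.out.println(configs.get(7L)); // first call loads through the semaphore
        System.out.println(configs.get(7L)); // second call is a plain cache hit
    }
}
```

The finally block also shows the limitation described above: nothing interrupts a slow loader, so its permit stays taken until the call returns on its own.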

Service Isolation and Degradation

We also improved Hystrix monitoring by adding hystrix-metrics-event-stream and the Hystrix dashboard, which gave a clear view of circuit‑breaker status.

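With these optimizations in place, we could size the semaphore from the target throughput and the average latency. The number of requests in flight is QPS multiplied by latency in seconds, so a capacity of 2000 QPS at a 50 ms average latency needs 2000*50/1000 = 100 permits.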
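This sizing rule is an application of Little's law (requests in flight = arrival rate x time in the system). As a small sketch, with SemaphoreSizing and its parameter names chosen only for illustration, the calculation rounds up so that a fractional result never under-provisions:

```java
final class SemaphoreSizing {

    /**
     * Little's law: requests in flight = arrival rate (per second) x latency (seconds).
     * Integer ceiling division keeps the arithmetic exact.
     */
    static long permits(long targetQps, long avgLatencyMillis) {
        return (targetQps * avgLatencyMillis + 999) / 1000;
    }

    public static void main(String[] args) {
        System.out.println(permits(2000, 50)); // prints 100
    }
}
```

The result is a floor for steady traffic rather than a hard answer; latency above the assumed average uses up permits faster, so some headroom is usually added on top.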

Spring Data Binding Exception

While analyzing the jstack output we also noticed threads busy in Spring's exception handling, even though no errors were logged or returned. Spring catches these exceptions silently during data binding. The relevant frames:

at java.lang.Throwable.fillInStackTrace(Native Method)
... 
org.springframework.beans.AbstractNestablePropertyAccessor.setPropertyValue(...)

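Requests to this controller carry many parameters (30-40), and the ApiContext argument relied on Spring's default binding. Spring creates an empty ApiContext and attempts to set each parameter on it as a property, catching the failure and continuing whenever one cannot be set. Every failure constructs an exception and fills in its stack trace, so this "try-bind" loop caused significant performance loss.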

@RequestMapping("test.json")
public Map testApi(@RequestParam(name = "id") String id, ApiContext apiContext) { ... }

Implementing a custom HandlerMethodArgumentResolver for ApiContext eliminated the costly binding attempts and improved interface performance by roughly ten percent.
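The core of such a resolver is an explicit mapping that reads only the parameters ApiContext actually needs and never throws for the rest. The sketch below shows that mapping in plain Java so it runs without Spring; a resolver's resolveArgument method could delegate to it, passing the request's parameter map. The ApiContext fields (uid, platform, version) and the ApiContextBinder class are hypothetical, since the real ApiContext is not shown here.

```java
import java.util.HashMap;
import java.util.Map;

final class ApiContextBinder {

    /** Reads only the known keys; absent or malformed values fall back to defaults. */
    static ApiContext bind(Map<String, String[]> params) {
        ApiContext ctx = new ApiContext();
        ctx.uid = parseLong(first(params, "uid"), 0L);
        ctx.platform = first(params, "platform");
        ctx.version = first(params, "version");
        return ctx;
    }

    private static String first(Map<String, String[]> params, String name) {
        String[] values = params.get(name);
        return (values == null || values.length == 0) ? null : values[0];
    }

    /** Validates digits up front so malformed input never costs an exception. */
    private static long parseLong(String raw, long fallback) {
        if (raw == null || raw.isEmpty() || raw.length() > 18) {
            return fallback; // up to 18 digits always fits in a long
        }
        for (int i = 0; i < raw.length(); i++) {
            char c = raw.charAt(i);
            if (c < '0' || c > '9') {
                return fallback;
            }
        }
        return Long.parseLong(raw);
    }

    public static void main(String[] args) {
        Map<String, String[]> params = new HashMap<>();
        params.put("uid", new String[] {"42"});
        params.put("platform", new String[] {"ios"});
        params.put("unrelated", new String[] {"ignored"}); // never touched by bind
        ApiContext ctx = bind(params);
        System.out.println(ctx.uid + " " + ctx.platform + " " + ctx.version);
    }
}

/** Hypothetical stand-in for the article's ApiContext; the real fields are not shown. */
final class ApiContext {
    long uid;
    String platform;
    String version;
}
```

Because unknown parameters are simply never looked up, the cost of binding no longer grows with the number of parameters the client sends.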

Conclusion

Performance optimization is an ongoing effort; postponing technical debt leads to painful fixes. Regular code reviews, awareness of hidden costs of third‑party tools, and continuous performance testing help keep services stable and efficient.
