Performance Optimization of a Java Backend Service: Reducing CPU Load, Improving Hystrix Circuit Breaking, and Fixing Spring Data Binding Issues

This article details a two‑week effort to diagnose and resolve high CPU usage, server load, Hystrix circuit‑breaker inefficiencies, and Spring data‑binding exceptions in a Java backend service, resulting in doubled QPS capacity, stable circuit breaking under heavy traffic, and significant performance gains.

Architecture Digest

Nov 25, 2021

Background

Our service recently hit a performance bottleneck. Early on, urgent feature work left no room for optimization, and the resulting technical debt had become painful to repay.

Even at low QPS the server load sat at 10‑20 and CPU usage stayed above 60%; during traffic spikes the interface returned many errors. We used Hystrix for circuit breaking, but once the breaker tripped the service was slow to recover, which made every deployment risky.

Once the flow of new requirements slowed, our team lead set a two‑week goal to eliminate the performance issues. During the investigation we identified several bottlenecks and revised the circuit‑breaker strategy. In the end the service's QPS capacity doubled, and under 3‑4× load it circuit‑breaks reliably and recovers quickly.

High CPU and Load on Server

We first tackled the overall high load and CPU caused by the service.

The service fetches data from storage or remote calls, performs many transformations, and returns the result. The long transformation pipeline keeps CPU usage around 50% even under normal load.

To inspect per‑thread resource usage in the JVM we first tried JDK Mission Control (jmc) but found it cumbersome, so we switched to jtop, a small jar that prints JVM statistics.

Running java -jar jtop.jar [options] prints the top CPU‑consuming thread stacks (default -stack n shows the five most expensive stacks).

Heap Memory: INIT=134217728  USED=230791968  COMMITED=450363392  MAX=1908932608
NonHeap Memory: INIT=2555904  USED=24834632  COMMITED=26411008  MAX=-1
GC PS Scavenge  VALID  [PS Eden Space, PS Survivor Space]  GC=161  GCT=440
GC PS MarkSweep  VALID  [PS Eden Space, PS Survivor Space, PS Old Gen]  GC=2  GCT=532
ClassLoading LOADED=3118  TOTAL_LOADED=3118  UNLOADED=0
Total threads: 608  CPU=2454 (106.88%)  USER=2142 (93.30%)
NEW=0  RUNNABLE=6  BLOCKED=0  WAITING=2  TIMED_WAITING=600  TERMINATED=0

main  TID=1  STATE=RUNNABLE  CPU_TIME=2039 (88.79%)  USER_TIME=1970 (85.79%) Allocted: 640318696
    com.google.common.util.concurrent.RateLimiter.tryAcquire(RateLimiter.java:337)
    io.zhenbianshu.TestFuturePool.main(TestFuturePool.java:23)

RMI TCP Connection(2)-127.0.0.1  TID=2555  STATE=RUNNABLE  CPU_TIME=89 (3.89%)  USER_TIME=85 (3.70%) Allocted: 7943616
    sun.management.ThreadImpl.dumpThreads0(Native Method)
    sun.management.ThreadImpl.dumpAllThreads(ThreadImpl.java:454)
    me.hatter.tools.jtop.rmi.RmiServer.listThreadInfos(RmiServer.java:59)
    me.hatter.tools.jtop.management.JTopImpl.listThreadInfos(JTopImpl.java:48)
    sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)

    ... ...

By examining the thread stacks we located the hot code paths.
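
The same kind of ranking can be approximated from inside a JVM with the standard java.lang.management API. The sketch below is not jtop's implementation, and the class name TopThreads is made up for illustration. It also inspects only the JVM it runs in, whereas jtop attaches to another process. It shows the basic idea: read each thread's accumulated CPU time, sort, and print the stacks of the busiest threads.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical helper (not part of jtop): ranks live threads of the current JVM by CPU time.
class TopThreads {

    /** Returns up to `limit` live threads, busiest first, each with up to `depth` stack frames. */
    static List<ThreadInfo> busiest(int limit, int depth) {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        List<long[]> usage = new ArrayList<>();               // pairs of {threadId, cpuNanos}
        for (long id : mx.getAllThreadIds()) {
            long cpu = mx.getThreadCpuTime(id);               // -1 if the thread died or timing is disabled
            if (cpu >= 0) {
                usage.add(new long[] {id, cpu});
            }
        }
        usage.sort(Comparator.comparingLong((long[] u) -> u[1]).reversed());

        List<ThreadInfo> result = new ArrayList<>();
        for (long[] u : usage) {
            if (result.size() == limit) {
                break;
            }
            ThreadInfo info = mx.getThreadInfo(u[0], depth);  // null if the thread has exited meanwhile
            if (info != null) {
                result.add(info);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        for (ThreadInfo info : busiest(5, 8)) {
            System.out.printf("%s  TID=%d  STATE=%s%n",
                    info.getThreadName(), info.getThreadId(), info.getThreadState());
            for (StackTraceElement frame : info.getStackTrace()) {
                System.out.println("    " + frame);
            }
        }
    }
}
```

CPU time here is cumulative since thread start, so a long‑lived but idle thread can outrank a thread that only recently became hot. Sampling twice and comparing the difference would give a picture closer to current usage.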

The stacks showed a large share of CPU time going to JSON serialization/deserialization and Bean copying. After we refactored the code to reuse Beans and replaced JSON with Protocol Buffers, CPU pressure dropped significantly.

Circuit‑Breaker Framework Optimization

We originally used Hystrix, which is no longer maintained; alternatives like resilience4j and Sentinel exist, but we kept Hystrix due to the existing stack.

We had placed Hystrix annotations on both the controller interfaces and the inner RPC calls, all with thread‑pool isolation. The outer interfaces used a 1000 ms timeout and up to 2000 threads; the inner RPC calls used a 200 ms timeout and up to 500 threads.

Abnormal Response Time

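We observed requests taking 1200‑2000 ms, well beyond the 1000 ms timeout configured on the outer interface. Hystrix executes the business logic on a separate thread, and the calling thread can return as soon as the timeout fires, so the extra latency had to come from outside the business code: Hystrix itself, Spring, or the system.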

We captured thread stacks with jstack and generated flame graphs, revealing many threads blocked at LockSupport.park inside HystrixTimer.addTimerListener.

Hystrix creates a TimerListener for each HystrixCommand to handle its timeout asynchronously; under high load, registering these listeners becomes a bottleneck.

We then found that each request fetched the same RPC result 3‑5 times, relying on a LocalCache to absorb the repeats. The HystrixCommand annotation, however, sat on the cache get method rather than on the RPC call itself, so 1000 QPS produced 3000‑5000 Hystrix calls per second, each with its own TimerListener:

@HystrixCommand(
    fallbackMethod = "fallBackGetXXXConfig",
    commandProperties = {
        @HystrixProperty(name = "execution.isolation.thread.timeoutInMilliseconds", value = "200"),
        @HystrixProperty(name = "circuitBreaker.errorThresholdPercentage", value = "50")
    },
    threadPoolProperties = {
        @HystrixProperty(name = "coreSize", value = "200"),
        @HystrixProperty(name = "maximumSize", value = "500"),
        @HystrixProperty(name = "allowMaximumSizeToDivergeFromCoreSize", value = "true")
    })
public XXXConfig getXXXConfig(Long uid) {
    try {
        return XXXConfigCache.get(uid);
    } catch (Exception e) {
        return EMPTY_XXX_CONFIG;
    }
}

We moved the HystrixCommand to the cache load method and switched isolation to semaphore mode, which eliminated the thread‑pool overhead and reduced CPU usage.
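
To make the effect of that move concrete, here is a plain‑Java model with no Hystrix or cache library involved. The class GuardPlacement, the guarded wrapper, and loadFromRpc are hypothetical stand‑ins: guarded represents whatever a HystrixCommand wraps, and a ConcurrentHashMap plays the role of the local cache. The sketch only counts how often the wrapper is entered in each arrangement.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

// Plain-Java model of where the command wrapper sits; not Hystrix code.
class GuardPlacement {
    final AtomicInteger guardedCalls = new AtomicInteger();
    private final Map<Long, String> cache = new ConcurrentHashMap<>();

    // Stand-in for the remote call that fills the cache.
    private String loadFromRpc(Long uid) {
        return "config-" + uid;
    }

    // Stand-in for the command wrapper: counts every entry, then runs the body.
    private <T> T guarded(Supplier<T> body) {
        guardedCalls.incrementAndGet();
        return body.get();
    }

    /** Before: the wrapper surrounds the cache read, so every lookup pays for it. */
    String getGuardingRead(Long uid) {
        return guarded(() -> cache.computeIfAbsent(uid, this::loadFromRpc));
    }

    /** After: only a cache miss enters the wrapper. */
    String getGuardingLoad(Long uid) {
        return cache.computeIfAbsent(uid, id -> guarded(() -> loadFromRpc(id)));
    }
}
```

Five lookups of the same uid enter the wrapper five times with getGuardingRead and once with getGuardingLoad, which mirrors the 3‑5× inflation described above.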

Note that semaphore isolation runs the call on the caller's thread, so a method that is already executing cannot be aborted; a call that overruns its timeout keeps occupying its semaphore permit until it returns.
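
The mechanism behind semaphore isolation can be sketched with java.util.concurrent.Semaphore. This is an illustration of the idea only, not how Hystrix implements it, and the class name SemaphoreGuard is hypothetical. A call runs on the caller's thread if a permit is free and is rejected to a fallback otherwise.

```java
import java.util.concurrent.Semaphore;
import java.util.function.Supplier;

// Hypothetical class illustrating semaphore-style isolation; it is not Hystrix code.
class SemaphoreGuard {
    private final Semaphore permits;

    SemaphoreGuard(int maxConcurrent) {
        this.permits = new Semaphore(maxConcurrent);
    }

    /** Runs `call` on the caller's thread, or returns the fallback when no permit is free. */
    <T> T execute(Supplier<T> call, Supplier<T> fallback) {
        if (!permits.tryAcquire()) {
            return fallback.get();      // rejected immediately, no queueing and no thread hand-off
        }
        try {
            return call.get();          // a slow call keeps its permit until it returns
        } catch (RuntimeException e) {
            return fallback.get();      // failures also degrade to the fallback
        } finally {
            permits.release();
        }
    }

    int availablePermits() {
        return permits.availablePermits();
    }
}
```

Because nothing here can interrupt call.get(), a hung downstream call holds its permit for as long as it hangs. That is the trade‑off noted above, and it is why the downstream client still needs its own connect and read timeouts.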

Service Isolation and Degradation

Initially Hystrix’s degradation behavior was inconsistent; we introduced Hystrix’s metrics stream and dashboard to visualize circuit status.

We size the limit on concurrent calls from the expected traffic: 2000 QPS × 50 ms / 1000 ms = 100 calls in flight, so 100 semaphore permits plus some headroom. Traffic beyond that limit is rejected during spikes, and circuit breaking takes care of further degradation.
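
The sizing rule is simple arithmetic: requests in flight equal arrival rate times time per request. A small helper keeps the units straight. The class name PermitSizing and the headroom percentage are assumptions for illustration; the 2000 QPS and 50 ms figures are the ones from the example above.

```java
// Hypothetical helper for sizing a concurrency limit from traffic figures.
class PermitSizing {

    /**
     * permits = ceil(peakQps * latencyMs / 1000 * (1 + headroomPercent / 100)),
     * computed in integer arithmetic to avoid floating-point rounding.
     */
    static int permits(int peakQps, int latencyMs, int headroomPercent) {
        long scaled = (long) peakQps * latencyMs * (100 + headroomPercent);
        long divisor = 1000L * 100L;
        return (int) ((scaled + divisor - 1) / divisor);   // ceiling division
    }

    public static void main(String[] args) {
        System.out.println(permits(2000, 50, 0));    // 100, the bare figure from the text
        System.out.println(permits(2000, 50, 50));   // 150, with an assumed 50% margin
    }
}
```

How much margin to add is a judgment call: too little rejects healthy traffic during small bursts, too much lets a slow dependency tie up many request threads before rejection starts.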

High Load During Circuit Breaking Prevents Recovery

When the circuit trips, the load on the service stays high, and the flood of error stacks written to the log stresses the system further. We cut log volume by suppressing exception stacks and customizing Spring's ExceptionHandler, after which the service recovered quickly once the load dropped.
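
One way to keep expected rejections cheap is an exception type that never captures a stack trace, logged as a single line. This is an assumption about how the suppression could be done, not necessarily what we shipped; the names RejectedRequestException and oneLine are hypothetical. It relies on the protected four‑argument RuntimeException constructor available since Java 7.

```java
// Hypothetical exception type for expected rejections such as an open circuit.
class RejectedRequestException extends RuntimeException {

    RejectedRequestException(String message) {
        // enableSuppression = false, writableStackTrace = false:
        // the JVM skips fillInStackTrace, so no stack is captured or printed.
        super(message, null, false, false);
    }

    /** Formats any throwable as one log line, without its stack. */
    static String oneLine(Throwable t) {
        return t.getClass().getSimpleName() + ": " + t.getMessage();
    }

    public static void main(String[] args) {
        RejectedRequestException e = new RejectedRequestException("circuit open");
        System.out.println(oneLine(e));
        System.out.println("frames captured: " + e.getStackTrace().length);
    }
}
```

A handler for this type can then log the single line and return a degraded response, so an open circuit costs one short log entry per request instead of dozens of stack frames.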

Spring Data Binding Exception

During jstack analysis we saw threads stuck in Spring’s data‑binding code without any visible errors.

at java.lang.Throwable.fillInStackTrace(Native Method)
at java.lang.Throwable.fillInStackTrace(Throwable.java:783)
  - locked <0x00000006a697a0b8> (a org.springframework.beans.NotWritablePropertyException)
  ...
at org.springframework.beans.AbstractNestablePropertyAccessor.processLocalProperty(AbstractNestablePropertyAccessor.java:426)
at org.springframework.beans.AbstractNestablePropertyAccessor.setPropertyValue(AbstractNestablePropertyAccessor.java:278)
  ...
at org.springframework.validation.DataBinder.doBind(DataBinder.java:735)
at org.springframework.web.bind.WebDataBinder.doBind(WebDataBinder.java:197)
at org.springframework.web.bind.ServletRequestDataBinder.bind(ServletRequestDataBinder.java:107)
at org.springframework.web.method.support.InvocableHandlerMethod.getMethodArgumentValues(InvocableHandlerMethod.java:161)
  ...
at org.springframework.web.servlet.DispatcherServlet.doDispatch(DispatcherServlet.java:991)

Spring silently catches these exceptions while trying to bind request parameters to an ApiContext object, leading to many failed set attempts and performance loss.

Our controller method:

@RequestMapping("test.json")
public Map testApi(@RequestParam(name = "id") String id, ApiContext apiContext) {}

Without a dedicated HandlerMethodArgumentResolver, Spring creates an empty ApiContext and attempts to set each incoming parameter, catching failures. By providing a custom argument resolver we avoided the costly trial‑and‑error binding, achieving roughly a ten‑fold performance improvement.
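
The difference between the two strategies can be modeled in plain Java. Nothing below is Spring code: BindingModel, its nested ApiContext with a single appVersion property, and both methods are hypothetical stand‑ins, since the real ApiContext fields are not shown here. bindByTrial imitates generic binding by trying every request parameter as a property and swallowing the failures; resolveDirectly imitates a dedicated resolver that reads only what the object needs.

```java
import java.lang.reflect.Method;
import java.util.Map;

// Plain-Java model of the two binding strategies; not Spring's implementation.
class BindingModel {

    // Hypothetical stand-in for the article's ApiContext.
    static class ApiContext {
        String appVersion;

        public void setAppVersion(String appVersion) {
            this.appVersion = appVersion;
        }
    }

    int failedAttempts;

    /** Generic binding: try each parameter as a property, swallow every failure. */
    ApiContext bindByTrial(Map<String, String> params) {
        ApiContext ctx = new ApiContext();
        for (Map.Entry<String, String> e : params.entrySet()) {
            String name = e.getKey();
            String setter = "set" + Character.toUpperCase(name.charAt(0)) + name.substring(1);
            try {
                Method m = ApiContext.class.getMethod(setter, String.class);
                m.invoke(ctx, e.getValue());
            } catch (ReflectiveOperationException ex) {
                failedAttempts++;   // an exception, stack trace included, built and thrown away
            }
        }
        return ctx;
    }

    /** Dedicated resolver: read only the parameters ApiContext actually needs. */
    ApiContext resolveDirectly(Map<String, String> params) {
        ApiContext ctx = new ApiContext();
        ctx.setAppVersion(params.get("appVersion"));
        return ctx;
    }
}
```

Both paths produce the same object, but the trial path pays for one exception per parameter that has no matching property, on every request. In Spring, the direct path corresponds to implementing HandlerMethodArgumentResolver for the ApiContext parameter type and registering it with the MVC configuration.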

Conclusion

Performance optimization is an ongoing effort; avoiding accumulated technical debt, writing efficient code, and regularly testing performance are essential to maintain system stability.

