How We Doubled Service QPS and Fixed Hystrix Bottlenecks in Two Weeks
In this article we detail a two‑week sprint that identified and eliminated multiple Java backend performance bottlenecks. We located CPU hotspots with jtop, re‑engineered Hystrix circuit‑breaker settings, reduced logging overhead, and fixed Spring data‑binding issues. In the end we doubled QPS and stabilized service recovery.
Background
After a period of rapid early development, our service hit a performance ceiling. Server load averages sat at 10‑20 and CPU usage exceeded 60%, even at low QPS. Errors surged during traffic spikes, and the Hystrix circuit breakers were slow to recover. We had real fears that the service could collapse.
When demand finally eased, leadership set a two‑week deadline to resolve the issues. Through systematic investigation we eliminated several bottlenecks and revised the circuit‑breaker strategy. As a result, we doubled QPS capacity, and circuit breaking stayed stable even under 3‑4× load.
Server High CPU and Load
The service fetches data from storage or remote calls, then performs extensive transformations, causing CPU usage to stay above 50% under normal conditions.
To inspect JVM thread resource consumption we used jtop, a simple jar that prints JVM statistics for a given PID.
Heap Memory: INIT=134217728 USED=230791968 COMMITED=450363392 MAX=1908932608
NonHeap Memory: INIT=2555904 USED=24834632 COMMITED=26411008 MAX=-1
GC PS Scavenge VALID [PS Eden Space, PS Survivor Space] GC=161 GCT=440
GC PS MarkSweep VALID [PS Eden Space, PS Survivor Space, PS Old Gen] GC=2 GCT=532
ClassLoading LOADED=3118 TOTAL_LOADED=3118 UNLOADED=0
Total threads: 608 CPU=2454 (106.88%) USER=2142 (93.30%)
NEW=0 RUNNABLE=6 BLOCKED=0 WAITING=2 TIMED_WAITING=600 TERMINATED=0
main TID=1 STATE=RUNNABLE CPU_TIME=2039 (88.79%) USER_TIME=1970 (85.79%) Allocted: 640318696
com.google.common.util.concurrent.RateLimiter.tryAcquire(RateLimiter.java:337)
io.zhenbianshu.TestFuturePool.main(TestFuturePool.java:23)
RMI TCP Connection(2)-127.0.0.1 TID=2555 STATE=RUNNABLE CPU_TIME=89 (3.89%) USER_TIME=85 (3.70%) Allocted: 7943616
sun.management.ThreadImpl.dumpThreads0(Native Method)
sun.management.ThreadImpl.dumpAllThreads(ThreadImpl.java:454)
me.hatter.tools.jtop.rmi.RmiServer.listThreadInfos(RmiServer.java:59)
me.hatter.tools.jtop.management.JTopImpl.listThreadInfos(JTopImpl.java:48)
sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
... ...
Analyzing the thread stacks revealed the CPU‑intensive hot spots: JSON serialization, JSON deserialization, and bean copying. We reused beans instead of repeatedly creating and copying new ones, and we replaced JSON with Protocol Buffers. Together these changes dramatically lowered CPU pressure.
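jtop reads these numbers from the JVM's management interface; the stack trace above shows it calling sun.management.ThreadImpl.dumpAllThreads. As a rough illustration of the same idea, here is a minimal sketch that ranks live threads by CPU time using java.lang.management.ThreadMXBean. It only inspects the JVM it runs in, whereas jtop attaches to another PID. ThreadCpuTop is a hypothetical name, not part of jtop.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.List;

public class ThreadCpuTop {

    /** One report row: thread name, accumulated CPU time in ms, and its top stack frame. */
    public static final class ThreadCpu {
        public final String name;
        public final long cpuMillis;
        public final String topFrame;

        ThreadCpu(String name, long cpuMillis, String topFrame) {
            this.name = name;
            this.cpuMillis = cpuMillis;
            this.topFrame = topFrame;
        }
    }

    /** Returns up to {@code limit} live threads of this JVM, highest CPU time first. */
    public static List<ThreadCpu> topThreads(int limit) {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        List<ThreadCpu> rows = new ArrayList<>();
        if (!mx.isThreadCpuTimeSupported()) {
            return rows; // this JVM cannot report per-thread CPU time
        }
        for (long id : mx.getAllThreadIds()) {
            long cpuNanos = mx.getThreadCpuTime(id);   // -1 if the thread died or measurement is disabled
            ThreadInfo info = mx.getThreadInfo(id, 1); // capture only the top stack frame
            if (cpuNanos < 0 || info == null) {
                continue;
            }
            StackTraceElement[] stack = info.getStackTrace();
            String top = stack.length > 0 ? stack[0].toString() : "<no Java frames>";
            rows.add(new ThreadCpu(info.getThreadName(), cpuNanos / 1_000_000, top));
        }
        rows.sort((a, b) -> Long.compare(b.cpuMillis, a.cpuMillis)); // descending CPU time
        return new ArrayList<>(rows.subList(0, Math.min(limit, rows.size())));
    }

    public static void main(String[] args) {
        for (ThreadCpu t : topThreads(5)) {
            System.out.printf("%-30s CPU=%dms  %s%n", t.name, t.cpuMillis, t.topFrame);
        }
    }
}
```

Sampling twice and diffing the CPU times gives per‑interval usage, which is closer to what a live top‑style view shows.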
Circuit Breaker Framework Optimization
We use Hystrix, which is no longer actively developed. We considered resilience4j and Alibaba Sentinel as alternatives but kept Hystrix for compatibility with our existing stack.
Abnormal Response Time
Access logs showed requests taking 1200‑2000 ms, even though the timeout was 1000 ms. Hystrix runs the business logic on a separate worker thread, and the calling thread returns as soon as the timeout fires. A request should therefore never take much longer than the timeout. The extra time had to be spent outside the business logic: in the Hystrix layer itself, in Spring, or in the system.
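The timeout reasoning above can be illustrated with a minimal sketch of thread‑isolated execution using a plain ExecutorService and Future.get with a timeout. This is not how Hystrix implements timeouts, and IsolatedCall and callWithTimeout are hypothetical names. The point is that the caller is released at roughly the timeout even when the work runs far longer. A caller that takes much longer than the timeout must therefore be stuck somewhere other than the isolated work.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class IsolatedCall {

    // Worker pool that runs the "business logic" off the caller's thread.
    private static final ExecutorService POOL = Executors.newFixedThreadPool(4, r -> {
        Thread t = new Thread(r, "isolated-worker");
        t.setDaemon(true); // do not keep the JVM alive
        return t;
    });

    /** Runs work on the pool and returns fallback if it has not finished within timeoutMs. */
    public static <T> T callWithTimeout(Callable<T> work, long timeoutMs, T fallback) {
        Future<T> future = POOL.submit(work);
        try {
            return future.get(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            future.cancel(true); // interrupts the worker; the caller is already unblocked
            return fallback;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return fallback;
        } catch (ExecutionException e) {
            return fallback; // the work itself threw
        }
    }

    public static void main(String[] args) {
        long start = System.nanoTime();
        String result = callWithTimeout(() -> { Thread.sleep(2_000); return "slow"; }, 200, "fallback");
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println(result + " after ~" + elapsedMs + " ms");
    }
}
```

future.cancel(true) only interrupts the worker. Code that ignores interrupts keeps running and keeps its pool thread busy.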
We generated flame graphs from repeated jstack outputs to pinpoint the issue.
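Flame graphs are usually built from repeated stack samples collapsed into "folded" lines of the form `root;...;leaf count`, the input format of Brendan Gregg's flamegraph.pl. The sketch below approximates that workflow in‑process with Thread.getAllStackTraces() rather than by parsing jstack output. StackSampler and sample are hypothetical names.

```java
import java.util.HashMap;
import java.util.Map;

public class StackSampler {

    /**
     * Takes {@code samples} snapshots of every thread's stack, {@code intervalMs} apart,
     * and counts identical stacks in folded form ("root;...;leaf" -> number of samples).
     */
    public static Map<String, Integer> sample(int samples, long intervalMs) throws InterruptedException {
        Map<String, Integer> folded = new HashMap<>();
        for (int i = 0; i < samples; i++) {
            for (StackTraceElement[] frames : Thread.getAllStackTraces().values()) {
                if (frames.length == 0) {
                    continue; // the JVM may have no stack information for some threads
                }
                StringBuilder sb = new StringBuilder();
                // getStackTrace() lists the leaf frame first; folded stacks list the root first.
                for (int f = frames.length - 1; f >= 0; f--) {
                    if (sb.length() > 0) {
                        sb.append(';');
                    }
                    sb.append(frames[f].getClassName()).append('.').append(frames[f].getMethodName());
                }
                folded.merge(sb.toString(), 1, Integer::sum);
            }
            Thread.sleep(intervalMs);
        }
        return folded;
    }

    public static void main(String[] args) throws InterruptedException {
        // Each output line is "stack count", ready to pipe into flamegraph.pl.
        sample(20, 50).forEach((stack, count) -> System.out.println(stack + " " + count));
    }
}
```

Stacks that show up in many samples, such as threads repeatedly parked on the same lock, become the widest towers in the graph.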
The flame graph showed many threads stuck in LockSupport.park, originating from HystrixTimer.addTimerListener, which handles asynchronous timeout callbacks. High request volume created a flood of TimerListeners, causing lock contention.
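The contention can be sketched under one assumption: each guarded call schedules its own timeout task on a single shared ScheduledThreadPoolExecutor. Our understanding is that HystrixTimer schedules its listeners this way, but treat that detail as an assumption. The executor's delay queue is protected by one internal lock, so every registration and cancellation serializes on it. Threads that lose the race park in LockSupport.park, which matches the flame graph. SharedTimeoutTimer and its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.ScheduledThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class SharedTimeoutTimer {

    // A single scheduler shared by every call, playing the role of a global timer.
    private static final ScheduledThreadPoolExecutor TIMER = new ScheduledThreadPoolExecutor(1, r -> {
        Thread t = new Thread(r, "shared-timeout-timer");
        t.setDaemon(true); // do not keep the JVM alive
        return t;
    });

    static {
        // Cancelled timeouts leave the queue immediately instead of waiting for their deadline.
        TIMER.setRemoveOnCancelPolicy(true);
    }

    /** Registers one timeout callback for one call; schedule and cancel both take the queue's lock. */
    public static ScheduledFuture<?> registerTimeout(Runnable onTimeout, long timeoutMs) {
        return TIMER.schedule(onTimeout, timeoutMs, TimeUnit.MILLISECONDS);
    }

    /** Number of timeout callbacks currently queued on the shared timer. */
    public static int pendingTimeouts() {
        return TIMER.getQueue().size();
    }

    public static void main(String[] args) {
        List<ScheduledFuture<?>> calls = new ArrayList<>();
        // 3 guarded fetches per request x 1000 requests: one registration per guarded call.
        for (int i = 0; i < 3 * 1000; i++) {
            calls.add(registerTimeout(() -> { }, 60_000));
        }
        System.out.println("pending timeouts: " + pendingTimeouts());
        calls.forEach(f -> f.cancel(false)); // calls that finish in time cancel, taking the lock again
        System.out.println("pending after cancel: " + pendingTimeouts());
    }
}
```

The timer's work grows linearly with the number of guarded calls, not with the number of user requests. This is why redundant guarded calls hurt so much.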
Investigation revealed that the same RPC result was fetched 3‑5 times per request. In addition, the method that read the LocalCache wrapping that RPC call was itself annotated with @HystrixCommand, so every lookup (cache hit or miss) went through Hystrix. At 1000 QPS this meant 3000‑5000 Hystrix invocations per second:
// Before: the Hystrix annotation wraps the cache lookup itself, so every call,
// including cache hits, pays for a Hystrix command and a thread-pool hand-off.
@HystrixCommand(
        fallbackMethod = "fallBackGetXXXConfig",
        commandProperties = {
                @HystrixProperty(name = "execution.isolation.thread.timeoutInMilliseconds", value = "200"),
                @HystrixProperty(name = "circuitBreaker.errorThresholdPercentage", value = "50")},
        threadPoolProperties = {
                @HystrixProperty(name = "coreSize", value = "200"),
                @HystrixProperty(name = "maximumSize", value = "500"),
                @HystrixProperty(name = "allowMaximumSizeToDivergeFromCoreSize", value = "true")})
public XXXConfig getXXXConfig(Long uid) {
    try {
        return XXXConfigCache.get(uid);
    } catch (Exception e) {
        return EMPTY_XXX_CONFIG;
    }
}
We moved the Hystrix annotation to the cache’s load method, so only cache misses go through Hystrix. We also switched isolation to semaphore mode, which eliminated the thread‑pool overhead. These changes reduced the maximum response time and further lowered CPU usage.
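The effect of moving the annotation can be sketched without Hystrix: if only the loader is guarded, cache hits never reach it. The minimal sketch below uses a ConcurrentHashMap with a plain Function standing in for the Hystrix‑protected RPC loader. GuardedLoaderCache is hypothetical, and unlike a real local cache it has no expiry or size limit.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

public class GuardedLoaderCache<K, V> {

    private final ConcurrentHashMap<K, V> entries = new ConcurrentHashMap<>();
    // Stand-in for the Hystrix-protected loader (the remote call); only misses reach it.
    private final Function<K, V> guardedLoader;

    public GuardedLoaderCache(Function<K, V> guardedLoader) {
        this.guardedLoader = guardedLoader;
    }

    /** Hits are served straight from the map; a miss invokes the guarded loader once per key. */
    public V get(K key) {
        return entries.computeIfAbsent(key, guardedLoader);
    }
}
```

With this layout, five lookups of the same uid in one request cost one guarded call instead of five.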
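Note: semaphore isolation runs the command on the calling thread and cannot abort a method that is already executing. Slow calls therefore keep holding their permits until they finish, even after they have timed out.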
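At its core, semaphore isolation is a tryAcquire on a java.util.concurrent.Semaphore around a call that runs on the caller's own thread. This minimal sketch shows both properties discussed here: requests beyond the limit fall back immediately, and a slow call holds its permit until it returns. SemaphoreIsolation is a hypothetical name, not Hystrix's class.

```java
import java.util.concurrent.Semaphore;
import java.util.function.Supplier;

public class SemaphoreIsolation {

    private final Semaphore permits;

    public SemaphoreIsolation(int maxConcurrent) {
        this.permits = new Semaphore(maxConcurrent);
    }

    /** Runs work on the caller's thread if a permit is free; otherwise returns the fallback at once. */
    public <T> T execute(Supplier<T> work, Supplier<T> fallback) {
        if (!permits.tryAcquire()) {
            return fallback.get(); // no permit: reject without queuing
        }
        try {
            return work.get(); // same thread, so a slow call cannot be abandoned mid-flight
        } finally {
            permits.release(); // the permit is only returned when the work actually finishes
        }
    }

    public int availablePermits() {
        return permits.availablePermits();
    }
}
```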
Service Isolation and Degradation
Hystrix’s visual dashboard (enabled by adding hystrix-metrics-event-stream and hystrix-dashboard) helped us monitor circuit status.
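With these optimizations in place, we could size the semaphore limit (execution.isolation.semaphore.maxConcurrentRequests). To sustain 2000 QPS at a target average latency of 50 ms, about 2000 × 50 / 1000 = 100 requests are in flight at any moment, so the limit is 100 permits. During spikes, traffic beyond that limit is throttled, which keeps latency in check.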
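This sizing is an application of Little's law: the average number of requests in flight equals throughput multiplied by average latency. With the figures above:

```latex
N = \lambda \cdot W = 2000\ \tfrac{\text{req}}{\text{s}} \times 0.05\ \text{s} = 100\ \text{concurrent requests}
```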
Spring Data Binding Exception
During jstack analysis we also found many threads spending their time in Spring’s data‑binding exception handling, even though no error logs or business errors appeared:
at java.lang.Throwable.fillInStackTrace(Native Method)
at java.lang.Throwable.fillInStackTrace(Throwable.java:783)
- locked <0x00000006a697a0b8> (a org.springframework.beans.NotWritablePropertyException)
...
at org.springframework.beans.AbstractNestablePropertyAccessor.processLocalProperty(AbstractNestablePropertyAccessor.java:426)
at org.springframework.beans.AbstractNestablePropertyAccessor.setPropertyValue(AbstractNestablePropertyAccessor.java:278)
...
at org.springframework.validation.DataBinder.doBind(DataBinder.java:735)
...
at org.springframework.web.servlet.DispatcherServlet.doDispatch(DispatcherServlet.java:991)
Spring was silently catching these exceptions while binding request parameters to an ApiContext object. The default DataBinder tries to set every request parameter as a property of the target object. Each parameter without a matching writable property throws a NotWritablePropertyException, which is caught and ignored because DataBinder's ignoreUnknownFields defaults to true. The controller received dozens of parameters, so every request created many exceptions. Filling in each exception's stack trace (the fillInStackTrace frames at the top) consumed significant CPU.
// Excerpt from Spring's AbstractPropertyAccessor.setPropertyValues(pvs, ignoreUnknown, ignoreInvalid)
List<PropertyAccessException> propertyAccessExceptions = null;
List<PropertyValue> propertyValues = (pvs instanceof MutablePropertyValues ?
        ((MutablePropertyValues) pvs).getPropertyValueList() : Arrays.asList(pvs.getPropertyValues()));
for (PropertyValue pv : propertyValues) {
    try {
        setPropertyValue(pv);
    } catch (NotWritablePropertyException ex) {
        if (!ignoreUnknown) {
            throw ex;
        }
        // otherwise ignore; the exception and its stack trace have already been created
    }
    ...
}
By providing a custom HandlerMethodArgumentResolver for ApiContext, we bypassed the costly default binding entirely, achieving roughly a ten‑fold performance improvement.
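The core of such a resolver is to copy only the parameters that ApiContext actually declares and to skip unknown ones with a cheap lookup instead of a thrown exception. The stdlib‑only sketch below shows that logic. The ApiContext fields (uid, appVersion, platform) and the ApiContextBinder class are hypothetical. In Spring, bind would be called from the resolver's resolveArgument method after flattening the values from NativeWebRequest.getParameterMap(). The resolver's supportsParameter would return true for ApiContext parameters, and the resolver would be registered through WebMvcConfigurer.addArgumentResolvers.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.BiConsumer;

public class ApiContextBinder {

    /** Hypothetical request context; the real ApiContext fields are not shown in the article. */
    public static class ApiContext {
        public String uid;
        public String appVersion;
        public String platform;
    }

    // Setters keyed by parameter name, built once instead of discovered per request.
    private static final Map<String, BiConsumer<ApiContext, String>> SETTERS = new HashMap<>();

    static {
        SETTERS.put("uid", (ctx, v) -> ctx.uid = v);
        SETTERS.put("appVersion", (ctx, v) -> ctx.appVersion = v);
        SETTERS.put("platform", (ctx, v) -> ctx.platform = v);
    }

    /** Copies known parameters into a new ApiContext; unknown names are skipped without any exception. */
    public static ApiContext bind(Map<String, String> params) {
        ApiContext ctx = new ApiContext();
        for (Map.Entry<String, String> p : params.entrySet()) {
            BiConsumer<ApiContext, String> setter = SETTERS.get(p.getKey());
            if (setter != null) {
                setter.accept(ctx, p.getValue());
            }
        }
        return ctx;
    }
}
```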
Conclusion
Performance optimization is an ongoing effort; postponing technical debt leads to painful refactors. Regular code reviews, awareness of hidden costs in third‑party tools, and systematic load testing help keep services stable and efficient.
