Performance Optimization of Java Backend Services: Reducing CPU Load, Improving Hystrix Circuit Breaking, and Fixing Spring Data Binding Issues
This article describes how a Java backend service suffered high CPU usage and load, how the team diagnosed the problems with jtop and thread stacks, optimized JSON/Bean processing, re‑engineered Hystrix circuit‑breaker settings, reduced logging overhead, and fixed Spring data‑binding exceptions to double the QPS and achieve stable recovery after traffic spikes.
Background
Recently our service hit a performance bottleneck: early development had rushed features out without proper optimization. Even at modest QPS the server load average reached 10‑20 and CPU usage stayed above 60%, and at traffic peaks the interfaces returned many errors. We used Hystrix for circuit breaking, but the service could not recover quickly after the circuit opened, which raised the risk of a cascading failure (service avalanche).
Once feature demand slowed, our team lead set a two‑week deadline to eliminate the performance issues. During this period we identified several bottlenecks, revised the circuit‑breaker strategy, and ultimately doubled the service's QPS capacity while keeping it stable under 3‑4× higher load.
High CPU and Load on the Server
The service fetches a batch of data from storage or remote calls, then performs many transformations before returning the result. The long transformation pipeline caused CPU usage to stay above 50% even under normal load.
To inspect per‑thread resource consumption inside the JVM we used jtop, a lightweight jar that prints JVM statistics for a given PID:
java -jar jtop.jar [options] <pid>
jtop prints the stacks of the top CPU‑consuming threads, 5 by default; the -stack n option sets that number. Its output covers heap memory, non‑heap memory, GC stats, thread counts, and detailed stack traces.
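jtop's internals are not shown in this article, but the kind of per‑thread data it reports can be approximated with the standard java.lang.management API. The sketch below is a stand‑in under stated assumptions, not jtop itself: it inspects the current JVM rather than attaching to a PID, it ranks threads by cumulative CPU time, and the class name TopCpuThreads is hypothetical.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical helper: ranks the live threads of the current JVM by CPU time.
final class TopCpuThreads {

    /** One report row: a thread snapshot and its cumulative CPU time in nanoseconds. */
    static final class Row {
        final ThreadInfo info;
        final long cpuNanos;

        Row(ThreadInfo info, long cpuNanos) {
            this.info = info;
            this.cpuNanos = cpuNanos;
        }
    }

    /** Returns up to n live threads, highest CPU time first, with stacks cut at maxStackDepth. */
    static List<Row> top(int n, int maxStackDepth) {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        List<Row> rows = new ArrayList<>();
        if (!mx.isThreadCpuTimeSupported() || !mx.isThreadCpuTimeEnabled()) {
            return rows; // CPU timing is unavailable on this JVM
        }
        for (long id : mx.getAllThreadIds()) {
            long cpu = mx.getThreadCpuTime(id);                    // -1 if the thread has died
            ThreadInfo info = mx.getThreadInfo(id, maxStackDepth); // null if the thread has died
            if (cpu >= 0 && info != null) {
                rows.add(new Row(info, cpu));
            }
        }
        rows.sort(Comparator.comparingLong((Row r) -> r.cpuNanos).reversed());
        return rows.size() > n ? new ArrayList<>(rows.subList(0, n)) : rows;
    }

    /** Formats rows as thread name, CPU milliseconds, and stack frames. */
    static String report(List<Row> rows) {
        StringBuilder sb = new StringBuilder();
        for (Row r : rows) {
            sb.append(r.info.getThreadName())
              .append(" cpu=").append(r.cpuNanos / 1_000_000).append("ms\n");
            for (StackTraceElement frame : r.info.getStackTrace()) {
                sb.append("    at ").append(frame).append('\n');
            }
        }
        return sb.toString();
    }
}
```

Calling TopCpuThreads.report(TopCpuThreads.top(5, 8)) from a diagnostic endpoint or a scheduled task gives a rough equivalent of a top‑5 view. Because the numbers are cumulative, comparing two samples taken a few seconds apart shows which threads are busy right now.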
Analyzing these thread stacks revealed code points that heavily consumed CPU, such as JSON serialization/deserialization and Bean copying. We reduced CPU pressure by increasing Bean reuse and replacing JSON with Protocol Buffers where appropriate.
Circuit‑Breaker Framework Optimization
We originally used Hystrix, which is no longer maintained, but kept it because it was already in the tech stack. The controller and inner RPC calls were annotated with Hystrix, using thread‑pool isolation (max threads 2000 for the controller, 500 for RPC) and a 1000 ms timeout at the outer layer, 200 ms for the inner RPC.
Abnormal Response Times
Some requests took 1200 ms or even 2000 ms, well past the 1000 ms outer timeout. Because Hystrix runs the business logic on a separate thread and the calling thread returns as soon as the timeout fires, slow business code alone could not explain these numbers; the extra delay had to come from Hystrix itself, Spring, or the system layer.
We generated flame graphs from repeated jstack outputs and saw many threads blocked at LockSupport.park, reached through HystrixTimer.addTimerListener. Each Hystrix command invocation registers a timer listener for its timeout. Within a single request the same RPC result was read 3‑5 times through a local cache, and the Hystrix annotation sat on the cache's get method, so every read, cache hit or not, created a listener. At 1000 QPS that meant thousands of timer listeners.
We moved the @HystrixCommand annotation to the cache’s load method and switched isolation from thread‑pool to semaphore mode. This eliminated the excessive timer listeners and reduced response times dramatically.
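To see why the annotation's position matters, here is a plain‑JDK sketch in which a counter stands in for the per‑invocation cost of a Hystrix command (timer listener registration and so on). It does not use Hystrix; GuardPlacementDemo and its method names are hypothetical, and the cache is a simple ConcurrentHashMap rather than the team's actual local cache.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

// Hypothetical demo: counts how often a guard runs depending on where it is placed.
final class GuardPlacementDemo {
    /** Number of times the guard (the stand-in for a Hystrix command) was entered. */
    final AtomicInteger guardedCalls = new AtomicInteger();

    private final Map<String, String> cache = new ConcurrentHashMap<>();
    private final Function<String, String> rpc;

    GuardPlacementDemo(Function<String, String> rpc) {
        this.rpc = rpc;
    }

    /** Guard on get: every lookup pays the guard cost, cache hits included. */
    String getWithGuardOnGet(String key) {
        guardedCalls.incrementAndGet();
        return cache.computeIfAbsent(key, rpc);
    }

    /** Guard on load: only a cache miss reaches the guard. */
    String getWithGuardOnLoad(String key) {
        return cache.computeIfAbsent(key, this::guardedLoad);
    }

    private String guardedLoad(String key) {
        guardedCalls.incrementAndGet();
        return rpc.apply(key);
    }
}
```

Five reads of the same key cost five guard entries in the first variant and one in the second. With hystrix-javanica, the isolation mode itself is selected through the command property execution.isolation.strategy, set to SEMAPHORE.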
Note that semaphore isolation only limits entry to a method; it cannot abort a method already executing, which may still cause occasional timeouts.
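The limitation described above follows directly from how a semaphore guard works. The following is a minimal sketch built on java.util.concurrent.Semaphore, not Hystrix's own implementation; SemaphoreGuard is a hypothetical name.

```java
import java.util.concurrent.Semaphore;
import java.util.function.Supplier;

// Hypothetical guard: caps concurrent entries, rejects the rest with a fallback.
final class SemaphoreGuard {
    private final Semaphore permits;

    SemaphoreGuard(int maxConcurrent) {
        this.permits = new Semaphore(maxConcurrent);
    }

    /** Runs work on the caller's thread if a permit is free, otherwise returns the fallback. */
    <T> T call(Supplier<T> work, Supplier<T> fallback) {
        if (!permits.tryAcquire()) {
            return fallback.get(); // rejected at the door; work never starts
        }
        try {
            // Once inside, the guard has no way to interrupt work:
            // it runs to completion on this thread, however long it takes.
            return work.get();
        } finally {
            permits.release();
        }
    }
}
```

The work runs on the calling thread, so there is no second thread that could give up waiting and return early. That is the trade‑off against thread‑pool isolation: far less overhead per call, but a slow call holds its permit and its caller until it finishes.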
Service Isolation and Degradation
Initially Hystrix’s circuit‑breaker behaved intermittently under high load. By adding the Hystrix metrics stream (hystrix-metrics-event-stream) and using the Hystrix Dashboard we gained clear visibility of circuit status.
With the optimizations, the maximum response time became predictable. Assuming a tolerable average response time of 50 ms and a maximum QPS of 2000, about 2000 × 50 / 1000 = 100 requests are in flight at any moment, so we set the semaphore limit to 100. If rejection errors rise, we can raise the limit to add headroom.
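The sizing rule above is concurrency = arrival rate × time in system. As a small sketch, with a headroom percentage added as an assumption of ours (the article only says redundancy can be added) and SemaphoreSizing as a hypothetical name:

```java
// Hypothetical helper: sizes a semaphore from peak QPS and average response time.
final class SemaphoreSizing {

    /**
     * permits = ceil(peakQps * avgResponseMillis / 1000 * (1 + headroomPercent / 100)).
     * Integer arithmetic keeps the result exact.
     */
    static int permits(int peakQps, int avgResponseMillis, int headroomPercent) {
        long numerator = (long) peakQps * avgResponseMillis * (100 + headroomPercent);
        long denominator = 1000L * 100L;
        return (int) ((numerator + denominator - 1) / denominator); // ceiling division
    }
}
```

SemaphoreSizing.permits(2000, 50, 0) gives 100, matching the figure in the text, and 20% headroom gives 120. In Hystrix the resulting value goes into the execution.isolation.semaphore.maxConcurrentRequests property.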
This combined strategy of request rejection, strict timeout enforcement, and circuit breaking ensures stable average response times even during traffic spikes.
High Load During Circuit Breaker and Slow Recovery
When the circuit broke, the service’s load kept rising and did not recover promptly after traffic decreased. Observing the service with tools like jtop during high load often only showed JVM TI threads, which was not helpful.
We also noticed a flood of error logs that persisted for minutes after the service stabilized, adding I/O pressure and further load. By removing exception stack traces from log output and simplifying the Spring ExceptionHandler, we limited log volume, allowing the service to recover quickly once the circuit closed.
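One way to drop stack traces while keeping errors diagnosable is to log a single line per exception. The helper below is a sketch of that idea, not the team's actual handler; ErrorSummary is a hypothetical name, and the cap of 10 on the cause chain is an arbitrary choice of ours.

```java
// Hypothetical helper: renders an exception as one log line, without the stack trace.
final class ErrorSummary {

    /** Returns "Type: message", plus the root cause when it differs from the exception itself. */
    static String oneLine(Throwable error) {
        Throwable root = error;
        // Follow the cause chain, capped so a cyclic chain cannot loop forever.
        for (int depth = 0; depth < 10 && root.getCause() != null; depth++) {
            root = root.getCause();
        }
        StringBuilder sb = new StringBuilder(describe(error));
        if (root != error) {
            sb.append(" (root cause: ").append(describe(root)).append(')');
        }
        // Keep the entry on a single line even if a message contains line breaks.
        return sb.toString().replace('\n', ' ').replace('\r', ' ');
    }

    private static String describe(Throwable t) {
        return t.getClass().getSimpleName() + ": " + t.getMessage();
    }
}
```

An exception handler can pass ErrorSummary.oneLine(ex) to the logger instead of the exception object. During a break, when the same rejection is raised thousands of times per second, this cuts each log entry from dozens of lines to one.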
Spring Data Binding Exception
While inspecting jstack we found threads stuck in Spring’s data‑binding code without any visible errors. Spring silently caught NotWritablePropertyException during binding of a large number of request parameters to an ApiContext object.
Spring’s default binding attempts to set every incoming parameter on the target object; failures are caught and ignored. With dozens of parameters this caused significant overhead. Implementing a custom HandlerMethodArgumentResolver for ApiContext moved the binding logic out of the generic path, reducing the number of failed set attempts and improving performance by roughly 10%.
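The core of such a resolver is explicit binding: read only the parameters the target object actually has, and never attempt a set that is bound to fail. The sketch below shows that logic in plain Java. The ApiContext fields here are hypothetical (the article does not list the real ones), as is the ApiContextBinder name.

```java
import java.util.Map;

// Hypothetical binder: copies only known parameters into the context object.
final class ApiContextBinder {

    /** Stand-in for the real ApiContext; these fields are assumptions for illustration. */
    static final class ApiContext {
        String deviceId;
        String appVersion;
        int pageSize = 20;
    }

    /** Binds known parameters by name; unknown parameters are never touched. */
    static ApiContext bind(Map<String, String[]> params) {
        ApiContext ctx = new ApiContext();
        ctx.deviceId = first(params, "deviceId");
        ctx.appVersion = first(params, "appVersion");
        String pageSize = first(params, "pageSize");
        if (pageSize != null) {
            try {
                ctx.pageSize = Integer.parseInt(pageSize);
            } catch (NumberFormatException ignored) {
                // Malformed value: keep the default page size.
            }
        }
        return ctx;
    }

    /** Returns the first value of a parameter, or null if it is absent. */
    private static String first(Map<String, String[]> params, String name) {
        String[] values = params.get(name);
        return (values == null || values.length == 0) ? null : values[0];
    }
}
```

In a Spring application this logic would sit inside a HandlerMethodArgumentResolver: supportsParameter would check that the parameter type is ApiContext, and resolveArgument would pass the request's parameter map to a method like bind. The cost then scales with the number of fields on the context object, not with the number of parameters in the request.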
The affected controller method took ApiContext as a plain parameter:
@RequestMapping("test.json")
public Map testApi(@RequestParam(name = "id") String id, ApiContext apiContext) { }
The generic binding code in Spring looked like this:
List<PropertyAccessException> propertyAccessExceptions = null;
List<PropertyValue> propertyValues = (pvs instanceof MutablePropertyValues ?
        ((MutablePropertyValues) pvs).getPropertyValueList() : Arrays.asList(pvs.getPropertyValues()));
for (PropertyValue pv : propertyValues) {
    try {
        setPropertyValue(pv);
    } catch (NotWritablePropertyException ex) {
        if (!ignoreUnknown) {
            throw ex;
        }
        // otherwise ignore and continue
    }
}
Conclusion
Performance optimization is an ongoing effort; postponing technical debt leads to painful fixes. Regularly reviewing code practices, being aware of hidden costs of third‑party tools, and conducting periodic performance tests help keep services stable and efficient.
Top Architect
Top Architect focuses on sharing practical architecture knowledge, covering enterprise, system, website, large‑scale distributed, and high‑availability architectures, plus architecture adjustments using internet technologies. We welcome idea‑driven, sharing‑oriented architects to exchange and learn together.
