
Performance Troubleshooting and Optimization of a ToB System: From Low Throughput to Improved CPU Utilization

This article documents a step‑by‑step investigation of a Java Spring backend that initially achieved only 50 requests per second under load, detailing how slow SQL, excessive logging, thread‑pool misconfiguration, bean‑creation overhead and CPU‑bound operations were identified and mitigated to roughly double the throughput while reducing response latency.


Background – A low‑traffic B2B system suddenly needed to handle at least 500 req/s on a single node. Back‑of‑the‑envelope math said Tomcat's 100 worker threads, each completing a request in 200 ms, should sustain 100 × (1000 / 200) = 500 req/s, but the first load test delivered only 50 req/s with CPU usage near 80 %.

Initial Findings – The response‑time distribution revealed a minimum < 100 ms, a maximum of 5 s, and most requests around 4 s, indicating severe latency spikes caused by blocking operations.

Locating the slow spots – The team first added timing probes and alert logs for API latency, remote calls, Redis access (flagging calls over 10 ms) and SQL execution (over 100 ms). The logs soon exposed a slow SQL statement:

update table set field = field - 1 where type = 1 and field > 1;

Because concurrent requests all update the same rows, row‑lock contention serialized the updates and accounted for more than 80 % of the request time.
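The timing probes described above can be sketched as a small wrapper that logs any call exceeding a per‑category threshold. The class name, the `Supplier`‑based API and the threshold constants are illustrative, not taken from the original code; only the 10 ms (Redis) and 100 ms (SQL) thresholds come from the article.

```java
import java.util.function.Supplier;

// Wrap an operation and emit an alert line when it exceeds its threshold.
// Class and method names are illustrative, not from the original project.
public class SlowCallProbe {
    public static final long SQL_THRESHOLD_MS = 100;   // SQL slower than 100 ms
    public static final long REDIS_THRESHOLD_MS = 10;  // Redis slower than 10 ms

    public static <T> T timed(String label, long thresholdMs, Supplier<T> op) {
        long start = System.currentTimeMillis();
        try {
            return op.get();
        } finally {
            long elapsed = System.currentTimeMillis() - start;
            if (elapsed > thresholdMs) {
                // In the real system this would go through the logging framework.
                System.err.printf("SLOW %s took %d ms (threshold %d ms)%n",
                        label, elapsed, thresholdMs);
            }
        }
    }
}
```

A call site would look like `SlowCallProbe.timed("counter-update", SlowCallProbe.SQL_THRESHOLD_MS, () -> jdbcTemplate.update(sql))`, making slow statements visible in the logs without changing business logic.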

First optimization – The contended SQL update was moved off the request path and executed asynchronously, which cut the maximum latency from 5 s to 2 s and the 95th‑percentile from 4 s to 1 s, roughly doubling throughput.
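One way to implement this first optimization is to hand the contended UPDATE to a single‑threaded executor: request threads return immediately, and the decrements are serialized in the background so they no longer pile up on the row lock. This is a sketch under that assumption; the article does not show the actual async mechanism, and the class name is hypothetical.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Move the contended UPDATE off the request thread. A single-threaded
// executor serializes the decrements; the Runnable stands in for the
// real DAO/SQL call.
public class AsyncCounterUpdater {
    private final ExecutorService sqlExecutor = Executors.newSingleThreadExecutor();

    public void decrementAsync(Runnable sqlUpdate) {
        sqlExecutor.submit(sqlUpdate);  // returns immediately; request thread is free
    }

    public void shutdown() {
        sqlExecutor.shutdown();
        try {
            sqlExecutor.awaitTermination(5, TimeUnit.SECONDS);  // drain pending updates
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

The trade‑off is weaker consistency: the counter update becomes eventually consistent, which the team evidently judged acceptable for this field.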

Further investigation – Log timestamps showed occasional 100‑ms gaps without any obvious work, suggesting thread switches, excessive logging (500 MB of logs in 5 min) and possible stop‑the‑world pauses. Actions taken:

Raised the log level from DEBUG to INFO to cut logging volume (small impact).

Reduced @Async thread‑pool core size from 100 to ≤50 and limited queue size.

Increased JVM heap from 512 MB to 4 GB, observed Young GC frequency drop from 4/s to 2/s.
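The thread‑pool adjustment above can be sketched with plain `java.util.concurrent` types (in the real project this would be a Spring `ThreadPoolTaskExecutor` behind `@Async`; the queue capacity of 200 and the rejection policy are assumptions, only the core‑size cap of 50 comes from the article):

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Tuned async pool: core size capped at 50 (was 100), bounded queue so
// backlog is visible, and CallerRunsPolicy to apply backpressure instead
// of dropping tasks when the queue fills.
public class AsyncPoolConfig {
    public static ThreadPoolExecutor asyncExecutor() {
        return new ThreadPoolExecutor(
                50,                              // core threads (reduced from 100)
                50,                              // max threads
                60, TimeUnit.SECONDS,            // idle keep-alive
                new ArrayBlockingQueue<>(200),   // bounded queue (capacity assumed)
                new ThreadPoolExecutor.CallerRunsPolicy());
    }
}
```

Fewer threads means fewer context switches; the bounded queue prevents the unbounded memory growth and hidden latency that a default `LinkedBlockingQueue` can cause.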

CPU usage remained high despite fewer threads, prompting a deeper look at bean creation.

Bean creation overhead – Stack traces revealed frequent calls to BeanUtils.getBean(RedisMaster.class), which triggers AbstractAutowireCapableBeanFactory#doCreateBean for a prototype‑scoped Redis client. At roughly 10 such calls per request, the repeated bean creation added significant latency.

RedisTool redisTool = BeanUtils.getBean(RedisMaster.class);

The solution was to replace prototype beans with direct new Redis() instances, eliminating the costly createBean path.
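The essence of the fix is to resolve the client once and reuse it, rather than re‑running the bean factory's creation path on every call. A minimal sketch, with `RedisClient` as a stand‑in for the project's `RedisMaster` type (construction assumed cheap and the client assumed thread‑safe):

```java
// Resolve the Redis client once and reuse it, instead of calling
// BeanUtils.getBean(...) ~10 times per request, which re-runs the
// prototype bean-creation path each time.
public class RedisHolder {
    // Stand-in for the real client type (RedisMaster in the article).
    public static class RedisClient { }

    private static final RedisClient INSTANCE = new RedisClient(); // created once

    public static RedisClient get() {
        return INSTANCE;  // same instance every call; no bean factory involved
    }
}
```

An equivalent Spring‑native fix would be to make the bean singleton‑scoped (or inject it once via constructor injection) instead of looking it up per call; prototype scope plus a service‑locator call in a hot path combines the worst of both.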

Timing utilities impact – The code also used System.currentTimeMillis() and Hutool's StopWatch for every operation. Under high concurrency these calls add measurable overhead, especially when combined with custom clocks.
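One common mitigation for this (an assumption here, not the article's actual code) is a cached clock: a daemon thread refreshes a `volatile` timestamp periodically, so hot‑path code pays only a volatile read per call instead of a fresh clock read. Precision drops to the refresh interval, which is acceptable for coarse latency logging.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Cached millisecond clock: one background thread updates the timestamp;
// readers see a slightly stale but cheap value.
public class CachedClock {
    private static volatile long now = System.currentTimeMillis();

    static {
        ScheduledExecutorService ticker =
                Executors.newSingleThreadScheduledExecutor(r -> {
                    Thread t = new Thread(r, "cached-clock");
                    t.setDaemon(true);  // don't keep the JVM alive
                    return t;
                });
        ticker.scheduleAtFixedRate(
                () -> now = System.currentTimeMillis(), 1, 1, TimeUnit.MILLISECONDS);
    }

    public static long millis() { return now; }
}
```

The other lever is simply to stop timing every operation: keep the probes on the few calls with alert thresholds and remove per‑operation `StopWatch` instances from the hot path.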

Final results – After all adjustments, throughput rose from 50 req/s to nearly 200 req/s, maximum latency dropped to 2 s, and 95th‑percentile to 1 s. The investigation highlighted the importance of:

Identifying lock contention in SQL.

Controlling thread‑pool size and logging volume.

Avoiding expensive prototype bean creation in hot paths.

Monitoring JVM memory and GC behavior.

Conclusion – Performance tuning is an iterative process; even seemingly minor choices like bean scope or timing APIs can dramatically affect throughput and CPU utilization. Continuous profiling and a clear troubleshooting methodology are essential for reliable backend services.

Backend · Java · Optimization · Spring · throughput · profiling
Written by Architecture Digest

Focusing on Java backend development, covering application architecture from top-tier internet companies (high availability, high performance, high stability), big data, machine learning, Java architecture, and other popular fields.
