Why 500 req/s Became 50 req/s: A Deep Dive into Spring Bean Creation Bottlenecks
A ToB system that seemed able to handle 500 requests per second stalled at 50 req/s due to hidden lock contention in prototype‑scoped Spring beans, slow SQL updates, excessive logging, and thread‑pool misconfiguration, prompting a step‑by‑step performance investigation and multiple optimizations.
1. Background
The project is a ToB system with little prior load testing. A new "big client" demanded a minimum of 500 requests/s per node for core interfaces. Initially this seemed trivial: with Tomcat configured for 100 threads, each thread has a budget of about 200 ms per request (100 threads / 500 req/s), twice the typical 100 ms response time.
When the load test started with 100 concurrent users, the observed throughput was only 50 req/s and CPU usage hovered around 80 %.
The performance chart showed a minimum latency under 100 ms, but a maximum of 5 seconds and a 95th‑percentile around 4 seconds, indicating severe outliers.
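The figures above can be cross-checked with simple arithmetic. The sketch below is illustrative only: the class and method names are hypothetical, and the second calculation applies Little's law under the assumption that the load test is a closed loop in which all 100 users always have a request in flight (no think time).

```java
/** Back-of-the-envelope capacity arithmetic for the figures quoted above. */
final class CapacityMath {

    /** Per-request time budget (ms) if `threads` workers must sustain `targetRps`. */
    static double budgetMillis(int threads, double targetRps) {
        return threads * 1000.0 / targetRps;
    }

    /** Little's law: mean latency (ms) implied by `concurrentUsers` in flight at `observedRps`. */
    static double impliedLatencyMillis(int concurrentUsers, double observedRps) {
        return concurrentUsers * 1000.0 / observedRps;
    }

    public static void main(String[] args) {
        System.out.println(budgetMillis(100, 500));        // budget per request at the target rate
        System.out.println(impliedLatencyMillis(100, 50)); // mean latency implied by the observed rate
    }
}
```

Under those assumptions, 100 users at 50 req/s imply a mean latency of about 2 s, which fits a 95th percentile near 4 s and points at waiting rather than at raw processing time.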
2. Analysis Process
2.1 Identify the "slow" cause
CPU saturation was set aside for the moment. The team suspected blocking and focused on two areas: locks (synchronized, distributed, DB) and time-consuming operations (network latency, SQL).
Added latency alarms: response >500 ms, remote call >200 ms, Redis access >10 ms, SQL execution >100 ms.
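One way such alarms could be wired in is a small timing wrapper. This is a minimal sketch, not the team's actual code: `LatencyAlarm`, `Kind`, and `timed` are hypothetical names, only the four thresholds come from the text, and a real system would write to its logger rather than to standard error.

```java
import java.util.function.Supplier;

final class LatencyAlarm {
    /** Operation kinds with the thresholds (ms) listed above. */
    enum Kind {
        RESPONSE(500), REMOTE_CALL(200), REDIS(10), SQL(100);

        final long thresholdMillis;

        Kind(long thresholdMillis) { this.thresholdMillis = thresholdMillis; }
    }

    /** True when the elapsed time exceeds the threshold for this kind of operation. */
    static boolean exceeds(Kind kind, long elapsedMillis) {
        return elapsedMillis > kind.thresholdMillis;
    }

    /** Runs the operation and prints a warning if it was slower than its threshold. */
    static <T> T timed(Kind kind, String label, Supplier<T> op) {
        long start = System.nanoTime();
        try {
            return op.get();
        } finally {
            long elapsedMillis = (System.nanoTime() - start) / 1_000_000;
            if (exceeds(kind, elapsedMillis)) {
                System.err.printf("SLOW %s %s took %d ms (limit %d ms)%n",
                        kind, label, elapsedMillis, kind.thresholdMillis);
            }
        }
    }
}
```

A call site would look like `LatencyAlarm.timed(LatencyAlarm.Kind.SQL, "decrementStock", () -> dao.decrement(1))`, where `dao` is a placeholder for whatever data-access object the application uses.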
Log analysis revealed a slow SQL statement:
<!-- Inventory decrement, only a few types, each type has a single row -->
<!-- In the test, type = 1 is hard‑coded -->
update table set field = field - 1 where type = 1 and field > 1;
This statement caused lock contention; the waiting time accounted for more than 80 % of the total request latency.
Making the SQL execution asynchronous immediately doubled throughput (≈ 100 req/s) and reduced the maximum latency from 5 s to 2 s, with the 95th‑percentile dropping from 4 s to 1 s.
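The article does not say how the update was made asynchronous. One possible shape, sketched here with JDK classes only, hands the decrement to a single writer thread so that request threads no longer wait on the row lock. `AsyncInventory` is a hypothetical name, and an in-memory counter stands in for the database row.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

final class AsyncInventory {
    // Stand-in for the single hot row; a real system would issue the UPDATE here.
    private final AtomicInteger stock;
    // One writer thread: updates queue up here instead of blocking request threads.
    private final ExecutorService writer = Executors.newSingleThreadExecutor();

    AsyncInventory(int initialStock) {
        this.stock = new AtomicInteger(initialStock);
    }

    /** Called on the request thread; returns immediately. */
    void decrementAsync() {
        // Mirrors the "field > 1" guard of the SQL statement above.
        writer.execute(() -> stock.updateAndGet(v -> v > 1 ? v - 1 : v));
    }

    /** Drains the queue, stops the writer, and returns the remaining stock. */
    int shutdownAndGet() {
        writer.shutdown();
        try {
            writer.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return stock.get();
    }
}
```

The trade-off is that the request returns before it knows whether the decrement succeeded, so this only fits flows that can tolerate a late failure or reconcile stock afterwards.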
2.2 Continue locating the "slow" cause
Further logs showed gaps of several hundred milliseconds between INFO lines, suggesting thread switches, excessive logging, or stop‑the‑world pauses.
Thread switching due to too many threads.
Heavy logging (5 min of load test generated ~500 MB of logs).
Possible STW pauses (unlikely because logs continued).
Actions taken:
Raised log level to DEBUG – only a modest ~10 % improvement.
Adjusted @Async thread pools: previously three pools with a total core size of 100; reduced the overall core size to ≤ 50. Throughput rose to around 200 req/s.
Examined JVM GC: young GC (YGC) frequency was 4 /s with a 512 MB heap. Increasing the heap to 4 GB lowered YGC to 2 /s, but throughput did not change significantly.
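GC frequency of the kind quoted above can also be sampled from inside the JVM. The sketch below uses the standard `java.lang.management` API; `GcRate` is a hypothetical name, and the count covers every collector the JVM reports (young and old), so it is only an approximation of a YGC rate.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

final class GcRate {
    /** Sum of collection counts across all collectors reported by the JVM. */
    static long totalCollections() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long count = gc.getCollectionCount(); // -1 if the collector does not report a count
            if (count > 0) {
                total += count;
            }
        }
        return total;
    }

    /** Collections per second between two samples taken `elapsedMillis` apart. */
    static double perSecond(long before, long after, long elapsedMillis) {
        return (after - before) * 1000.0 / elapsedMillis;
    }
}
```

Sampling `totalCollections()` before and after a fixed interval of the load test and passing both values to `perSecond` gives a rate that can be compared across heap sizes.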
These steps yielded a noticeable performance gain, but the CPU usage remained high.
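The thread-pool reduction among the actions above can be sketched with plain JDK executors. The article's pools are Spring @Async executors whose configuration is not shown, so everything here is an assumption: the `BoundedPools` name, the queue capacity, the rejection policy, and any particular split of the 50 threads across the three pools.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

final class BoundedPools {
    /** Builds a pool with a fixed number of threads and a bounded queue. */
    static ThreadPoolExecutor newPool(int threads, int queueCapacity) {
        return new ThreadPoolExecutor(
                threads, threads,                            // core == max: thread count never exceeds the bound
                60, TimeUnit.SECONDS,
                new ArrayBlockingQueue<>(queueCapacity),
                new ThreadPoolExecutor.CallerRunsPolicy());  // when full, the submitting thread runs the task
    }

    /** Total core threads across a set of pools. */
    static int totalCoreSize(ThreadPoolExecutor... pools) {
        int total = 0;
        for (ThreadPoolExecutor pool : pools) {
            total += pool.getCorePoolSize();
        }
        return total;
    }
}
```

For example, three pools built with `newPool(20, 100)`, `newPool(20, 100)`, and `newPool(10, 100)` keep the combined core size at 50, the ceiling mentioned above.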
2.3 Locate high CPU usage
After cutting thread count, CPU stayed near 80 %. Thread‑level inspection showed no single thread exceeding 10 % CPU, but many threads were active.
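The article does not name the tool used for the thread-level inspection. As an in-process alternative, the standard `ThreadMXBean` can report CPU time per thread; the sketch below is a hypothetical helper (`ThreadCpu`) that returns one line per live thread, which is enough to see whether load is concentrated in one thread or spread across many.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.List;

final class ThreadCpu {
    /** One line per live thread: name, state, and CPU time consumed so far. */
    static List<String> snapshot() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        List<String> lines = new ArrayList<>();
        if (!mx.isThreadCpuTimeSupported()) {
            return lines; // this JVM cannot measure per-thread CPU time
        }
        for (long id : mx.getAllThreadIds()) {
            ThreadInfo info = mx.getThreadInfo(id);
            long cpuNanos = mx.getThreadCpuTime(id); // -1 if the thread died or timing is disabled
            if (info == null || cpuNanos < 0) {
                continue;
            }
            lines.add(String.format("%-40s %-13s %d ms",
                    info.getThreadName(), info.getThreadState(), cpuNanos / 1_000_000));
        }
        return lines;
    }
}
```

Taking two snapshots a few seconds apart and comparing the CPU times shows which threads were actually busy during that window.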
Stack traces revealed repeated calls to BeanUtils.getBean(RedisMaster.class). RedisMaster is declared as a prototype‑scoped Spring component:
@Component
@Scope("prototype")
public class RedisMaster implements IRedisTool { /* ... */ }
Each call triggers Spring's createBean logic, which involves synchronization, dependency injection, and proxy creation. In a high‑concurrency scenario this caused lock contention; the logs showed nearly 200 occurrences of this pattern.
Replacing the BeanUtils call with a direct new RedisMaster() eliminated the prototype bean creation overhead.
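The difference between the two creation paths can be illustrated with a deliberately simplified model. This is not Spring's implementation: `CreationPathModel`, `Client`, and the single shared lock are assumptions chosen only to show why a lookup that passes through shared synchronized state serializes callers, while plain construction does not.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Supplier;

final class CreationPathModel {
    /** Stand-in for RedisMaster. */
    static final class Client { }

    /** Number of creations that had to pass through the shared lock. */
    static final AtomicLong lockedCreations = new AtomicLong();
    private static final Object LOCK = new Object();

    /** Models a container lookup: every caller passes through one shared lock. */
    static Client viaContainer() {
        synchronized (LOCK) {
            lockedCreations.incrementAndGet();
            return new Client();
        }
    }

    /** Models the fix described above: plain construction, no shared lock. */
    static Client direct() {
        return new Client();
    }

    /** Calls `factory` from `threads` threads, `callsPerThread` times each, and waits for all. */
    static void hammer(int threads, int callsPerThread, Supplier<Client> factory) {
        CountDownLatch done = new CountDownLatch(threads);
        for (int t = 0; t < threads; t++) {
            new Thread(() -> {
                for (int i = 0; i < callsPerThread; i++) {
                    factory.get();
                }
                done.countDown();
            }).start();
        }
        try {
            done.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

Note that constructing the object with new bypasses the container, so anything Spring would otherwise inject or proxy has to be supplied by hand.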
3. Summary
Performance improvements achieved:
Slow SQL fixed (asynchronous execution) – throughput ↑ ≈ 2×, max latency ↓ 5 s→2 s, 95th‑percentile ↓ 4 s→1 s.
Thread‑pool size reduced – throughput ↑ to ≈ 200 req/s.
Heap increased – YGC frequency ↓ 4 /s→2 /s, but little impact on throughput.
Prototype bean creation removed – lock contention eliminated.
Overall throughput rose from 50 req/s to about 200 req/s, and response times improved dramatically, yet CPU usage remains puzzling, indicating further investigation is needed.
Key tuning areas mentioned: MySQL buffer pool, redo log, Druid connection pool, Tomcat configuration, JVM memory and GC settings.
4. Open Questions / TODO
Why does createBean cause such a large performance penalty under concurrency?
At what concurrency level does System.currentTimeMillis become a noticeable bottleneck?
Further study of systematic performance‑optimization techniques.
