Performance Tuning of a Spring Boot Backend: Identifying and Resolving Throughput Bottlenecks
The article details a step‑by‑step investigation of a Spring Boot service that failed to meet a 500 requests/second target, analyzes slow SQL, thread‑pool misconfiguration, excessive logging, and high CPU usage, and presents concrete optimizations that roughly doubled throughput.
The author, a senior architect, describes a ToB system that originally had no load testing but suddenly needed to handle at least 500 requests/s for a key interface. Initial calculations suggested 100 threads would be enough, yet a 100‑concurrency test only achieved 50 req/s with CPU usage near 80%.
Background
The service exhibited a minimum response time under 100 ms in single‑thread scenarios, but under load the maximum latency reached 5 seconds and most requests clustered around 4 seconds, far from the desired performance.
Analysis Process – Locating the "slow" causes
Key investigation points included locks (synchronization, distributed, DB), and time‑consuming operations (network calls, SQL). The team added instrumentation to log warnings when response times exceeded thresholds:
Interface response > 500 ms
Remote call > 200 ms
Redis access > 10 ms
SQL execution > 100 ms
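The threshold logging above can be sketched as a small wrapper around each operation; the helper name, the single hard-coded threshold, and the use of `System.out` for the warning are illustrative assumptions, not the article's actual code:

```java
import java.util.function.Supplier;

public class SlowCallLogger {
    // One of the thresholds listed above (SQL execution > 100 ms).
    static final long SQL_THRESHOLD_MS = 100;

    // Runs the operation and logs a warning when it exceeds the given threshold.
    static <T> T timed(String label, long thresholdMs, Supplier<T> op) {
        long start = System.currentTimeMillis();
        try {
            return op.get();
        } finally {
            long elapsed = System.currentTimeMillis() - start;
            if (elapsed > thresholdMs) {
                System.out.printf("WARN: %s took %d ms (threshold %d ms)%n",
                        label, elapsed, thresholdMs);
            }
        }
    }

    public static void main(String[] args) {
        // Wrap a call site; slow executions produce a WARN line in the log.
        String result = timed("sql", SQL_THRESHOLD_MS, () -> "ok");
        System.out.println(result);
    }
}
```

In the real service this kind of wrapper would typically sit in an AOP interceptor or filter so call sites stay untouched.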
Log analysis revealed a slow SQL statement that updated a single‑row inventory table, causing lock contention and accounting for over 80% of the request latency.
Example slow SQL:
update table set field = field - 1 where type = 1 and field > 1;
After converting the operation to asynchronous execution, latency improved but throughput still fell short of the goal.
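The article moved this contended update off the request path (via Spring's @Async). A minimal plain-Java analogue using an executor, where the in-memory counter is a hypothetical stand-in for the single-row inventory update:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

public class AsyncInventoryUpdate {
    // Hypothetical stand-in for the contended single-row inventory counter.
    static final AtomicInteger stock = new AtomicInteger(100);

    // Daemon threads so this sketch exits without an explicit shutdown.
    static final ExecutorService pool = Executors.newFixedThreadPool(4, r -> {
        Thread t = new Thread(r);
        t.setDaemon(true);
        return t;
    });

    // The request thread submits the decrement and returns immediately;
    // the contended write happens off the request path.
    static Future<Integer> decrementAsync() {
        return pool.submit(stock::decrementAndGet);
    }

    // Blocking helper, used here only to observe the result.
    static int awaitDecrement() {
        try {
            return decrementAsync().get();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("remaining: " + awaitDecrement());
    }
}
```

Note that going asynchronous hides the latency from the caller but does not remove the row-lock contention itself; the updates still serialize on the hot row.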
Further Investigation – Thread switching, logging overhead, and STW
Additional logs showed intermittent 100 ms gaps with no obvious work in between, suggesting thread context switches, excessive logging, or stop-the-world pauses. The team adjusted the log level to cut logging output (a minor gain) and re-configured the @Async thread pools, capping core threads at 50, which raised throughput to around 200 req/s.
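A sketch of the bounded executor configuration this tuning implies. Only the core-thread count of 50 comes from the article; the max size, queue capacity, keep-alive, and rejection policy below are illustrative assumptions:

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class AsyncPoolConfig {
    static ThreadPoolExecutor buildPool() {
        return new ThreadPoolExecutor(
                50,                                  // core threads, per the tuning above
                50,                                  // cap max at core to bound context switching
                60L, TimeUnit.SECONDS,               // idle keep-alive (assumed)
                new LinkedBlockingQueue<>(1000),     // bounded queue (assumed)
                new ThreadPoolExecutor.CallerRunsPolicy()); // backpressure when saturated (assumed)
    }
}
```

In a Spring Boot service this pool would normally be exposed as the executor bean backing @Async rather than constructed directly.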
JVM GC analysis showed frequent Young GC (≈4 times/s) with a 512 MB heap; increasing heap to 4 GB reduced GC frequency but did not further improve throughput.
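The heap change can be expressed as launch flags. Only the 4 GB size comes from the article; setting -Xms equal to -Xmx and the GC logging flag (JDK 9+ unified logging syntax) are illustrative additions:

```shell
# Heap raised from 512 MB to 4 GB, as described above; GC logging enabled for observation.
java -Xms4g -Xmx4g -Xlog:gc -jar app.jar
```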
CPU Utilization Diagnosis
Despite reducing thread count, CPU usage remained high. Investigation focused on two possibilities: hidden extra threads and CPU‑intensive code. Monitoring revealed many threads each using ~10% CPU, but no single hotspot.
Stack traces exposed frequent calls to BeanUtils.getBean(...), which internally invoked createBean, a heavyweight operation involving bean initialization, dependency injection, and proxy creation. Approximately 200 such calls occurred during request processing because the Redis utility beans were defined with prototype scope, so every call created a new bean (and thus a new Redis client).
RedisTool redisTool = BeanUtils.getBean(RedisMaster.class);
Switching to direct new instantiation eliminated the prototype overhead.
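A minimal sketch of the fix: hold one long-lived instance instead of resolving a prototype-scoped bean on every call. RedisTool here is a hypothetical stand-in for the article's Redis utility class:

```java
public class RedisToolHolder {
    // Hypothetical stand-in for the article's prototype-scoped Redis utility.
    static class RedisTool {
        String get(String key) { return "value:" + key; }
    }

    // Created once and reused, avoiding createBean() work on every request.
    private static final RedisTool REDIS_TOOL = new RedisTool();

    public static RedisTool instance() { return REDIS_TOOL; }

    public static void main(String[] args) {
        System.out.println(instance().get("stock"));
    }
}
```

The idiomatic Spring alternative is simply to make the utility a singleton-scoped bean (the default scope) and inject it once, rather than bypassing the container.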
Timing Measurement Overhead
The code also used manual System.currentTimeMillis() and Hutool StopWatch for timing, which added measurable overhead under high concurrency.
long start = System.currentTimeMillis();
// ...
long end = System.currentTimeMillis();
long runTime = end - start;
Final Results
After addressing slow SQL, reducing prototype bean usage, tuning thread pools, and increasing JVM memory, the maximum latency dropped from 5 s to 2 s, the 95th percentile from 4 s to 1 s, and overall throughput roughly doubled, though still short of the original 500 req/s target.
Summary of Optimizations
MySQL: tuned buffer pool, change buffer, redo log.
Code: async execution, thread‑pool size control, Tomcat thread configuration, Druid pool tuning.
JVM: increased heap size, adjusted GC settings.
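The Tomcat and Druid tuning mentioned above would typically live in application.properties; the property names assume Spring Boot 2.3+ with the Druid starter, and the values are illustrative assumptions since the article does not give the final numbers:

```properties
# Illustrative values; the article does not specify the final settings.
# Tomcat worker threads and accept backlog:
server.tomcat.threads.max=200
server.tomcat.accept-count=100
# Druid connection pool sizing:
spring.datasource.druid.initial-size=10
spring.datasource.druid.max-active=50
```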
The author emphasizes the need for systematic performance knowledge and a clear troubleshooting methodology rather than ad‑hoc “try everything” approaches.
Top Architect
Top Architect focuses on sharing practical architecture knowledge, covering enterprise, system, website, large-scale distributed, and high-availability architectures, as well as evolving existing architectures with internet-scale technologies. Idea-driven, sharing-oriented architects are welcome to exchange and learn together.