How Changing Five Lines of Code Boosted API Throughput Over 10×
A low‑traffic B2B service needed to sustain 500 req/s but managed only 50 req/s at roughly 80% CPU. Through systematic profiling, lock analysis, an async refactor, thread‑pool tuning, and the elimination of repeated Spring bean creation, the team sharply improved response times and throughput, while uncovering a deeper CPU‑usage mystery.
Background
The service was a low‑traffic B2B system that had never been load‑tested. A new "big client" required a single‑node throughput of at least 500 requests per second for several critical APIs.
On paper, 100 Tomcat worker threads at 200 ms per request should deliver 500 req/s, but a load test with 100 concurrent users showed only 50 req/s with CPU usage near 80%.
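That expectation follows from a simple capacity model: maximum throughput is roughly thread count divided by per‑request latency. A minimal sketch with the article's numbers (the model ignores queueing and lock contention):

```java
public class CapacityModel {
    // Theoretical upper bound: each thread completes 1/latency requests
    // per second, so throughput <= threads / latency.
    static double maxThroughput(int threads, double avgLatencySeconds) {
        return threads / avgLatencySeconds;
    }

    public static void main(String[] args) {
        // 100 Tomcat worker threads at 200 ms per request
        System.out.println(maxThroughput(100, 0.2) + " req/s ceiling");
        // Observing only 50 req/s implies ~2 s of effective per-request
        // time, i.e. threads spent most of their time blocked or contended.
        System.out.println(100 / 50.0 + " s effective latency");
    }
}
```

Working the model backwards is what points at blocking: 50 req/s across 100 threads means each request effectively held a thread for about 2 seconds, ten times the nominal 200 ms.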
Analysis Process
Identifying the "slow" cause
First, the team ignored CPU usage and focused on response‑time bottlenecks, checking for locks and time‑consuming operations. They added timing alerts:
Log a warning if an API call exceeds 500 ms.
Log a warning if an internal remote call exceeds 200 ms.
Log a warning if a Redis access exceeds 10 ms.
Log a warning if a SQL execution exceeds 100 ms.
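One lightweight way to implement such alerts is a timing wrapper around each call site. The sketch below is illustrative only; the label names and logging style are assumptions, not the team's actual code:

```java
import java.util.function.Supplier;

public class SlowCallAlert {
    // Runs an operation and logs a warning when it exceeds its threshold,
    // e.g. 100 ms for SQL or 10 ms for Redis, per the thresholds above.
    static <T> T timed(String label, long thresholdMs, Supplier<T> op) {
        long start = System.currentTimeMillis();
        try {
            return op.get();
        } finally {
            long elapsed = System.currentTimeMillis() - start;
            if (elapsed > thresholdMs) {
                System.err.printf("[SLOW] %s took %d ms (threshold %d ms)%n",
                        label, elapsed, thresholdMs);
            }
        }
    }

    public static void main(String[] args) {
        // e.g. wrap a SQL call with the 100 ms threshold
        String row = timed("sql:selectOrder", 100, () -> "row-data");
        System.out.println(row);
    }
}
```

In a Spring codebase the same effect is usually achieved with an AOP interceptor so call sites stay untouched; the wrapper form just makes the mechanics explicit.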
Log analysis revealed a slow SQL statement that locked a single row during high concurrency, accounting for over 80% of the request latency.
They quickly moved the SQL execution off the request thread with Spring's @Async and observed the effect.
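@Async works by handing the method body to a separate thread pool so the Tomcat worker can return immediately. The plain-executor sketch below shows the equivalent mechanics (the method and class names are hypothetical; the real change only adds @Async to the slow method plus @EnableAsync on a configuration class):

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class AsyncOffload {
    // Stand-in for the thread pool Spring's @Async would supply.
    static final ExecutorService POOL = Executors.newFixedThreadPool(50);

    // Before: the row-locking UPDATE ran on the Tomcat worker thread,
    // so concurrent requests queued up on the same row lock.
    // After: the update is submitted to the pool and the HTTP response
    // no longer waits for the lock.
    static Future<?> recordHitAsync(long orderId) {
        return POOL.submit(() -> {
            // the slow, row-locking SQL UPDATE would execute here
        });
    }

    public static void main(String[] args) throws Exception {
        Future<?> f = recordHitAsync(42L);
        f.get(); // only for the demo; request threads never wait on this
        System.out.println("offloaded: " + f.isDone());
        POOL.shutdown();
    }
}
```

Note the trade-off the article implies: the lock contention is deferred, not removed, which is why throughput improved but the target was still missed.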
Continuing to locate "slow" factors
After the async change, throughput improved but still fell short. Further log inspection showed intermittent 100‑ms gaps without obvious work, suggesting thread switches, excessive logging, or stop‑the‑world pauses.
Raised log level to DEBUG (minor impact).
Adjusted @Async thread pools, reducing core threads from 100 to 50, which raised throughput by ~50%.
Increased JVM heap from 512 MB to 4 GB; GC frequency dropped but throughput did not change significantly.
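The pool adjustment can be sketched with a plain ThreadPoolExecutor (the article tuned Spring's @Async pool; the core size of 50 comes from the text, while the max size, queue depth, and rejection policy are assumptions):

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class TunedAsyncPool {
    // Cutting core threads from 100 to 50 leaves fewer runnable threads
    // competing for CPU, so less time is lost to context switching --
    // the change that raised throughput by ~50% in the test.
    static ThreadPoolExecutor build() {
        return new ThreadPoolExecutor(
                50,                              // core threads (was 100)
                50,                              // max threads (assumed fixed-size)
                60, TimeUnit.SECONDS,
                new LinkedBlockingQueue<>(1000), // bounded backlog (assumed)
                new ThreadPoolExecutor.CallerRunsPolicy()); // backpressure (assumed)
    }

    public static void main(String[] args) {
        ThreadPoolExecutor pool = build();
        System.out.println("core=" + pool.getCorePoolSize()); // core=50
        pool.shutdown();
    }
}
```

A bounded queue with CallerRunsPolicy is one common choice here: when the pool falls behind, submitters slow down instead of the backlog growing without limit.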
Pinpointing high CPU usage
Even after reducing thread count, CPU usage remained high. The team examined active thread counts and found many threads each using ~10% CPU, indicating no single hot thread.
Stack traces showed frequent calls to BeanUtils.getBean(...) which internally invoked createBean. Each call triggered full bean initialization, proxy creation, and dependency injection—expensive operations when performed repeatedly.
Because RedisMaster was declared with @Scope("prototype"), each request created a new bean, leading to ~200 createBean invocations per API call.
To eliminate this overhead, the code was changed to instantiate the Redis client directly with new RedisMaster() instead of fetching it from the Spring container.
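Stripped of the Spring plumbing, the fix amounts to "construct once, reuse". The sketch below mimics the before and after; RedisMaster is a stand-in class here, and caching the instance in a static field is an assumption about how the direct instantiation was wired in:

```java
public class RedisMasterAccess {
    // Stand-in for the article's prototype-scoped Redis wrapper.
    static class RedisMaster {
        RedisMaster() {
            // With @Scope("prototype"), Spring ran the full createBean path
            // on every BeanUtils.getBean(...) call: instantiation,
            // dependency injection, and proxy creation.
        }
    }

    // After the fix: one instance, created once, reused by every request.
    private static final RedisMaster CLIENT = new RedisMaster();

    static RedisMaster client() {
        return CLIENT;
    }

    public static void main(String[] args) {
        // Repeated lookups now return the same object; the ~200 createBean
        // invocations per API call disappear.
        System.out.println(client() == client()); // true
    }
}
```

The same effect could also be had by switching the bean back to the default singleton scope, provided the client is thread-safe; direct `new` was simply the most surgical change.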
Additional timing measurements
The team also noted that the measurement itself is not free: timing utilities such as Hutool's StopWatch internally call System.nanoTime(), and even simple System.currentTimeMillis() wrappers add per-call overhead that becomes measurable under high concurrency.
Final results
After the bean‑creation optimization and previous async changes, the maximum response time dropped from 5 s to 2 s, and the 95th‑percentile fell from 4 s to 1 s, roughly doubling throughput. However, the target of 500 req/s was still far away, prompting further investigation into CPU usage and other bottlenecks.
Key actions taken:
Identified and async‑executed a slow SQL statement.
Adjusted thread‑pool sizes and logging levels.
Increased JVM heap and observed GC behavior.
Replaced prototype‑scoped Spring bean retrieval with direct new instantiation to avoid repeated createBean overhead.
Remaining questions include why createBean incurs such high cost and whether Spring’s prototype scope is appropriate for high‑throughput scenarios.
Key Takeaways
Throughput is bounded by thread count divided by per‑request response time, so cutting response time matters as much as adding threads.
Lock contention from frequent database updates can dominate latency.
Spring prototype beans can introduce significant overhead under load; consider singleton scope or manual instantiation for high‑frequency components.
Profiling tools (jstack, top, jstat) and custom timing alerts are essential for systematic performance debugging.
Java Architect Handbook
Focused on Java interview questions and practical articles, covering algorithms, databases, Spring Boot, microservices, high concurrency, the JVM, Docker containers, and the ELK stack. Looking forward to progressing together with you.