Why Does My Java Service Handle Only 50 RPS Instead of 500? A Deep Dive into Bottlenecks

A Java ToB system struggled to meet a 500 requests‑per‑second target, revealing hidden bottlenecks such as slow SQL, excessive logging, thread‑pool misconfiguration, high CPU usage from Spring bean creation, and improper Redis usage, which were systematically identified and mitigated.

Java High-Performance Architecture

Background

The company’s ToB system had no concurrency requirements and had never been load‑tested. A major client demanded a minimum throughput of 500 requests/s for core interfaces.

Initially, 500 RPS seemed easy: with Tomcat configured for 100 threads, each request only needed to complete in about 200 ms (100 threads / 0.2 s = 500 requests/s). However, a load test with 100 concurrent users yielded only 50 RPS, and CPU usage spiked to ~80%.
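The arithmetic behind that expectation, and what the measured result implies, can be sketched as follows. This is a back-of-the-envelope estimate, not a benchmark: it assumes every worker thread is always busy and that the 100 load-test users issue requests back to back with no think time. The class and method names are illustrative.

```java
// Back-of-the-envelope capacity check based on Little's law
// (concurrency = throughput * latency). Names are illustrative.
final class ThroughputEstimate {

    // Upper bound on requests/s when every worker thread is always busy:
    // each thread completes (1000 / latencyMillis) requests per second.
    static double maxThroughputPerSecond(int workerThreads, long latencyMillis) {
        return workerThreads * 1000.0 / latencyMillis;
    }

    // Average latency implied by a measured throughput at a fixed concurrency.
    static double impliedLatencyMillis(int concurrency, double throughputPerSecond) {
        return concurrency * 1000.0 / throughputPerSecond;
    }

    public static void main(String[] args) {
        // 100 threads at 200 ms per request
        System.out.println(maxThroughputPerSecond(100, 200)); // 500.0
        // 100 concurrent users achieving only 50 RPS
        System.out.println(impliedLatencyMillis(100, 50.0));  // 2000.0
    }
}
```

Under those assumptions, 50 RPS at 100 concurrent users corresponds to an average latency of roughly 2 seconds per request, ten times the 200 ms budget.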

Analysis Process

Locate the "slow" cause

Ignore the high CPU usage for now.

Key suspects were blocking points of two kinds:

Locks (synchronization locks, distributed locks, database locks)

Time‑consuming operations (network latency, SQL latency)

Instrumentation was added:

Log a warning if an interface response exceeds 500 ms.

Log a warning if a remote call exceeds 200 ms.

Log a warning if Redis access exceeds 10 ms.

Log a warning if SQL execution exceeds 100 ms.
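The article does not show how these warnings were implemented. One minimal way to sketch threshold-based timing is a wrapper that measures a call and logs only when it is slow. The class name SlowCallLogger, the labels, and the use of java.util.logging are assumptions for illustration, not the author's code.

```java
import java.util.function.Supplier;
import java.util.logging.Logger;

// Hypothetical helper: times a call and warns when it exceeds a threshold.
final class SlowCallLogger {
    private static final Logger LOG = Logger.getLogger(SlowCallLogger.class.getName());

    // Runs the call, measures elapsed time with the monotonic nanoTime clock,
    // and logs a warning when the elapsed time exceeds thresholdMillis.
    static <T> T timed(String label, long thresholdMillis, Supplier<T> call) {
        long start = System.nanoTime();
        try {
            return call.get();
        } finally {
            long elapsedMillis = (System.nanoTime() - start) / 1_000_000;
            if (isSlow(elapsedMillis, thresholdMillis)) {
                LOG.warning(label + " took " + elapsedMillis
                        + " ms (threshold " + thresholdMillis + " ms)");
            }
        }
    }

    static boolean isSlow(long elapsedMillis, long thresholdMillis) {
        return elapsedMillis > thresholdMillis;
    }

    private static void sleep(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        // Thresholds from the article: interface 500 ms, remote call 200 ms,
        // Redis 10 ms, SQL 100 ms. The sleep simulates a slow Redis access.
        String value = timed("redis.get", 10, () -> {
            sleep(20);
            return "cached-value";
        });
        System.out.println(value);
    }
}
```

Logging only on threshold breaches keeps the instrumentation itself from adding much log volume during a load test.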

Log analysis revealed a slow SQL statement that updated a single row in a high‑contention table, causing lock wait times that accounted for >80% of the request latency.

-- Example slow SQL
update table set field = field - 1 where type = 1 and field > 1;

The update was moved to asynchronous execution so that requests no longer waited on it. Throughput roughly doubled, but the underlying lock contention remained and the result still fell short of the target.

Continue locating "slow" cause

Further log inspection showed irregular gaps of several hundred milliseconds between log entries, suggesting thread switches, excessive logging, or Stop‑The‑World (STW) pauses.

Raise the log level so that DEBUG output is no longer written (small improvement).

Replace @Async with a bounded thread pool (core threads ≤50); this raised throughput to ~200 RPS.

Increase JVM heap from 512 MB to 4 GB; YGC frequency dropped from 4/s to 2/s, but throughput remained unchanged.
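The bounded pool mentioned in the second step could look roughly like the sketch below, written against the plain JDK. Only the core size of 50 comes from the article; the maximum size, queue capacity, keep-alive time, and rejection policy are illustrative assumptions, and the class name is hypothetical.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Sketch of a bounded executor for asynchronous work.
final class BoundedAsyncPool {

    static ThreadPoolExecutor create() {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                50,                                   // core threads (from the article)
                50,                                   // max threads (assumption)
                60, TimeUnit.SECONDS,                 // idle keep-alive (assumption)
                new ArrayBlockingQueue<>(1000),       // bounded queue (assumption)
                // When the queue is full, run the task on the submitting thread,
                // which slows producers down instead of dropping work.
                new ThreadPoolExecutor.CallerRunsPolicy());
        pool.allowCoreThreadTimeOut(true);            // let idle threads exit
        return pool;
    }

    // Submits trivial tasks, waits for completion, and returns how many ran.
    static int runTasks(ThreadPoolExecutor pool, int taskCount) {
        AtomicInteger done = new AtomicInteger();
        for (int i = 0; i < taskCount; i++) {
            pool.execute(done::incrementAndGet);
        }
        pool.shutdown();
        try {
            pool.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return done.get();
    }

    public static void main(String[] args) {
        ThreadPoolExecutor pool = create();
        System.out.println("tasks completed: " + runTasks(pool, 200));
        System.out.println("largest pool size: " + pool.getLargestPoolSize());
    }
}
```

Bounding both the thread count and the queue puts a ceiling on context switching and memory use, at the cost of back-pressure on callers once the queue fills.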

Despite reducing thread count, CPU usage stayed high, prompting a deeper investigation.

Locate high CPU usage

High CPU usage is often tied to thread count. Here, however, every thread was below 10% CPU, which pointed to load spread across many threads rather than concentrated in a few busy ones.

Stack traces showed frequent calls to BeanUtils.getBean (prototype‑scoped Redis beans). Each call triggers Spring’s createBean, which performs extensive initialization, locking, and proxy creation, causing significant overhead under concurrency.

RedisTool redisTool = BeanUtils.getBean(RedisMaster.class);

Switching to direct new RedisMaster() eliminated the prototype bean overhead.
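The article's fix was to construct the object directly. A related option, in line with the later takeaway about singletons, is to create the client once and reuse it. The sketch below assumes the client is thread-safe and stateless between calls, which the article does not confirm; RedisClientStub is a hypothetical stand-in for RedisMaster.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Sketch: resolve a shared client once instead of creating one per call.
final class SharedClientHolder {

    // Hypothetical stand-in for the article's RedisMaster; assumed thread-safe.
    static final class RedisClientStub {
        static final AtomicInteger CREATED = new AtomicInteger();

        RedisClientStub() {
            CREATED.incrementAndGet(); // count constructions for the demo
        }

        String get(String key) {
            return "value-for-" + key;
        }
    }

    // Initialization-on-demand holder: the JVM initializes INSTANCE once,
    // lazily and thread-safely, on first access to Holder.
    private static final class Holder {
        static final RedisClientStub INSTANCE = new RedisClientStub();
    }

    static RedisClientStub client() {
        return Holder.INSTANCE;
    }

    public static void main(String[] args) {
        for (int i = 0; i < 1000; i++) {
            client().get("k" + i);
        }
        System.out.println("instances created: " + RedisClientStub.CREATED.get());
    }
}
```

Whether sharing is safe depends on the real class; if RedisMaster holds per-call state, constructing it directly as the article did is the safer choice.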

Other performance observations

Calling System.currentTimeMillis() for timing on high‑concurrency paths was observed to add measurable overhead. For elapsed‑time measurement, System.nanoTime() (a monotonic clock) or a lightweight stopwatch is preferable.

Summary

The investigation uncovered multiple layers of bottlenecks: slow SQL with lock contention, excessive logging, oversized thread pools, insufficient JVM memory, and costly Spring prototype bean creation. Incremental fixes (asynchronous execution, thread‑pool tuning, direct bean instantiation) collectively improved throughput from 50 RPS to nearly 200 RPS; the heap increase halved young‑GC frequency without changing throughput. The original 500 RPS goal remains unmet.

Key takeaways:

Optimize database access patterns and avoid high‑contention updates.

Limit logging volume during load tests.

Configure thread pools conservatively; avoid excessive threads.

Prefer singleton beans for frequently accessed resources.

Use high‑resolution timers for fine‑grained measurements.

Further work includes deeper CPU profiling and exploring more efficient concurrency designs.

Practical Commands

Check per-thread CPU usage of the service:

top -Hp <pid>

Inspect JVM GC statistics every 2000 ms:

jstat -gc <pid> 2000

Dump stack traces:

jstack -l <pid> > stack.log
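To match a hot thread from top -Hp against the dump, note that on Linux top shows the thread ID in decimal, while jstack prints it in hexadecimal as nid=0x... A small conversion sketch follows; the class name and the sample ID 12345 are illustrative.

```java
// Converts a decimal thread ID from `top -Hp` into the form jstack prints.
final class ThreadIdHex {

    static String toJstackNid(int topThreadId) {
        return "nid=0x" + Integer.toHexString(topThreadId);
    }

    public static void main(String[] args) {
        System.out.println(toJstackNid(12345)); // nid=0x3039
    }
}
```

The resulting string can then be searched for in stack.log to find the stack of the thread consuming CPU.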
Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.
