Performance Tuning of a Java Spring Backend: From 50 TPS to Over 200 TPS

The article details a step‑by‑step performance investigation of a Java Spring backend that initially handled only 50 requests per second under load, covering slow SQL, excessive logging, thread‑pool misconfiguration, prototype‑scoped Redis beans, JVM memory settings, and the resulting optimizations that raised throughput to over 200 TPS.

Code Ape Tech Column

Dec 13, 2024

The author describes a ToB (business-facing) system that had never been load-tested until a major client required each node to sustain at least 500 requests per second. The initial estimate assumed 200 ms per request and a Tomcat pool of 100 threads, which should allow about 500 TPS (100 threads ÷ 0.2 s per request). A load test at 100 concurrent users, however, yielded only 50 TPS with CPU usage near 80%.
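
The gap between the estimate and the measurement can be made explicit with a little arithmetic. The sketch below is not from the original article: the class and method names are hypothetical, and the second calculation applies Little's Law on the assumption that the 100 load-test users issue requests back to back with no think time.

```java
// Back-of-the-envelope capacity arithmetic using the figures quoted above.
final class CapacityEstimate {

    // Upper bound on throughput when every worker thread is always busy:
    // each thread completes (1000 / latencyMillis) requests per second.
    static double maxThroughput(int workerThreads, long latencyMillis) {
        return workerThreads * 1000.0 / latencyMillis;
    }

    // Little's Law rearranged: average latency = concurrency / throughput.
    // Assumes a closed-loop load test with no think time between requests.
    static double impliedLatencySeconds(int concurrentUsers, double observedTps) {
        return concurrentUsers / observedTps;
    }

    public static void main(String[] args) {
        // Planned: 100 threads at 200 ms per request.
        System.out.println("estimated TPS: " + maxThroughput(100, 200));
        // Measured: 100 concurrent users at 50 TPS.
        System.out.println("implied latency (s): " + impliedLatencySeconds(100, 50.0));
    }
}
```

Under those assumptions, 50 TPS at 100 concurrent users implies an average response time of about 2 s, ten times the 200 ms the plan was built on, which is why the investigation starts with slow operations rather than thread counts.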

The investigation first targeted the usual bottlenecks: locks (synchronized blocks, distributed locks, database locks) and slow operations (network calls, SQL). The team added timing alerts for overall response time, remote calls, Redis latency, and SQL execution. Log analysis revealed a slow UPDATE statement that repeatedly decremented a single inventory row, causing lock contention. After the operation was made asynchronous, the maximum response time dropped from 5 s to 2 s and the 95th percentile from 4 s to 1 s, roughly doubling throughput.
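
The summary does not say how the update was made asynchronous, so the following is only one possible shape, written in plain Java. `AsyncInventoryUpdater` and its methods are hypothetical, and an `AtomicLong` stands in for the inventory row that a real system would modify with the UPDATE statement.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch: the hot-row decrement is taken off the request path.
final class AsyncInventoryUpdater {

    // Stand-in for the single inventory row; a real system would run the UPDATE here.
    private final AtomicLong stock;

    // One writer thread applies decrements in submission order, so request
    // threads never wait on the row lock themselves.
    private final ExecutorService writer = Executors.newSingleThreadExecutor();

    AsyncInventoryUpdater(long initialStock) {
        this.stock = new AtomicLong(initialStock);
    }

    // Called on the request thread: enqueue the decrement and return immediately.
    void decrementAsync(long quantity) {
        writer.execute(() -> stock.addAndGet(-quantity));
    }

    // Stop accepting work and wait for queued decrements to be applied.
    // Returns false if the timeout elapsed or the wait was interrupted.
    boolean drain(long timeout, TimeUnit unit) {
        writer.shutdown();
        try {
            return writer.awaitTermination(timeout, unit);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    long currentStock() {
        return stock.get();
    }
}
```

The trade-off is that the caller no longer learns synchronously whether the decrement succeeded, so this shape only fits flows where the stock figure may lag slightly behind the request.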

Further profiling showed gaps in the log timestamps that no explicit code accounted for. The team cut log output by demoting verbose statements to DEBUG level, reduced the number of @Async thread pools (capping core threads at 50), and increased the JVM heap from 512 MB to 4 GB. The heap increase lowered Young GC frequency from 4/s to 2/s but did not significantly raise TPS.
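
As an illustration of a single shared pool capped at 50 core threads, here is a minimal sketch using the JDK's `ThreadPoolExecutor`. Only the core size of 50 comes from the text; the maximum size, queue capacity, keep-alive time, and rejection policy are assumptions, and `SharedAsyncPool` is a hypothetical name. In a Spring application the same limits would more typically be set on a `ThreadPoolTaskExecutor` bean used by `@Async`.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical factory for one shared, bounded pool instead of many separate ones.
final class SharedAsyncPool {

    static ThreadPoolExecutor create() {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                50,                              // corePoolSize: the figure from the text
                50,                              // maximumPoolSize: assumed equal to core
                60, TimeUnit.SECONDS,            // keep-alive for idle threads (assumed)
                new ArrayBlockingQueue<>(1000),  // bounded queue, capacity assumed
                new ThreadPoolExecutor.CallerRunsPolicy()); // saturated pool runs the task on the caller
        pool.allowCoreThreadTimeOut(true);       // let idle core threads exit
        return pool;
    }
}
```

A bounded queue with `CallerRunsPolicy` slows submitters down when the pool is saturated rather than letting work pile up without limit, which keeps the thread count and memory use predictable under load.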

CPU usage remained high even with fewer threads. Stack traces showed frequent calls to BeanUtils.getBean, which internally invoked createBean. The application used a prototype‑scoped Redis helper (RedisMaster), so every request that looked it up triggered a full bean creation, adding considerable overhead. Replacing the prototype bean with a directly constructed instance (new Redis...) removed this cost.
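
In Spring, a prototype-scoped bean is created anew on every lookup. The plain-Java model below imitates only that behaviour so the per-request cost pattern is visible. It is not Spring's implementation, and `PrototypeLookup`, `RedisHelper`, and `runs()` are hypothetical names (the real RedisMaster class is not shown in the source).

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

// Simplified model of per-lookup bean creation; this is NOT Spring's implementation.
final class PrototypeCostDemo {

    // Hypothetical helper standing in for the Redis helper mentioned above.
    static final class RedisHelper {
    }

    // Mimics prototype scope: every getBean() call runs the creation pipeline again.
    static final class PrototypeLookup<T> {
        private final Supplier<T> creationPipeline;
        private final AtomicInteger pipelineRuns = new AtomicInteger();

        PrototypeLookup(Supplier<T> creationPipeline) {
            this.creationPipeline = creationPipeline;
        }

        T getBean() {
            pipelineRuns.incrementAndGet(); // stands in for the container's createBean work
            return creationPipeline.get();
        }

        int runs() {
            return pipelineRuns.get();
        }
    }

    public static void main(String[] args) {
        PrototypeLookup<RedisHelper> lookup = new PrototypeLookup<>(RedisHelper::new);

        // Before: one container lookup per request, so one pipeline run per request.
        for (int request = 0; request < 1_000; request++) {
            RedisHelper helper = lookup.getBean();
        }
        System.out.println("pipeline runs via lookup: " + lookup.runs());

        // After: plain construction bypasses the container entirely.
        RedisHelper direct = new RedisHelper();
        System.out.println("pipeline runs after direct construction: " + lookup.runs());
    }
}
```

In the real container each pipeline run involves far more than a constructor call (dependency resolution and bean post-processing), which is why the lookups surfaced in the stack traces.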

The author also noted that pervasive timing code (System.currentTimeMillis or Hutool's StopWatch) adds measurable overhead under high concurrency, especially when combined with custom clocks.

In the final summary, the author lists the optimizations performed: MySQL buffer pool and redo‑log tuning, asynchronous execution, thread‑pool and Tomcat configuration adjustments, Druid connection‑pool tweaks, and JVM memory/GC tuning. The combined changes raised throughput from 50 TPS to over 200 TPS, roughly a four‑fold improvement, though the author acknowledges that deeper performance‑engineering knowledge is still needed.

Written by

Code Ape Tech Column

Former Ant Group P8 engineer, pure technologist, sharing full‑stack Java, job interview and career advice through a column. Site: java-family.cn
