How We Boosted Server Throughput by 50%: Java Thread Pool, JVM, and Memory Tuning Secrets
This article details a step‑by‑step performance tuning journey for a Java‑based SOA service, covering thread‑pool redesign, memory‑allocation adjustments, and JIT compilation tweaks that together lifted QPM by nearly 40% and overall system throughput by about 50%.
Background
To encourage high‑quality user‑generated content on the "Ahha Moment" platform, the team built a reward‑based posting system in March 2020. As development stabilized, attention shifted to server performance, moving away from simple hardware scaling toward code‑level and JVM optimizations.
Performance Metrics and Scope
Performance tests were run on a single server, targeting ~60% CPU utilization while monitoring TP99 latency and QPM (queries per minute). The focus was on the SOA layer, which aggregates upstream data for various client platforms and thus becomes a potential bottleneck. Three main areas were tuned: thread pool, memory allocation, and JVM JIT compilation.
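As a rough illustration of the two metrics, the sketch below computes TP99 with the nearest‑rank method and derives QPM from a request count and an elapsed time. The class and method names are hypothetical, and the article does not say which percentile definition the team's load‑testing tool used.

```java
import java.util.Arrays;

/** Hypothetical helpers for the two load-test metrics; not the team's tooling. */
final class LoadTestMetrics {
    private LoadTestMetrics() {}

    /** Nearest-rank percentile: smallest sample with at least pct% of samples <= it. */
    static long percentile(long[] latenciesMillis, double pct) {
        if (latenciesMillis.length == 0) {
            throw new IllegalArgumentException("no samples");
        }
        if (pct <= 0 || pct > 100) {
            throw new IllegalArgumentException("pct must be in (0, 100]");
        }
        long[] sorted = latenciesMillis.clone(); // leave the caller's array untouched
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(pct * sorted.length / 100.0); // 1-based rank
        return sorted[rank - 1];
    }

    /** Queries per minute for a measurement window of the given length. */
    static double queriesPerMinute(long completedRequests, long elapsedMillis) {
        if (elapsedMillis <= 0) {
            throw new IllegalArgumentException("elapsedMillis must be positive");
        }
        return completedRequests * 60_000.0 / elapsedMillis;
    }

    public static void main(String[] args) {
        long[] samples = {12, 15, 11, 240, 14, 13, 18, 16, 17, 19};
        System.out.println("TP99 (ms): " + percentile(samples, 99));
        System.out.println("QPM: " + queriesPerMinute(500, 30_000));
    }
}
```

TP99 is dominated by the slowest requests, which is why it is tracked alongside QPM rather than an average latency.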
Thread‑Pool Tuning
The initial implementation used Executors.newCachedThreadPool(), which places no practical limit on the number of threads it creates. Load tests showed QPM in the low hundreds and many threads stuck in a waiting state, causing excessive CPU scheduling overhead. Reading the JDK source, the team found that this pool has a core size of 0, a maximum size of 2³¹‑1 (Integer.MAX_VALUE), a 60‑second keep‑alive, and a SynchronousQueue that never stores tasks, so every task that finds no idle thread starts a new one.
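The configuration described above can be written out explicitly. The sketch below builds the equivalent ThreadPoolExecutor by hand and prints its settings next to those of the JDK factory method; the class name and the describe helper are illustrative only, and the cast relies on the OpenJDK behavior of returning a plain ThreadPoolExecutor from the factory.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

/** Illustrative only: spells out what a cached thread pool is configured with. */
final class CachedPoolDefaults {
    private CachedPoolDefaults() {}

    /** Hand-built pool with the same parameters the article lists. */
    static ThreadPoolExecutor explicitCachedPool() {
        return new ThreadPoolExecutor(
                0,                               // core size: no threads kept by default
                Integer.MAX_VALUE,               // maximum size: 2^31 - 1
                60L, TimeUnit.SECONDS,           // idle threads die after 60 seconds
                new SynchronousQueue<Runnable>()); // hand-off queue with zero capacity
    }

    static String describe(ThreadPoolExecutor pool) {
        return "core=" + pool.getCorePoolSize()
                + ", max=" + pool.getMaximumPoolSize()
                + ", keepAlive=" + pool.getKeepAliveTime(TimeUnit.SECONDS) + "s"
                + ", queue=" + pool.getQueue().getClass().getSimpleName();
    }

    public static void main(String[] args) {
        // Assumption: the factory returns a ThreadPoolExecutor, as OpenJDK does.
        ThreadPoolExecutor jdkPool = (ThreadPoolExecutor) Executors.newCachedThreadPool();
        ThreadPoolExecutor explicit = explicitCachedPool();
        System.out.println("JDK factory: " + describe(jdkPool));
        System.out.println("Hand-built : " + describe(explicit));
        jdkPool.shutdown();
        explicit.shutdown();
    }
}
```

Because the queue cannot hold a task, a burst of N concurrent requests can translate directly into N threads, which matches the thread growth seen in the load tests.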
To fix this, the team wrote a custom ThreadPoolExecutor subclass that overrides beforeExecute and inspects the task queue length via getQueue(), making the backlog visible while tuning. After experimentation, the core pool size was set to 90 threads, and a warm‑up routine was added to pre‑start all core threads. These changes eliminated uncontrolled thread growth and reduced first‑run latency, raising QPM by roughly 40%.
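A minimal sketch of such a pool is shown below. The class name, the bounded LinkedBlockingQueue, the queue capacity, and the choice to record only the deepest backlog are all assumptions made for illustration; the article confirms only the beforeExecute override, the getQueue() inspection, the 90 core threads, and the pre‑start step.

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical fixed-size pool that records the deepest backlog seen before a task runs. */
class MonitoredThreadPool extends ThreadPoolExecutor {
    private final AtomicInteger maxObservedQueueLength = new AtomicInteger();

    private MonitoredThreadPool(int coreThreads, int queueCapacity) {
        // core == max, so the pool never grows past the configured size
        super(coreThreads, coreThreads, 0L, TimeUnit.MILLISECONDS,
                new LinkedBlockingQueue<Runnable>(queueCapacity)); // assumed queue type
    }

    /** Builds the pool and warms it up so the first requests do not pay for thread creation. */
    static MonitoredThreadPool create(int coreThreads, int queueCapacity) {
        MonitoredThreadPool pool = new MonitoredThreadPool(coreThreads, queueCapacity);
        pool.prestartAllCoreThreads();
        return pool;
    }

    @Override
    protected void beforeExecute(Thread worker, Runnable task) {
        super.beforeExecute(worker, task);
        int queued = getQueue().size(); // tasks still waiting behind this one
        maxObservedQueueLength.accumulateAndGet(queued, Math::max);
    }

    int maxObservedQueueLength() {
        return maxObservedQueueLength.get();
    }

    public static void main(String[] args) throws InterruptedException {
        MonitoredThreadPool pool = MonitoredThreadPool.create(90, 1000);
        for (int i = 0; i < 500; i++) {
            pool.execute(() -> { /* stand-in for a real request handler */ });
        }
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.SECONDS);
        System.out.println("Deepest backlog observed: " + pool.maxObservedQueueLength());
    }
}
```

With a bounded queue, tasks submitted beyond the capacity are rejected by the default handler, so a production version would also need an explicit rejection policy.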
Memory Optimization
Using jmap, the team inspected the objects on the heap and found that MenuItemVo was being created very frequently; it could be reused instead of instantiated repeatedly, which reduces garbage‑collection pressure. The application runs on Oracle JRE 1.8 with the G1 collector. GC logs showed that, with default settings, the young and old generations were split roughly 2:1: of the 5 GB heap, the young generation occupied about 3 GB, while only ~12% of the old generation was in use.
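One way to reuse such objects is to cache immutable instances by key, sketched below. The fields of MenuItemVo, the cache class, and the factory parameter are all hypothetical, since the article does not show the real class or say how the reuse was implemented. The approach is only safe if the cached objects are never mutated after creation.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Function;

/** Hypothetical stand-in for the article's MenuItemVo; the real fields are not given. */
final class MenuItemVo {
    private final String id;
    private final String title;

    MenuItemVo(String id, String title) {
        this.id = id;
        this.title = title;
    }

    String id() { return id; }
    String title() { return title; }
}

/** Hypothetical cache that hands out one shared, immutable instance per menu id. */
final class MenuItemCache {
    private final ConcurrentMap<String, MenuItemVo> byId = new ConcurrentHashMap<>();

    /** Returns the cached instance, invoking the factory only on the first request for an id. */
    MenuItemVo get(String id, Function<String, MenuItemVo> factory) {
        return byId.computeIfAbsent(id, factory);
    }

    public static void main(String[] args) {
        MenuItemCache cache = new MenuItemCache();
        MenuItemVo first = cache.get("home", key -> new MenuItemVo(key, "Home"));
        MenuItemVo second = cache.get("home", key -> new MenuItemVo(key, "Home"));
        System.out.println("Same instance reused: " + (first == second));
    }
}
```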
The team then fixed the young generation at 4 GB of the 5 GB heap with -Xmn4g, which also turns off G1’s adaptive young‑generation sizing. As a result, old‑generation usage dropped, GC frequency fell by about 10%, and QPM improved by roughly 5%.
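To check how the heap is actually divided at runtime, the standard java.lang.management API can list each heap pool and its usage. The sketch below is one way to do that, with a hypothetical class name. Launching it with, for example, -XX:+UseG1GC -Xms5g -Xmx5g -Xmn4g would mirror the sizes in the article, though the -Xms and -Xmx values are assumptions, since the article states only the 5 GB total and the -Xmn4g setting.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper that reports usage of every heap memory pool in this JVM. */
final class HeapPoolReport {
    private HeapPoolReport() {}

    static List<String> describeHeapPools() {
        List<String> lines = new ArrayList<>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() != MemoryType.HEAP) {
                continue; // skip metaspace, code cache, and other non-heap pools
            }
            MemoryUsage usage = pool.getUsage();
            if (usage == null) {
                continue; // the pool is no longer valid
            }
            lines.add(String.format("%s: used=%d MB, committed=%d MB",
                    pool.getName(),
                    usage.getUsed() >> 20,        // bytes to MB
                    usage.getCommitted() >> 20));
        }
        return lines;
    }

    public static void main(String[] args) {
        // Pool names depend on the collector in use (eden, survivor, and old-generation pools).
        describeHeapPools().forEach(System.out::println);
    }
}
```

Sampling this output before and after a load test gives a quick cross‑check of what the GC logs report for young‑ and old‑generation occupancy.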
JIT Compilation Tuning
The team first switched HotSpot from its default mixed mode (interpreter plus JIT) to compile‑only mode with -Xcomp. This doubled startup time and, unexpectedly, reduced throughput by 10‑15%: methods compiled on first invocation have no interpreter profiling data, so the compiler cannot apply its profile‑guided optimizations. Instead, the team lowered the compilation threshold with -XX:CompileThreshold, so that hot methods are JIT‑compiled earlier, which recovered the lost performance.
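When experimenting with these options it helps to confirm which JIT settings a running JVM actually received. The sketch below, with a hypothetical class name, reads the compiler name and total compilation time from the standard CompilationMXBean and filters the JVM arguments for JIT‑related flags. The article does not give the threshold value the team chose, so the 5000 mentioned in the code comment is purely illustrative. Note also that the JDK 8 documentation for the java command states that -XX:CompileThreshold is ignored when tiered compilation is enabled, so whether the flag takes effect depends on the -XX:+TieredCompilation setting.

```java
import java.lang.management.CompilationMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper that reports the JIT configuration of the current JVM. */
final class JitReport {
    private JitReport() {}

    /** Keeps only the arguments that select the execution mode or JIT thresholds. */
    static List<String> jitFlags(List<String> jvmArgs) {
        List<String> flags = new ArrayList<>();
        for (String arg : jvmArgs) {
            boolean modeFlag = arg.equals("-Xcomp") || arg.equals("-Xint") || arg.equals("-Xmixed");
            boolean thresholdFlag = arg.startsWith("-XX:CompileThreshold="); // e.g. =5000 (illustrative)
            boolean tieredFlag = arg.endsWith("TieredCompilation");          // -XX:+ or -XX:- form
            if (modeFlag || thresholdFlag || tieredFlag) {
                flags.add(arg);
            }
        }
        return flags;
    }

    static String compilerSummary() {
        CompilationMXBean jit = ManagementFactory.getCompilationMXBean();
        if (jit == null) {
            return "no JIT compiler (interpreter only)";
        }
        String time = jit.isCompilationTimeMonitoringSupported()
                ? jit.getTotalCompilationTime() + " ms"
                : "n/a";
        return jit.getName() + ", total compilation time: " + time;
    }

    public static void main(String[] args) {
        System.out.println(compilerSummary());
        System.out.println("JIT flags: "
                + jitFlags(ManagementFactory.getRuntimeMXBean().getInputArguments()));
    }
}
```

Comparing the reported compilation time across runs with different settings shows how much work is being moved into the JIT during startup and warm‑up.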
Conclusion
Combining thread‑pool redesign, memory‑size tuning, and JIT threshold adjustment yielded an overall performance gain of nearly 50%. The author notes that performance tuning is an ongoing process, with many additional ideas to explore in the future.
JD Retail Technology
Official platform of JD Retail Technology, delivering insightful R&D news and a deep look into the lives and work of technologists.
