How We Cut Server Costs and Boosted Throughput: A Real-World Performance Tuning Case Study
This article details a comprehensive performance‑testing and optimization effort for a high‑traffic site‑service system, covering hardware setup, CAT monitoring integration, bottleneck identification, Spring Gateway and Netty tuning, asynchronous logging, and the resulting dramatic improvements in QPS and response times.
Background
The company launched a productization project for its site‑service system with the goal of reducing server hardware cost. The target was to support more than 10,000 concurrent users on a server with 32 GB of memory and a silver‑grade CPU. Performance bottlenecks were identified through pressure testing and by monitoring metrics such as QPS, RPS, response time, success rate, SQL latency, and JVM, CPU, and memory usage.
Pressure Test Preparation
Test Server
One test server was prepared with the following hardware:
CPU: i5‑9400 @ 2.90 GHz, 6 cores, 6 threads
Memory: DDR4 16 GB + 8 GB @ 2400 MHz
Disk: Kingston SA400M8 SSD 240 GB (≈50 % faster than HDD)
Network card: RTL8111/8168/8411 PCI‑Express Gigabit Ethernet
A gigabit switch was added because the original 100 Mbps NIC limited bandwidth once more than 100 users were generating traffic.
CAT Monitoring
CAT, a mature monitoring solution, was integrated to collect real‑time metrics (JVM, interfaces, SQL, alerts) via instrumentation points.
Official repository: https://github.com/dianping/cat
CAT Plugin Development and Integration
cat‑client3.0.jar, tp‑common‑cat‑servlet and tp‑common‑cat‑mybatis1.0.jar were added to the pom. A CatServerFilter was configured in web.xml to filter all requests. All external calls in the site‑service system are forwarded through a forward module, where instrumentation points were added (see diagram).
The CAT client configuration on the server is placed in /data/appdatas/cat/client.xml. During the test, a gap in the collected traces was discovered around token verification and HTTP calls to the intelligence platform.
Additional instrumentation was added to SzptHttpUtil to monitor all HTTP calls to the intelligence platform.
Key Pressure‑Test Process
First Test Run
Results:
10 users – QPS 33 – CPU 80 %, Tomcat‑zw 15 % – 3 540 tickets in 10 min
20 users – QPS 27 – same CPU load – 4 440 tickets in 10 min
CPU was saturated at 10 concurrent requests, and increasing to 20 did not raise QPS, indicating the system could not handle more load.
Intelligence Platform Performance Investigation and Fixes
Interface Caching
Tomcat‑zw used only 15 % of the CPU; the remaining 65 % of the 80 % total was consumed elsewhere.
top showed the station‑base process consuming high CPU. The process ID was obtained with ps -ef | grep station-base, the busy thread ID with top -Hp 1061, and a stack trace with jstack. The trace pointed to two endpoints that lacked caching and therefore ran the same SQL queries repeatedly: /base/openApi/line/getAllLineDownSiteBySiteNos and /base/openApi/line/getStationLineByLineNo.
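The article does not say which caching mechanism was applied to these two endpoints. As a rough illustration of the idea only, the following sketch shows a small read-through cache with a fixed time-to-live. The class name TtlCache, the injectable clock, and the TTL value are all assumptions made for this example, not part of the original system.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;
import java.util.function.LongSupplier;

/** Hypothetical read-through cache with a fixed time-to-live per entry. */
final class TtlCache<K, V> {
    private static final class Entry<T> {
        final T value;
        final long expiresAtMillis;

        Entry(T value, long expiresAtMillis) {
            this.value = value;
            this.expiresAtMillis = expiresAtMillis;
        }
    }

    private final ConcurrentHashMap<K, Entry<V>> entries = new ConcurrentHashMap<>();
    private final long ttlMillis;
    private final LongSupplier clock; // injectable so expiry can be tested without sleeping

    TtlCache(long ttlMillis, LongSupplier clock) {
        this.ttlMillis = ttlMillis;
        this.clock = clock;
    }

    /** Returns the cached value, calling the loader only on a miss or after expiry. */
    V get(K key, Function<K, V> loader) {
        long now = clock.getAsLong();
        // compute() is atomic per key, so concurrent callers for the same key
        // wait for one load instead of each issuing the same query.
        Entry<V> entry = entries.compute(key, (k, old) ->
                (old != null && old.expiresAtMillis > now)
                        ? old
                        : new Entry<V>(loader.apply(k), now + ttlMillis));
        return entry.value;
    }
}
```

In a service, the loader would be the existing SQL-backed lookup, and the clock would typically be System::currentTimeMillis. A fixed TTL is the simplest invalidation policy; whether stale line or station data is acceptable for that window is a decision the sketch leaves open.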
Gateway Optimization
Temporary Files Issue
Spring Cloud Gateway created many temporary directories under /tmp, exhausting the filesystem. Replacing the delete command with find . -name "*" -print | xargs rm -rf cleared the directory in 10 seconds.
Upgrading spring‑web from 5.2.15 to 5.2.16 resolved a known issue (see GitHub issues #27094, #27092).
Custom Filter Removal
Custom request/response filters were disabled; performance tests showed no degradation, confirming they were not the bottleneck.
Reactor Netty Thread Configuration
By default the Netty worker count was 6, with no dedicated selector threads. The system properties reactor.netty.ioWorkerCount and reactor.netty.ioSelectCount were set to raise these values:
@Bean
public ReactorResourceFactory reactorClientResourceFactory() {
    // Use one dedicated selector thread.
    System.setProperty("reactor.netty.ioSelectCount", "1");
    // Use three worker threads per available core, with a minimum of four.
    int ioWorkerCount = Math.max(Runtime.getRuntime().availableProcessors() * 3, 4);
    System.setProperty("reactor.netty.ioWorkerCount", String.valueOf(ioWorkerCount));
    return new ReactorResourceFactory();
}
After this adjustment, RPS reached 3 300 and the average response time dropped below 150 ms.
Logback Asynchronous Writing
CPU flame graphs revealed that most CPU time was spent in synchronous logback writes. Switching to asynchronous logging and disabling console output reduced CPU load.
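The benefit of asynchronous logging comes from moving the slow write off the request threads: callers only enqueue a message, and a single background thread performs the I/O. The sketch below illustrates that mechanism with the standard library only. It is not logback's own appender, and the class name AsyncLineWriter, the queue capacity, and the drop-when-full policy are assumptions chosen for this example.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

/** Hypothetical async writer: callers enqueue, one background thread does the slow write. */
final class AsyncLineWriter implements AutoCloseable {
    private final BlockingQueue<String> queue;
    private final Consumer<String> sink; // the slow, synchronous write (file, socket, ...)
    private final Thread worker;
    private volatile boolean running = true;

    AsyncLineWriter(int capacity, Consumer<String> sink) {
        this.queue = new ArrayBlockingQueue<>(capacity);
        this.sink = sink;
        this.worker = new Thread(this::drain, "async-line-writer");
        this.worker.setDaemon(true);
        this.worker.start();
    }

    /** Never blocks the caller; returns false if the writer is closed or the queue is full. */
    boolean log(String line) {
        return running && queue.offer(line);
    }

    private void drain() {
        try {
            // Keep writing until close() has been called and the queue is empty.
            while (running || !queue.isEmpty()) {
                String line = queue.poll(50, TimeUnit.MILLISECONDS);
                if (line != null) {
                    sink.accept(line);
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    /** Rejects new lines and waits until previously queued lines have been written. */
    @Override
    public void close() {
        running = false;
        try {
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

The trade-off is visible in log(): under sustained overload the request thread stays fast, but lines are dropped once the bounded queue fills. A blocking put() would keep every line at the cost of reintroducing latency on the request path.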
Site‑Service System Investigation and Fixes
Global Token Filter Caching
Token verification added ~600 ms latency. Adding caching eliminated the delay.
Transactional Annotation Issue
@Transactional on the ticket‑booking methods held a database connection for the entire method, causing high latency. Removing the annotation reduced that latency.
When Spring encounters @Transactional, it obtains a connection from the pool and binds it to the current thread through a ThreadLocal for the duration of the method. Long‑running operations inside the method keep the connection occupied, which can lead to pool exhaustion, deadlocks, and slow rollbacks.
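The cost of holding a connection for the whole method can be estimated with Little's law: a pool of P connections, each held for h seconds per transaction, can complete at most P / h transactions per second. The sketch below turns that bound into a small helper. The class name PoolThroughput and the numbers in the usage note are assumptions for illustration, not measurements from this project.

```java
/** Upper bound on transactions per second for a fixed-size connection pool (Little's law). */
final class PoolThroughput {
    private PoolThroughput() {
    }

    /**
     * @param poolSize   number of connections in the pool
     * @param holdMillis how long each transaction keeps its connection, in milliseconds
     * @return maximum sustainable transactions per second
     */
    static double maxTransactionsPerSecond(int poolSize, double holdMillis) {
        if (poolSize <= 0 || holdMillis <= 0) {
            throw new IllegalArgumentException("poolSize and holdMillis must be positive");
        }
        return poolSize * 1000.0 / holdMillis;
    }
}
```

With an assumed pool of 10 connections, a method that holds its connection for 500 ms (including remote calls) is capped at maxTransactionsPerSecond(10, 500.0), that is 20 transactions per second. Holding the connection only for an assumed 50 ms of actual SQL work raises the cap to 200.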
Logback CPU Consumption
Async‑profiler flame graphs showed logback logging dominated CPU usage. Removing console logging and enabling asynchronous log aggregation lowered CPU consumption, raising RPS from 23.7 to 222 and cutting average response time from 1 447 ms to 1 082 ms.
Other Optimizations
Switched JVM GC from CMS to G1 to reduce CPU usage.
Reduced the number of deployed WAR files from 13 to 9, removing obsolete modules.
Performance Comparison Before and After Optimizations
Charts from August 10, 19, and 23 show the progressive improvement in QPS and response time.
Tuanzi Tech Team
Tuanzi Mobility, Ticketing & Cloud Systems – we provide mature industry solutions, share high‑quality technical insights, and warmly welcome everyone to follow and share.