How We Cut Server Costs and Boost Throughput: A Real-World Performance Tuning Case Study

This article details a comprehensive performance‑testing and optimization effort for a high‑traffic site‑service system, covering hardware setup, CAT monitoring integration, bottleneck identification, Spring Gateway and Netty tuning, asynchronous logging, and the resulting dramatic improvements in QPS and response times.

Tuanzi Tech Team

Background

The company launched a site-service system productization project to reduce server hardware costs, with the goal of supporting more than 10,000 concurrent users on a server with 32 GB of memory and a Silver-grade CPU. Performance bottlenecks were identified through pressure testing while monitoring metrics such as QPS, RPS, response time, success rate, SQL latency, JVM behavior, and CPU and memory usage.

Pressure Test Preparation

Test Server

One test server was prepared with the following hardware:

CPU: i5‑9400 @ 2.90 GHz, 6 cores, 6 threads

Memory: DDR4 16 GB + 8 GB @ 2400 MHz

Disk: Kingston SA400M8 SSD 240 GB (≈50 % faster than HDD)

Network card: RTL8111/8168/8411 PCI‑Express Gigabit Ethernet

A gigabit switch was added because the original 100 Mbps network connection limited bandwidth once more than 100 users were generating traffic.

CAT Monitoring

CAT, a mature monitoring solution, was integrated to collect real‑time metrics (JVM, interfaces, SQL, alerts) via instrumentation points.

Official repository: https://github.com/dianping/cat

CAT Plugin Development and Integration

cat-client3.0.jar, tp-common-cat-servlet, and tp-common-cat-mybatis1.0.jar were added to the pom. A CatServerFilter configured in web.xml intercepts all requests. Because all external calls in the site-service system are forwarded through a forward module, instrumentation points were added there (see diagram).

On the server side, the CAT client configuration is stored in /data/appdatas/cat/client.xml. During testing, gaps in monitoring coverage were found: token verification and HTTP calls to the intelligence platform were not instrumented.

Additional instrumentation was added to SzptHttpUtil to monitor all HTTP calls to the intelligence platform.
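Instrumentation points like these typically follow CAT's transaction pattern: open a transaction with Cat.newTransaction(type, name), set its status on success or failure, and call complete() in a finally block so every call is recorded. The sketch below reproduces that shape with a hypothetical in-process recorder standing in for the cat-client jar, so it runs on its own. The class names, the "Call.Szpt" type, and the endpoint names are illustrative assumptions, not the project's actual code.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.function.Supplier;

// Hypothetical stand-in for the CAT client: one entry per completed transaction.
final class CallRecorder {
    static final List<String> ENTRIES = Collections.synchronizedList(new ArrayList<>());

    static void complete(String type, String name, String status, long nanos) {
        ENTRIES.add(type + "|" + name + "|" + status + "|" + nanos / 1_000_000 + "ms");
    }
}

// Wraps an outbound call the way an instrumented helper such as SzptHttpUtil might.
final class TracedCalls {
    static <T> T traced(String type, String name, Supplier<T> call) {
        // Real client: Transaction t = Cat.newTransaction(type, name);
        long start = System.nanoTime();
        String status = "SUCCESS"; // real client: t.setStatus(Transaction.SUCCESS)
        try {
            return call.get();
        } catch (RuntimeException e) {
            status = e.getClass().getSimpleName(); // real client: t.setStatus(e)
            throw e;
        } finally {
            // Recorded on success and on failure (real client: t.complete())
            CallRecorder.complete(type, name, status, System.nanoTime() - start);
        }
    }
}
```

Using the endpoint path as the transaction name keeps each intelligence-platform API separately visible, while the shared type groups them together.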

Key Pressure‑Test Process

First Test Run

Results:

10 concurrent users: QPS 33; total CPU 80 % (Tomcat-zw 15 %); 3 540 tickets in 10 minutes

20 concurrent users: QPS 27; CPU load unchanged; 4 440 tickets in 10 minutes

CPU was already saturated at 10 concurrent users, and doubling to 20 users did not raise QPS (it fell from 33 to 27), showing that the system could not absorb more load.
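A quick Little's law check (concurrency = throughput × latency) makes the saturation visible. Assuming each virtual user sends its next request as soon as the previous one returns (the test script's think time is not stated, so this is an assumption), the implied per-request latency more than doubles from 10 to 20 users, while ticket throughput rises only about 25 %:

```java
// Back-of-envelope numbers for the first test run, assuming closed-loop users with no think time.
final class FirstRunMath {
    // Little's law: users = QPS x latency, so latency = users / QPS (converted to ms)
    static double impliedLatencyMs(int users, double qps) {
        return users / qps * 1000.0;
    }

    // Tickets per second over the test window
    static double ticketsPerSecond(int tickets, int minutes) {
        return tickets / (minutes * 60.0);
    }
}
```

With the figures above, impliedLatencyMs(10, 33) is about 303 ms and impliedLatencyMs(20, 27) about 741 ms, while ticketsPerSecond gives 5.9 and 7.4 tickets/s.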

Intelligence Platform Performance Investigation and Fixes

Interface Caching

Tomcat-zw accounted for only 15 % of the CPU; the remaining 65 % was being consumed elsewhere.

The top command showed the station-base process consuming a large share of CPU. Its process ID was found with ps -ef | grep station-base, its busiest threads with top -Hp 1061 (1061 being the station-base PID), and their stack traces with jstack. The traces showed that two endpoints lacked caching and therefore issued the same SQL queries repeatedly: /base/openApi/line/getAllLineDownSiteBySiteNos and /base/openApi/line/getStationLineByLineNo.
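The thread-to-stack step is easy to get wrong because top -Hp lists thread IDs in decimal, while jstack labels each thread with a native ID (nid) that classic thread dumps print in hex, for example nid=0x42d for thread 1069. The helper below is a sketch, not part of the original investigation: it converts the ID and pulls the matching entry out of a saved jstack dump. Since some newer JDKs may print the nid in decimal, it accepts either form.

```java
// Locates a thread reported by `top -Hp <pid>` in a saved `jstack <pid>` dump.
final class JstackLookup {
    // top shows the native thread ID in decimal; classic jstack output shows it as hex
    static String hexNid(long tid) {
        return "nid=0x" + Long.toHexString(tid);
    }

    // Returns the matching thread entry (header plus stack frames), or null if absent
    static String findThread(String dump, long tid) {
        String hex = hexNid(tid) + " ";
        String dec = "nid=" + tid + " ";
        for (String block : dump.split("\\n\\s*\\n")) { // entries are separated by blank lines
            String entry = block.trim();
            // Trailing space so that nid=0x42d does not also match nid=0x42d1
            String header = entry.split("\\n", 2)[0] + " ";
            if (header.contains(hex) || header.contains(dec)) {
                return entry;
            }
        }
        return null;
    }
}
```

Typical use: save the output of jstack 1061 to a file, then pass its contents and the busy thread ID from top to findThread.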

Gateway Optimization

Temporary Files Issue

Spring Cloud Gateway created a large number of temporary directories under /tmp, exhausting the filesystem. Instead of the usual delete command, find . -name "*" -print | xargs rm -rf was used, which cleared the directory in about 10 seconds.

Upgrading spring‑web from 5.2.15 to 5.2.16 resolved a known issue (see GitHub issues #27094, #27092).

Custom Filter Removal

Custom request/response filters were disabled as an experiment; performance did not change noticeably, confirming that they were not the bottleneck.

Reactor Netty Thread Configuration

By default, Reactor Netty created 6 worker threads (one per CPU core, with a minimum of 4) and no dedicated selector threads. The system properties reactor.netty.ioWorkerCount and reactor.netty.ioSelectCount were set to change this:

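```java
import org.springframework.context.annotation.Bean;
import org.springframework.http.client.reactive.ReactorResourceFactory;

@Bean
public ReactorResourceFactory reactorClientResourceFactory() {
    // One dedicated selector thread; by default the workers also do the selecting
    System.setProperty("reactor.netty.ioSelectCount", "1");
    // Workers = 3 x CPU cores, at least 4 (18 on the 6-core test server)
    int ioWorkerCount = Math.max(Runtime.getRuntime().availableProcessors() * 3, 4);
    System.setProperty("reactor.netty.ioWorkerCount", String.valueOf(ioWorkerCount));
    // Only effective if set before Reactor Netty creates its default LoopResources
    return new ReactorResourceFactory();
}
```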

After adjustment, RPS reached 3 300 and average response time dropped below 150 ms.

Logback Asynchronous Writing

CPU flame graphs showed that most of the time was spent in synchronous Logback writes. Switching to asynchronous logging and disabling console output reduced the CPU load.
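In Logback itself, this change is normally made in logback.xml by wrapping the file appender in ch.qos.logback.classic.AsyncAppender (tunable through queueSize, discardingThreshold, and neverBlock); the exact configuration used here is not given. The sketch below is a simplified, self-contained illustration of the mechanism, not Logback's implementation: request threads only enqueue a message, and a single background thread performs the slow write.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Consumer;

// Simplified asynchronous logger: callers enqueue, one writer thread does the I/O.
final class AsyncLogSketch implements AutoCloseable {
    private static final String STOP = "\u0000stop"; // sentinel, assumed never to be logged
    private final BlockingQueue<String> queue;
    private final AtomicInteger dropped = new AtomicInteger();
    private final Thread writer;

    AsyncLogSketch(int capacity, Consumer<String> sink) {
        queue = new ArrayBlockingQueue<>(capacity);
        writer = new Thread(() -> {
            try {
                for (String msg = queue.take(); !msg.equals(STOP); msg = queue.take()) {
                    sink.accept(msg); // the slow write happens off the request thread
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "async-log-writer");
        writer.setDaemon(true);
        writer.start();
    }

    // Never blocks the caller: when the queue is full the message is dropped
    // (similar in spirit to AsyncAppender with neverBlock=true).
    void log(String msg) {
        if (!queue.offer(msg)) {
            dropped.incrementAndGet();
        }
    }

    int dropped() {
        return dropped.get();
    }

    // Lets the writer drain everything queued so far, then stops it.
    @Override
    public void close() {
        try {
            queue.put(STOP);
            writer.join(5_000);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

The trade-off is visible in log(): the caller never waits on disk or console I/O, but under sustained overload some messages are lost rather than slowing requests down.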

Site‑Service System Investigation and Fixes

Global Token Filter Caching

Token verification added roughly 600 ms of latency. Caching the verification results eliminated this delay.
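A minimal sketch of such a cache, assuming a hypothetical verifier function that wraps the slow remote check and a fixed time-to-live (TTL). Successful verifications are remembered until the TTL expires; failures are not cached, so an invalid token is always re-checked. The TTL also bounds how long a revoked token could still be accepted, so it should stay short, and a production version would need a size bound as well (for example via a cache library such as Caffeine).

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.LongSupplier;
import java.util.function.Predicate;

// Caches successful token verifications for a fixed TTL (hypothetical design).
final class TokenVerificationCache {
    private final ConcurrentHashMap<String, Long> validUntil = new ConcurrentHashMap<>();
    private final Predicate<String> verifier; // hypothetical slow remote check
    private final long ttlNanos;
    private final LongSupplier clock;         // System::nanoTime in production; injectable for tests

    TokenVerificationCache(Predicate<String> verifier, long ttlNanos, LongSupplier clock) {
        this.verifier = verifier;
        this.ttlNanos = ttlNanos;
        this.clock = clock;
    }

    boolean isValid(String token) {
        long now = clock.getAsLong();
        Long until = validUntil.get(token);
        if (until != null && now - until < 0) {
            return true; // cache hit: skip the remote call
        }
        // Miss or expired: verify outside any lock; two concurrent misses may both
        // call the verifier, which is harmless here.
        boolean ok = verifier.test(token);
        if (ok) {
            validUntil.put(token, now + ttlNanos);
        } else {
            validUntil.remove(token);
        }
        return ok;
    }
}
```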

Transactional Annotation Issue

The @Transactional annotation on the ticket-booking methods kept a database connection occupied for the entire method call, causing high latency. Removing the annotation reduced this latency.

When a @Transactional method is called, Spring obtains a connection from the pool and binds it to the current thread (via a ThreadLocal) until the method returns. Any long-running operation inside the method therefore keeps the connection occupied, which can lead to pool exhaustion, deadlocks, and slow rollbacks.
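A common alternative to removing the transaction entirely is to narrow it: run slow work such as remote HTTP calls first, then perform only the database writes inside a transaction, either with TransactionTemplate.execute(...) or with @Transactional on a smaller method in a separate bean (self-invocation bypasses Spring's proxy, so the annotation would not take effect on a call within the same class). The sketch below models that ordering with a hypothetical semaphore-backed pool standing in for the datasource and transaction manager; TxPool, BookingService, and the remote seat check are illustrative names.

```java
import java.util.concurrent.Semaphore;
import java.util.function.Function;
import java.util.function.Supplier;

// Hypothetical stand-in for a connection pool plus transaction manager:
// each permit is one connection, held for the whole transactional block.
final class TxPool {
    private final Semaphore connections;

    TxPool(int size) { this.connections = new Semaphore(size); }

    <T> T inTransaction(Supplier<T> work) {
        connections.acquireUninterruptibly(); // like binding a connection to the thread
        try {
            return work.get();                // commit would happen here on success
        } finally {
            connections.release();            // connection returned when the block ends
        }
    }

    int available() { return connections.availablePermits(); }
}

final class BookingService {
    private final TxPool pool;
    private final Function<String, String> remoteSeatCheck; // hypothetical slow HTTP call

    BookingService(TxPool pool, Function<String, String> remoteSeatCheck) {
        this.pool = pool;
        this.remoteSeatCheck = remoteSeatCheck;
    }

    // Narrowed scope: the slow remote call runs with no connection held;
    // only the short database write runs inside the transaction.
    String book(String seatNo) {
        String seat = remoteSeatCheck.apply(seatNo);
        return pool.inTransaction(() -> "booked " + seat);
    }
}
```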

Logback CPU Consumption

Flame graphs from async-profiler showed that Logback logging dominated CPU usage. Removing console logging and enabling asynchronous logging lowered CPU consumption, raising RPS from 23.7 to 222 and cutting average response time from 1 447 ms to 1 082 ms.

Other Optimizations

Switched the JVM garbage collector from CMS to G1 to reduce CPU usage.

Reduced the number of deployed WAR files from 13 to 9 by removing obsolete modules.
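After switching collectors (for example with the -XX:+UseG1GC flag), it is worth confirming which collector the running JVM actually uses. This sketch, an addition rather than part of the original tuning work, reads the standard GarbageCollectorMXBean names: G1 reports beans such as "G1 Young Generation" and "G1 Old Generation", while a CMS setup reports "ParNew" and "ConcurrentMarkSweep". CMS was deprecated in JDK 9 and removed in JDK 14, so G1 is also the natural choice for newer JDKs.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.List;
import java.util.stream.Collectors;

// Reports which garbage collectors the running JVM is using.
final class GcCheck {
    static List<String> collectorNames() {
        return ManagementFactory.getGarbageCollectorMXBeans().stream()
                .map(GarbageCollectorMXBean::getName)
                .collect(Collectors.toList());
    }

    // True when the G1 collector beans are present
    static boolean usingG1() {
        return collectorNames().stream().anyMatch(n -> n.startsWith("G1"));
    }
}
```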

Performance Comparison Before and After Optimizations

Charts from the August 10, 19, and 23 test runs show the progressive improvement in QPS and response time.

