Performance Testing, Response Time Analysis, and Database Connection Pool Optimization for a Multi‑Table Query Service
This article presents a comprehensive performance-testing case study of a multi-table database query interface: common performance metrics, response-time classifications, step-by-step load-test results, root-cause analysis of CPU, memory, and connection-pool bottlenecks, and how tuning Druid's maxActive and maxWait parameters improves throughput and reduces latency.
Performance testing aims to simulate normal, peak, and abnormal load conditions using automated tools to evaluate system performance indicators such as throughput (TPS), response time (RT), and resource utilization (CPU, memory, disk I/O, network).
Common performance metrics include TPS, RT, and resource usage. Response time is a key user‑perceived metric and can be broken down into overall RT, average RT, and percentile‑based metrics (TP90, TP95, TP99) to filter out outliers and focus on the majority of requests.
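The percentile metrics above can be computed directly from recorded response times. The sketch below uses the nearest-rank method (an assumption; JMeter and other tools may interpolate differently), with hypothetical sample data:

```java
import java.util.Arrays;

// Nearest-rank percentile over a set of recorded response times (ms).
public class PercentileDemo {
    // Returns the smallest sample such that at least p percent of all
    // samples are less than or equal to it (nearest-rank method).
    static long percentile(long[] samplesMs, double p) {
        long[] sorted = samplesMs.clone();
        Arrays.sort(sorted);
        // 1-based nearest rank: ceil(p/100 * N)
        int rank = (int) Math.ceil(p / 100.0 * sorted.length);
        return sorted[Math.max(0, rank - 1)];
    }

    public static void main(String[] args) {
        // Hypothetical RT samples in ms; one slow outlier at the tail.
        long[] rt = {12, 15, 18, 20, 22, 25, 30, 45, 120, 980};
        System.out.println("TP90=" + percentile(rt, 90)); // 120
        System.out.println("TP99=" + percentile(rt, 99)); // 980
    }
}
```

Note how TP90 ignores the single 980 ms outlier while TP99 still captures it, which is why the article reports both.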
The load‑test targeted a single API that performs multi‑table database queries: mapping brand‑to‑agent, linking user IDs to usernames, checking authorization, and re‑querying the brand‑agent mapping. A step‑wise ramp‑up approach identified the optimal concurrency level and measured RT and resource usage.
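A step-wise ramp-up can be sketched as a loop over increasing concurrency levels, measuring average RT at each step. This is a minimal illustration, not the team's JMeter setup; `callApi()` is a hypothetical stand-in for the real HTTP call to the query service:

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicLong;

// Toy step-wise load generator: run the same task at increasing
// concurrency levels and report average response time per step.
public class RampUp {
    // Placeholder for the real multi-table query API call.
    static void callApi() throws InterruptedException {
        Thread.sleep(5);
    }

    static double averageRtMs(int concurrency, int requestsPerWorker) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(concurrency);
        AtomicLong totalNanos = new AtomicLong();
        CountDownLatch done = new CountDownLatch(concurrency);
        for (int i = 0; i < concurrency; i++) {
            pool.submit(() -> {
                try {
                    for (int r = 0; r < requestsPerWorker; r++) {
                        long t0 = System.nanoTime();
                        callApi();
                        totalNanos.addAndGet(System.nanoTime() - t0);
                    }
                } catch (InterruptedException ignored) {
                } finally {
                    done.countDown();
                }
            });
        }
        done.await();
        pool.shutdown();
        return totalNanos.get() / 1e6 / ((long) concurrency * requestsPerWorker);
    }

    public static void main(String[] args) throws Exception {
        for (int users : new int[]{20, 40, 60}) {
            System.out.printf("%d users: avg RT %.1f ms%n", users, averageRtMs(users, 10));
        }
    }
}
```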
Load-Test Results and Analysis: At 20 concurrent users the system was stable. At 40 users a 0.01% error rate appeared, and at 60 users errors rose to 0.94% with average RT increasing and TP99 reaching ~1000 ms. JMeter logs showed load-time spikes over 1000 ms, causing most errors.
Investigation Process focused on four areas:
1. CPU and Memory
Monitoring with PINPOINT showed the service only called the database. At 60 concurrent users CPU usage stayed below 80% while RT hit 1.92 s, indicating the bottleneck lay in response time rather than CPU or memory.
2. Business Logic
Log analysis showed errors only under high concurrency, suggesting no logical bugs in the code.
3. Database Index and Data Volume
Data volume was ~1000 rows (normal). EXPLAIN statements confirmed the query used an index, with an execution time of 43 ms, ruling out SQL inefficiency.
4. Database Connection Pool
PINPOINT revealed a timeout in MyBatis' SqlSessionTemplate → getConnection() caused by Druid connection‑pool exhaustion. Further analysis examined SqlSession handling (which is managed by Spring) and Druid pool parameters.
Druid Connection-Pool Configuration:
# initial size; minimum idle
spring.datasource.initialSize=1
spring.datasource.minIdle=3
# maximum number of connections in the pool
spring.datasource.maxActive=20
# maximum wait time for acquiring a connection
spring.datasource.maxWait=60000
Key parameters:
initialSize – number of physical connections created at startup.
minIdle – minimum idle connections to keep.
maxActive – maximum number of active connections.
maxWait – maximum time (ms) to wait for a connection.
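The interaction between maxActive and maxWait can be modeled with a counting semaphore: borrowers block for at most maxWait milliseconds waiting for one of maxActive permits, and an exhausted pool surfaces as a timeout, which is exactly the getConnection() failure seen in PINPOINT. This is a toy model of the admission behavior, not Druid's actual implementation:

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

// Toy model of connection-pool admission control: maxActive permits,
// and a borrow attempt gives up after maxWait milliseconds.
public class PoolModel {
    private final Semaphore permits;
    private final long maxWaitMs;

    PoolModel(int maxActive, long maxWaitMs) {
        this.permits = new Semaphore(maxActive);
        this.maxWaitMs = maxWaitMs;
    }

    // Returns true if a "connection" was acquired within maxWait,
    // false if the pool stayed exhausted for the whole wait.
    boolean borrow() throws InterruptedException {
        return permits.tryAcquire(maxWaitMs, TimeUnit.MILLISECONDS);
    }

    void release() {
        permits.release();
    }

    public static void main(String[] args) throws InterruptedException {
        PoolModel pool = new PoolModel(2, 50); // maxActive=2, maxWait=50 ms
        System.out.println(pool.borrow()); // true
        System.out.println(pool.borrow()); // true
        System.out.println(pool.borrow()); // false: pool exhausted, wait timed out
        pool.release();
        System.out.println(pool.borrow()); // true again after a release
    }
}
```

Under this model, raising maxActive adds permits so more concurrent requests proceed without waiting, which matches the tuning result described below: the 60-user load exceeded the default 20 connections, so borrowers queued until maxWait expired.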
Increasing maxActive to 100 (while keeping maxWait at 60000 ms) eliminated errors, raised TPS to 324 /sec, and kept TP99 in the low‑hundreds of milliseconds. Further increasing maxActive to 200 caused timeouts and reduced throughput, confirming 100 as the optimal setting.
Conclusion: When response time spikes during load testing, investigate server CPU/memory, database CPU/memory, application code, database indexes, and connection-pool settings. Properly sizing Druid's maxActive to match expected concurrency (≈100 in this case) and avoiding overly small maxWait values prevents connection-pool timeouts and improves overall performance.
ZCY Technology
ZCY Technology Team (Zero), based in Hangzhou, is a growth-oriented team passionate about technology and craftsmanship. With around 500 members, we are building comprehensive engineering, project management, and talent development systems. We are committed to innovation and creating a cloud service ecosystem for government and enterprise procurement. We look forward to your joining us.