Performance Testing Essentials: Metrics, Bottlenecks, and Real-World Optimizations
This article explains what performance testing is, introduces the four core system metrics—response time, throughput, resource utilization, and concurrency—illustrates their relationships, details how request latency is composed, and presents practical tuning techniques and real‑world case studies to improve system performance.
1. What is Performance Testing
Performance testing applies load to a system according to a test strategy and measures response time, throughput, resource utilization, and other performance indicators, in order to verify that the system can meet user requirements after launch. The work covers test requirements, the test environment, tooling, the test plan, execution, and result analysis.
2. Four Major System Metrics
The four key metrics are:
Response Time: the time from sending a request to receiving the final response; the most direct indicator of system speed.
Throughput: the number of requests processed per unit of time (e.g., TPS, QPS, HPS).
Resource Utilization: CPU, memory, disk, and network usage on the application, database, and middleware servers.
Concurrency: the number of users submitting requests simultaneously.
Relationship: Throughput = Concurrency / Average Response Time. This is a form of Little's Law and holds when the system is in a steady state.
When load is low, response time remains stable and throughput grows linearly with concurrency; as load increases, response time rises, resources hit limits, and throughput plateaus or drops.
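The relationship can be checked with a few lines of arithmetic. The sketch below is only an illustration: the concurrency and response-time figures are hypothetical and do not come from any measurement in this article.

```python
def throughput(concurrency: int, avg_response_time_s: float) -> float:
    """Steady-state throughput in requests per second."""
    if avg_response_time_s <= 0:
        raise ValueError("average response time must be positive")
    return concurrency / avg_response_time_s


# Hypothetical light load: 50 concurrent users, 0.25 s average response time.
light_load = throughput(50, 0.25)

# Doubling concurrency doubles throughput only while response time stays flat.
ideal_scaling = throughput(100, 0.25)

# Once response time degrades to 1.0 s, the same 100 users yield less throughput.
saturated = throughput(100, 1.0)

print(light_load, ideal_scaling, saturated)
```

With these assumed inputs, throughput falls below the light-load figure once response time quadruples, which is the plateau-then-drop behaviour described above.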
3. Where Does Time Go?
A request consists of three steps: client sends request, server processes business logic and data access, server returns response. The total response time is the sum of the times spent in these three steps, each of which can be affected by hardware, network, code, or middleware issues.
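One way to see where the time goes is to time each step separately and add the parts up. The following is a minimal sketch: the stage names and the sleep calls are placeholders standing in for real network and business-logic work, not part of any system described here.

```python
import time
from contextlib import contextmanager

timings = {}  # stage name -> elapsed seconds


@contextmanager
def timed(stage):
    """Record how long the enclosed block takes under the given stage name."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[stage] = time.perf_counter() - start


def handle_request():
    with timed("send_request"):      # client -> server (simulated)
        time.sleep(0.01)
    with timed("process"):           # business logic and data access (simulated)
        time.sleep(0.03)
    with timed("return_response"):   # server -> client (simulated)
        time.sleep(0.01)


handle_request()
total = sum(timings.values())
for stage, seconds in timings.items():
    print(f"{stage}: {seconds * 1000:.1f} ms ({seconds / total:.0%} of total)")
```

In a real system the same breakdown usually comes from tracing or APM tooling; the point of the sketch is only that the total is the sum of the stages, so the largest stage is the first place to look.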
4. Performance Tuning Techniques
Common methods include:
Space‑for‑time: caching data in memory.
Time‑for‑space: process large attachments in batches, accepting a longer run time in exchange for a smaller memory footprint.
Divide and conquer: split tasks for parallel execution.
Asynchronous processing: use message queues to offload long‑running work.
Parallelism: run multiple processes or threads concurrently.
Edge proximity: use a CDN to serve static resources from locations closer to users.
Scalability: modularize services, adopt stateless design, ensure horizontal scalability.
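As an illustration of the first technique in the list, space-for-time, the sketch below caches results in memory with Python's functools.lru_cache. The function load_price_from_backend and its return value are hypothetical stand-ins for a slow database or remote call.

```python
from functools import lru_cache

backend_calls = 0  # counts how often the slow path is actually taken


def load_price_from_backend(sku: str) -> float:
    """Hypothetical slow lookup (database or remote service)."""
    global backend_calls
    backend_calls += 1
    return len(sku) * 1.5  # placeholder value, not real pricing logic


@lru_cache(maxsize=1024)  # memory spent here buys back repeated lookups
def get_price(sku: str) -> float:
    return load_price_from_backend(sku)


for _ in range(3):
    get_price("SKU-1")   # only the first call reaches the backend
get_price("SKU-22")      # a different key is a cache miss

print(backend_calls, get_price.cache_info())
```

The cache size is a trade-off: a larger maxsize uses more memory but avoids more backend calls, and cached values need an invalidation strategy if the underlying data changes.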
5. Real‑World Cases
Case 1
Problem: Response time increased during load testing of an interface.
Analysis: The thread stack dump showed a growing number of FailoverEvent threads, which eventually led to an out-of-memory (OOM) error; the code was creating duplicate FailoverEvent queues.
Solution: Check whether a queue already exists before creating a new one.
Result: Memory overflow resolved, response time normalized.
Recommendations:
Release unused object references early.
Prefer StringBuilder (or StringBuffer when the buffer is shared across threads) over String for frequent concatenations.
Minimize static variables.
Avoid creating large objects in bulk.
Use object pools.
Avoid object creation inside hot loops.
Case 2
Problem: Processing 10,000 orders with 4,500 SKUs took 433 seconds.
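Analysis: The downstream service was called from a single thread; about 410 calls at roughly 519 ms each add up to about 213 s when issued sequentially, which is consistent with the pre-fix TP99 of 212 s.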
Solution: Switch to multithreaded calls.
Result: TP99 dropped from 212 s to 33 s; TPS rose from 87 to 127.
Recommendation: Use multithreading when I/O is present; employ thread pools.
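The recommendation for I/O-bound calls can be sketched with a thread pool from Python's concurrent.futures. The function call_downstream, the 20 ms simulated latency, the batch count, and the pool size are all hypothetical; they are not the figures from this case.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def call_downstream(batch_id: int) -> int:
    """Hypothetical downstream call; the sleep stands in for network I/O."""
    time.sleep(0.02)
    return batch_id * 2


batch_ids = list(range(20))

# Baseline: one call after another on a single thread.
start = time.perf_counter()
sequential = [call_downstream(b) for b in batch_ids]
sequential_s = time.perf_counter() - start

# Thread pool: calls overlap while each one waits on I/O.
start = time.perf_counter()
with ThreadPoolExecutor(max_workers=10) as pool:
    parallel = list(pool.map(call_downstream, batch_ids))  # keeps input order
parallel_s = time.perf_counter() - start

print(f"sequential {sequential_s:.2f} s, pooled {parallel_s:.2f} s")
```

A bounded pool is used rather than one thread per call so that the downstream service is not flooded; the right max_workers value depends on what that service can absorb and should be found by testing.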
Case 3
Problem: Query interface TP99 = 727 ms, throughput did not increase, CPU usage stayed below 40%.
Analysis: Each request invoked selectList 11 times, causing high latency.
Solution: Reduce redundant calls to a single selectList per request.
Result: TP99 dropped from 727 ms to 19 ms (about a 38× improvement); TPS rose from 17.5 to 163.4.
Recommendations:
Design before coding.
Move database operations out of loops.
Use IN queries instead of loops (space‑for‑time).
Use batch inserts when adding many rows.
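Two of the recommendations above, IN queries instead of per-row loops and batch inserts, can be shown with an in-memory SQLite database. The table sku and its columns are invented for the example, and SQLite stands in for whatever database the original service used.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sku (id INTEGER PRIMARY KEY, name TEXT)")

# Batch insert: one executemany call instead of 100 separate INSERTs.
conn.executemany(
    "INSERT INTO sku (id, name) VALUES (?, ?)",
    [(i, f"sku-{i}") for i in range(1, 101)],
)
conn.commit()

wanted = [3, 17, 42, 99]

# Anti-pattern: one query per id, i.e. a database call inside a loop.
looped = [
    conn.execute("SELECT id, name FROM sku WHERE id = ?", (i,)).fetchone()
    for i in wanted
]

# Preferred: a single IN query, with one bound parameter per id.
placeholders = ", ".join("?" for _ in wanted)
batched = conn.execute(
    f"SELECT id, name FROM sku WHERE id IN ({placeholders}) ORDER BY id",
    wanted,
).fetchall()

print(batched)
```

Both approaches return the same rows; the difference is the number of round trips. For very long id lists, the IN list is usually split into chunks, since databases limit the number of bound parameters per statement.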
Case 4
Problem: Database update caused deadlock.
Analysis: Transactions conflicted as shown in Table 1.
Solution: Split the transaction: query first, then batch delete.
Result: Deadlock resolved.
Recommendations:
Avoid large transactions.
Access data in a consistent order.
Do not include user interaction in transactions.
Consider lower isolation levels (e.g., READ COMMITTED).
Add appropriate indexes.
Avoid concurrent scripts that heavily read/write the same table.
Set a lock wait timeout (innodb_lock_wait_timeout in MySQL InnoDB).
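The "query first, then batch delete" fix from Case 4 can be sketched as follows. This is an assumption-laden illustration: the table order_item, its columns, and the batch size are invented, and SQLite is used only because it runs anywhere. It does not reproduce the locking behaviour of the database in the original case.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE order_item (id INTEGER PRIMARY KEY, order_id INTEGER, status TEXT)"
)
conn.executemany(
    "INSERT INTO order_item (order_id, status) VALUES (?, ?)",
    [(i % 10, "expired" if i % 3 == 0 else "active") for i in range(300)],
)
conn.commit()

BATCH_SIZE = 50  # hypothetical; tune against real lock contention

# Step 1: a plain query collects the primary keys to remove.
ids = [row[0] for row in conn.execute(
    "SELECT id FROM order_item WHERE status = 'expired'"
)]

# Step 2: delete by primary key in small batches, each in its own short transaction.
for start in range(0, len(ids), BATCH_SIZE):
    batch = ids[start:start + BATCH_SIZE]
    placeholders = ", ".join("?" for _ in batch)
    with conn:  # commits on success, rolls back on error
        conn.execute(
            f"DELETE FROM order_item WHERE id IN ({placeholders})", batch
        )

remaining_expired = conn.execute(
    "SELECT COUNT(*) FROM order_item WHERE status = 'expired'"
).fetchone()[0]
remaining_total = conn.execute("SELECT COUNT(*) FROM order_item").fetchone()[0]
print(remaining_expired, remaining_total)
```

Deleting by primary key in short transactions keeps each transaction small and makes every batch touch rows in the same ascending order, which lines up with the first two recommendations above.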
6. Summary
Response time is only a symptom; the root cause lies in how resources—hardware, software, threads, data—are utilized. Optimization is about configuring resources more reasonably to meet business needs, avoiding premature or excessive tuning, and recognizing that performance tuning is an ongoing effort as the system evolves.
JD Cloud Developers
JD Cloud Developers (Developer of JD Technology) is a JD Technology Group platform offering technical sharing and communication for AI, cloud computing, IoT and related developers. It publishes JD product technical information, industry content, and tech event news. Embrace technology and partner with developers to envision the future.