How to Thoroughly Benchmark Backend Services: Key Metrics and Practical Steps
This article explains which external and internal performance metrics matter for backend services, how to interpret them, and which bottlenecks are common. It then walks through a step-by-step example that uses JMeter to measure throughput and latency while monitoring CPU, memory, load, network and disk I/O.
Overview
Different stakeholders focus on different performance indicators. Callers of a backend API usually care about throughput and response time, while service owners also monitor CPU, memory, load, network and disk I/O.
External Metrics
From the client side, three primary metrics are considered:
Throughput – number of requests or tasks processed per second.
Response time – time taken to handle a single request or task.
Error rate – proportion of failed requests in a batch.
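Acceptable latency depends on the service: real-time services such as intelligent suggestions need responses in under 100 ms, while navigation services can tolerate 2-5 s. Report latency as the mean together with the 90th and 99th percentiles and the overall distribution, because a mean alone hides slow outliers.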
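As a minimal sketch of such a latency report, the function below computes the mean, 90th and 99th percentiles from raw samples. The function name and the nearest-rank percentile method are choices made for this illustration; load-testing tools may interpolate percentiles differently.

```python
import statistics


def latency_summary(samples_ms):
    """Summarize response times (in milliseconds) as mean, p90, p99 and max."""
    if not samples_ms:
        raise ValueError("need at least one latency sample")
    ordered = sorted(samples_ms)
    n = len(ordered)

    def percentile(p):
        # Nearest-rank method: the smallest sample such that at least
        # p percent of all samples are less than or equal to it.
        rank = max(1, -(-p * n // 100))  # integer ceil(p * n / 100)
        return ordered[rank - 1]

    return {
        "mean": statistics.fmean(ordered),
        "p90": percentile(90),
        "p99": percentile(99),
        "max": ordered[-1],
    }
```

Calling latency_summary on the elapsed times exported by the load generator gives the figures to report; the full distribution is still worth plotting separately.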
Throughput is influenced by response time, hardware, and network conditions. Typical relationships:
Higher throughput usually leads to longer response time, especially as the service approaches its capacity and requests start to queue.
Better hardware yields higher throughput.
Poor network reduces throughput.
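The link between throughput and response time can be made concrete with Little's Law, which is not named in the text above but describes the same relationship: for a closed-loop load generator, each client can only issue a new request after the previous one returns. The sketch below assumes that model; the function name and the think-time parameter are illustrative.

```python
def closed_loop_throughput(concurrency, mean_response_ms, think_time_ms=0.0):
    """Requests per second that a closed-loop load generator can issue.

    Each of `concurrency` clients sends a request, waits for the reply,
    optionally pauses for `think_time_ms`, then repeats (Little's Law).
    """
    cycle_ms = mean_response_ms + think_time_ms
    if cycle_ms <= 0:
        raise ValueError("cycle time must be positive")
    return concurrency * 1000.0 / cycle_ms
```

With 50 clients and a 100 ms mean response time this gives 500 requests per second; if the response time doubles to 200 ms, the same 50 clients can drive only 250.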
Internal Metrics
From the server perspective, monitor CPU, memory, load, network and disk I/O.
CPU
Key Linux CPU time percentages (from top or sar) are:
us – user‑mode CPU usage
sy – system‑mode CPU usage
ni – nice‑adjusted user CPU
id – idle CPU
wa – time waiting for I/O
hi – hardware interrupt time
si – software interrupt time
Example top output is shown in the image.
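The same percentages can be derived from two readings of the aggregate "cpu" line in /proc/stat, which is where tools such as top get their data. This is a sketch under that assumption; the function names are illustrative, and the extra st (steal) field is included because the kernel reports it after si.

```python
# Order of the counters on the "cpu" line of /proc/stat, using top's labels.
FIELDS = ("us", "ni", "sy", "id", "wa", "hi", "si", "st")


def parse_cpu_line(line):
    """Parse the aggregate 'cpu' line of /proc/stat into a dict of tick counts."""
    parts = line.split()
    if not parts or parts[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    values = [int(v) for v in parts[1:1 + len(FIELDS)]]
    return dict(zip(FIELDS, values))


def cpu_percentages(before_line, after_line):
    """Percentage of CPU time spent in each state between two readings."""
    before = parse_cpu_line(before_line)
    after = parse_cpu_line(after_line)
    deltas = {k: after[k] - before[k] for k in after}
    total = sum(deltas.values())
    if total <= 0:
        raise ValueError("readings must be taken at different times")
    return {k: 100.0 * d / total for k, d in deltas.items()}
```

In practice the two lines would be read from /proc/stat a second or more apart while the load test is running.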
Memory
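The important per-process memory figures are VIRT, RES, SHR, SWAP and DATA, as top names them. /proc/${PID}/status reports roughly the same values under the names VmSize, VmRSS, RssFile plus RssShmem, VmSwap and VmData. During testing, focus on RES and VIRT, and also on SHR for services that use shared memory.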
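A minimal sketch of reading these values for one process follows. In /proc/<pid>/status the fields carry the kernel's names (VmSize for virtual size, VmRSS for resident size, and so on), so the parser looks for those; the function names are illustrative, and RssFile and RssShmem are only present on newer kernels.

```python
WANTED = ("VmSize", "VmRSS", "RssFile", "RssShmem", "VmSwap", "VmData")


def parse_status_memory(status_text):
    """Extract memory fields (in kB) from the text of /proc/<pid>/status."""
    result = {}
    for line in status_text.splitlines():
        key, sep, rest = line.partition(":")
        if sep and key in WANTED:
            # Values look like "  512000 kB"; keep only the number.
            result[key] = int(rest.split()[0])
    return result


def read_process_memory(pid):
    """Read and parse the status file of a running process (Linux only)."""
    with open(f"/proc/{pid}/status") as f:
        return parse_status_memory(f.read())
```

Sampling read_process_memory at a fixed interval during the test gives the series needed to see whether resident memory levels off or keeps climbing.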
Load (Server Load)
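Load is the average length of the run queue, that is, the number of processes running or waiting for CPU (on Linux the figure also counts tasks blocked in uninterruptible I/O). A load of about 1 per CPU core means the cores are fully used with nothing queued behind them. The recommended threshold is 70-80% of that maximum.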
Commands top, uptime and cat /proc/loadavg provide these values.
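Because the raw load average only means something relative to the core count, a small helper can normalize it. This sketch parses the text of /proc/loadavg; the function names and the 0.75 default (the middle of the 70-80% range) are assumptions made for illustration.

```python
def load_per_core(loadavg_text, cores):
    """Return the 1-, 5- and 15-minute load averages divided by core count."""
    if cores <= 0:
        raise ValueError("cores must be positive")
    one, five, fifteen = (float(x) for x in loadavg_text.split()[:3])
    return (one / cores, five / cores, fifteen / cores)


def over_threshold(loadavg_text, cores, threshold=0.75):
    """True if the 1-minute load per core exceeds the chosen threshold."""
    return load_per_core(loadavg_text, cores)[0] > threshold
```

On a live machine the text would come from reading /proc/loadavg and the core count from os.cpu_count().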
Network
Monitor per-process traffic with nethogs and connection states with netstat or ss. For services that hold long-lived TCP connections, watch the number of ESTABLISHED connections; for HTTP services, which often open many short-lived connections, monitor socket buffers and TIME_WAIT counts.
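Counting connections per state is easy to script. The sketch below tallies the first column of `ss -tan` output; note that ss abbreviates the state names (ESTAB, TIME-WAIT) where netstat prints ESTABLISHED and TIME_WAIT. The function name is illustrative.

```python
from collections import Counter


def count_tcp_states(ss_output):
    """Count TCP connections per state from the output of `ss -tan`."""
    counts = Counter()
    for line in ss_output.splitlines():
        fields = line.split()
        # Skip blank lines and the column header row.
        if not fields or fields[0] == "State":
            continue
        counts[fields[0]] += 1
    return counts
```

A steadily rising TIME-WAIT count during a test suggests the client or service is opening a new connection per request rather than reusing connections.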
Disk I/O
Use iostat to observe the following (await and %util appear only in the extended output, iostat -x):
tps – transfers per second
kB_read/s, kB_wrtn/s – data rates
await – average I/O wait time (ms)
%util – device utilization percentage
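These figures are derived from the kernel counters in /proc/diskstats. As a sketch of how they relate, the function below computes them from two readings of one device's line taken a known interval apart. The function names are illustrative, and the calculation is a simplification of what iostat does (it ignores discard and flush requests, for example).

```python
SECTOR_BYTES = 512  # /proc/diskstats counts sectors in 512-byte units


def _parse(line):
    """Pick the counters used below from one /proc/diskstats line."""
    p = line.split()
    return {
        "reads": int(p[3]), "sectors_read": int(p[5]), "ms_reading": int(p[6]),
        "writes": int(p[7]), "sectors_written": int(p[9]), "ms_writing": int(p[10]),
        "ms_busy": int(p[12]),
    }


def disk_metrics(before_line, after_line, interval_s):
    """Approximate iostat-style metrics between two readings of one device."""
    a, b = _parse(before_line), _parse(after_line)
    d = {k: b[k] - a[k] for k in a}
    ios = d["reads"] + d["writes"]
    return {
        "tps": ios / interval_s,
        "kB_read/s": d["sectors_read"] * SECTOR_BYTES / 1024 / interval_s,
        "kB_wrtn/s": d["sectors_written"] * SECTOR_BYTES / 1024 / interval_s,
        # Average time per completed request, including time spent queued.
        "await": (d["ms_reading"] + d["ms_writing"]) / ios if ios else 0.0,
        # Share of the interval during which the device had I/O in flight.
        "%util": 100.0 * d["ms_busy"] / (interval_s * 1000.0),
    }
```

A %util close to 100 together with a rising await is the usual sign that the disk, rather than the CPU, is limiting the service.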
Common Performance Bottlenecks
Throughput hits a ceiling while load is still below the threshold – often caused by a resource limit rather than CPU saturation (ulimit, thread count, memory).
High wa with low us / sy – may indicate disk‑intensive workload or memory pressure causing swapping.
Highly variable response times at stable throughput – could be lock contention or limited OS resources.
Memory continuously growing – likely a memory leak; use tools like valgrind to investigate.
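Whether memory is "continuously growing" can be checked numerically before reaching for a leak detector. The sketch below fits a least-squares line to resident-memory samples and reports the growth rate; the function name and the (seconds, kB) sample format are assumptions for this illustration, and a positive slope is only a hint, since caches warming up also grow memory early in a run.

```python
def rss_growth_kb_per_min(samples):
    """Least-squares slope of (seconds, rss_kb) samples, in kB per minute."""
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    n = len(samples)
    mean_t = sum(t for t, _ in samples) / n
    mean_r = sum(r for _, r in samples) / n
    num = sum((t - mean_t) * (r - mean_r) for t, r in samples)
    den = sum((t - mean_t) ** 2 for t, _ in samples)
    if den == 0:
        raise ValueError("all samples share one timestamp")
    return 60.0 * num / den
```

A slope that stays clearly positive over a long run at constant throughput is the pattern worth investigating further.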
Example: Intelligent Suggestion Service
The service crashed under a traffic surge. The test goal was to determine the maximum QPS of each upstream module, assuming the downstream data service can sustain 3500 QPS.
Test preparation:
Test data: logs from the day the service failed.
QPS estimate: the target metric.
Server configuration: identical to production hardware.
Load testing was performed with JMeter. The JMeter test plan included data file configuration, throughput shaping, HTTP sampler settings, and response assertions (see images for the configuration UI).
A single snapshot of the process's CPU usage was captured with top -n 1 -b -p ${pid}. Load was obtained with uptime, and memory details were read from /proc/${PID}/status. Disk I/O was monitored with iostat -x.
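To collect such snapshots repeatedly during a run, the batch output of top can be parsed by script. The sketch below locates the process table by its PID header and maps column names to the values on the first process row; the function names are illustrative, and the column layout is taken from the header line rather than assumed, since it varies with top's configuration.

```python
import subprocess


def parse_top_snapshot(top_output):
    """Map column names to values for the first process row of `top -b` output."""
    lines = top_output.splitlines()
    for i, line in enumerate(lines):
        headers = line.split()
        if headers and headers[0] == "PID":
            break
    else:
        raise ValueError("no process table header found")
    for row in lines[i + 1:]:
        if row.strip():
            # The last column (COMMAND) may contain spaces, so cap the splits.
            values = row.split(None, len(headers) - 1)
            return dict(zip(headers, values))
    raise ValueError("no process row found")


def snapshot(pid):
    """Run top once in batch mode for one process and parse the result."""
    out = subprocess.run(
        ["top", "-b", "-n", "1", "-p", str(pid)],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_top_snapshot(out)
```

Calling snapshot at a fixed interval and logging the %CPU and RES columns alongside the JMeter results makes it easier to line up resource usage with throughput afterwards.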
Test Report Output
A complete performance report should contain:
Test conclusions – maximum QPS, latency, whether targets were met, and deployment recommendations.
Test environment description – performance requirements, server specs, data source, testing methodology.
Metric statistics – response‑time distribution, QPS, server‑level and process‑level metrics, preferably visualized with charts.
Conclusion
The single-instance intelligent-suggestion service achieved about 300 QPS and the whole system about 1800 QPS, while the traffic spike exceeded 5000 QPS, which explains the outage. After scaling the service and throttling traffic, stability was restored.
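The gap between measured capacity and peak traffic can be turned into a rough sizing estimate. The sketch below assumes capacity scales linearly with instance count, which the test results do not establish, and applies a headroom factor so that instances run below their measured maximum; the function name and the 0.75 default are illustrative, and any estimate remains bounded by what the downstream data service can sustain.

```python
import math


def instances_needed(peak_qps, per_instance_qps, headroom=0.75):
    """Instances required to serve peak_qps, assuming linear scaling.

    `headroom` is the fraction of its measured maximum QPS that each
    instance is allowed to carry (an assumption, not a measured value).
    """
    if per_instance_qps <= 0 or not 0 < headroom <= 1:
        raise ValueError("per_instance_qps must be > 0 and 0 < headroom <= 1")
    return math.ceil(peak_qps / (per_instance_qps * headroom))
```

With the figures reported here, instances_needed(5000, 300) returns 23, and 17 with headroom set to 1.0, which is one reason throttling was used alongside scaling.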
dbaplus Community
