How to Thoroughly Benchmark Backend Services: Key Metrics and Practical Steps

This article explains which external and internal performance metrics matter for backend services, how to interpret them, and which bottlenecks are common. It then walks through an example that uses JMeter to measure throughput and latency alongside CPU, memory, load, network and disk I/O.

dbaplus Community

Overview

Different stakeholders focus on different performance indicators. Callers of a backend API usually care about throughput and response time, while service owners also monitor CPU, memory, load, network and disk I/O.

External Metrics

From the client side, three primary metrics are considered:

Throughput – number of requests or tasks processed per second.

Response time – time taken to handle a single request or task.

Error rate – proportion of failed requests in a batch.

Real-time services such as intelligent suggestions require latency under 100 ms, while navigation services can tolerate 2-5 s. Report latency as the mean together with the 90th and 99th percentiles and the overall distribution, because a mean on its own hides slow outliers.
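The reporting advice above can be sketched in a few lines of Python. This is a minimal illustration, not part of the original test setup: the function names, the nearest-rank percentile method and the sample values are all assumptions chosen for the example.

```python
import math
import statistics


def percentile(sorted_values, pct):
    """Nearest-rank percentile: the smallest sample with at least pct% of samples at or below it."""
    if not sorted_values:
        raise ValueError("no samples")
    rank = max(1, math.ceil(pct * len(sorted_values) / 100))
    return sorted_values[rank - 1]


def summarize_latency(latencies_ms):
    """Return the mean, p90, p99 and max of a list of latencies in milliseconds."""
    ordered = sorted(latencies_ms)
    return {
        "mean": statistics.mean(ordered),
        "p90": percentile(ordered, 90),
        "p99": percentile(ordered, 99),
        "max": ordered[-1],
    }


# Hypothetical samples: 98 fast requests and 2 slow outliers.
samples = [20] * 98 + [900, 1000]
summary = summarize_latency(samples)
print(summary)
```

With these made-up samples the mean is pulled well above what a typical request experiences, while the 90th percentile stays at the fast value and the 99th percentile exposes the outliers, which is why the percentiles are reported alongside the mean.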

Throughput is influenced by response time, hardware, and network conditions. Typical relationships:

As throughput approaches the service's capacity, requests start to queue and response time usually grows.

Better hardware yields higher throughput.

Poor network reduces throughput.
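One way to reason about the link between throughput and response time is Little's Law (in-flight requests = throughput × response time). The article does not name this law; it is brought in here only as an illustration, and the concurrency of 32 and the response times are invented numbers.

```python
def max_throughput(concurrency, avg_response_time_s):
    """Little's Law rearranged: requests per second sustained by a fixed number of in-flight requests."""
    if avg_response_time_s <= 0:
        raise ValueError("response time must be positive")
    return concurrency / avg_response_time_s


# Assumed scenario: a load generator keeps 32 requests in flight at all times.
for rt_ms in (50, 100, 200):
    qps = max_throughput(concurrency=32, avg_response_time_s=rt_ms / 1000)
    print(f"avg response time {rt_ms} ms -> about {qps:.0f} requests/s")
```

Under this simplified model, doubling the response time at fixed concurrency halves the achievable throughput, which is one reason slower hardware or a slower network shows up as lower throughput.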

Internal Metrics

From the server perspective, monitor CPU, memory, load, network and disk I/O.

CPU

Key Linux CPU time percentages (from top or sar) are:

us – user‑mode CPU usage

sy – system‑mode CPU usage

ni – CPU time spent running niced (priority-adjusted) user processes

id – idle CPU

wa – time waiting for I/O

hi – hardware interrupt time

si – software interrupt time

Example top output is shown in the image.
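For readers who want these percentages without top, the sketch below derives them from two snapshots of the aggregate cpu line in /proc/stat. It assumes a Linux kernel recent enough to report at least eight counters on that line; the snapshot strings are invented for the example, and the st (steal) field is included only because it sits next to the others in that file.

```python
# Order of the first eight counters on the aggregate "cpu" line of /proc/stat.
FIELDS = ("us", "ni", "sy", "id", "wa", "hi", "si", "st")


def parse_cpu_line(line):
    """Parse the aggregate 'cpu' line of /proc/stat into a dict of tick counters."""
    parts = line.split()
    if not parts or parts[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    values = [int(v) for v in parts[1:1 + len(FIELDS)]]
    if len(values) < len(FIELDS):
        raise ValueError("fewer counters than expected")
    return dict(zip(FIELDS, values))


def cpu_percentages(before, after):
    """Percentage of elapsed CPU time spent in each state between two snapshots."""
    deltas = {name: after[name] - before[name] for name in FIELDS}
    total = sum(deltas.values())
    if total <= 0:
        raise ValueError("snapshots are identical or out of order")
    return {name: 100.0 * delta / total for name, delta in deltas.items()}


# Hypothetical snapshots; on a live host, read the first line of /proc/stat twice.
first = parse_cpu_line("cpu  1000 0 500 8000 400 50 50 0 0 0")
second = parse_cpu_line("cpu  1600 0 700 8100 450 60 90 0 0 0")
usage = cpu_percentages(first, second)
print({name: round(value, 1) for name, value in usage.items()})
```

The counters are cumulative since boot, so only the difference between two snapshots says anything about the interval under test.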

Memory

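The important per-process memory figures are the ones top labels VIRT, RES, SHR, SWAP and DATA. In /proc/${PID}/status the corresponding fields are VmSize (VIRT), VmRSS (RES), VmSwap (SWAP) and VmData (DATA); on newer kernels the shared portion (SHR) is broken out as RssFile and RssShmem. During testing, focus on RES and VIRT, and also on SHR for services that use shared memory.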
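A small parser makes it easy to log these values during a test run. The sketch below reads the kB-valued lines of a status file, where resident and virtual size appear as VmRSS and VmSize. The function names and the sample text are assumptions for illustration, and the process name in the sample is made up.

```python
def parse_status_kb(text):
    """Return {field: kB} for the lines of a /proc/<pid>/status file that are reported in kB."""
    result = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        tokens = rest.split()
        if len(tokens) == 2 and tokens[1] == "kB" and tokens[0].isdigit():
            result[name.strip()] = int(tokens[0])
    return result


def read_process_memory(pid):
    """Read and parse /proc/<pid>/status on a Linux host."""
    with open(f"/proc/{pid}/status") as fh:
        return parse_status_kb(fh.read())


# Invented, shortened sample of a status file.
sample = (
    "Name:\tsuggestd\n"
    "VmSize:\t 2048000 kB\n"
    "VmRSS:\t  512000 kB\n"
    "VmSwap:\t       0 kB\n"
    "Threads:\t64\n"
)
mem = parse_status_kb(sample)
print(mem["VmRSS"], mem["VmSize"])
```

Calling read_process_memory at a fixed interval and storing VmRSS gives the time series needed to spot steady memory growth later.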

Load (Server Load)

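Load is the average length of the run queue, that is, the number of processes running or waiting for CPU (on Linux the figure also counts processes in uninterruptible I/O wait). A load of about 1 per CPU core means the cores are fully occupied with nothing queued behind them. A commonly recommended threshold is 70-80% of that maximum, so roughly 0.7-0.8 per core.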

Commands top, uptime and cat /proc/loadavg provide these values.
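The threshold check can be automated by comparing the load averages with the core count. This is a sketch under stated assumptions: the 0.7 default is one reading of the 70-80% guideline, and the sample line and the 8-core host are invented.

```python
def parse_loadavg(text):
    """Return the 1-, 5- and 15-minute load averages from /proc/loadavg content."""
    one, five, fifteen = (float(value) for value in text.split()[:3])
    return one, five, fifteen


def is_overloaded(load, cores, threshold=0.7):
    """True when load per core exceeds the chosen fraction of full utilization."""
    if cores <= 0:
        raise ValueError("cores must be positive")
    return load / cores > threshold


# Invented sample; on a live host use open("/proc/loadavg").read() and os.cpu_count().
sample = "6.40 4.10 3.20 3/512 12345"
one, five, fifteen = parse_loadavg(sample)
print("1-minute load over threshold:", is_overloaded(one, cores=8))
print("5-minute load over threshold:", is_overloaded(five, cores=8))
```

Checking the 1-minute and 5-minute values separately helps distinguish a short burst from sustained pressure.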

Network

Monitor per-process traffic with nethogs and connection states with netstat or ss. For services that hold long-lived TCP connections, watch the number of ESTABLISHED connections; for HTTP services, which tend to open many short-lived connections, monitor socket buffers and TIME_WAIT counts.
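Counting connections per state is straightforward once the output of ss is available as text. The sketch below assumes the layout of ss -tan, where the first column is the state and ss prints ESTABLISHED as ESTAB and TIME_WAIT as TIME-WAIT. The sample output, including its addresses, is invented.

```python
from collections import Counter


def count_tcp_states(ss_output):
    """Count connections per state from `ss -tan` text (the first column is the state)."""
    counts = Counter()
    for line in ss_output.splitlines():
        fields = line.split()
        if not fields or fields[0] == "State":
            continue  # skip blank lines and the header row
        counts[fields[0]] += 1
    return counts


# Invented sample of `ss -tan` output.
sample = """State      Recv-Q Send-Q Local Address:Port   Peer Address:Port
LISTEN     0      128    0.0.0.0:8080         0.0.0.0:*
ESTAB      0      0      10.0.0.5:8080        10.0.0.9:51234
ESTAB      0      0      10.0.0.5:8080        10.0.0.9:51236
TIME-WAIT  0      0      10.0.0.5:8080        10.0.0.7:40022
"""
states = count_tcp_states(sample)
print(states["ESTAB"], states["TIME-WAIT"])
```

On a live host the text could come from subprocess.run(["ss", "-tan"], capture_output=True, text=True).stdout, sampled at the same interval as the other metrics.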

Disk I/O

Use iostat (add -x for extended stats) to observe:

tps – transfers per second

kB_read/s, kB_wrtn/s – data rates

await – average I/O wait time (ms)

%util – device utilization percentage
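To track these columns over a test run, the device table can be parsed by header name rather than by position, since the exact columns differ between sysstat versions (newer releases, for example, split await into r_await and w_await). The sample below is an invented, shortened table, and the 80% utilization cut-off is an assumption rather than a figure from the article.

```python
def parse_iostat_devices(text):
    """Map device name -> {column: value}, using the 'Device' header row of `iostat -x` output."""
    header = None
    devices = {}
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            header = None  # a blank line ends the current device table
        elif fields[0].startswith("Device"):
            header = fields[1:]
        elif header is not None and len(fields) == len(header) + 1:
            devices[fields[0]] = dict(zip(header, map(float, fields[1:])))
    return devices


def busy_devices(devices, util_threshold=80.0):
    """Names of devices whose %util is at or above the threshold."""
    return sorted(
        name for name, cols in devices.items()
        if cols.get("%util", 0.0) >= util_threshold
    )


# Invented sample with fewer columns than real `iostat -x` output.
sample = """Device   r/s    w/s    rkB/s  wkB/s   await  %util
sda      12.0   340.0  96.0   5440.0  18.5   92.3
sdb      1.0    2.0    8.0    16.0    0.9    0.4
"""
devices = parse_iostat_devices(sample)
print(busy_devices(devices))
```

Because columns are looked up by name, the same function keeps working when a sysstat version adds or reorders columns, as long as the header row still starts with Device.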

Common Performance Bottlenecks

Throughput hits a ceiling while load is still below the threshold – often because the service itself is short of resources (ulimit settings, thread count, memory) rather than the machine.

High wa with low us / sy – may indicate disk‑intensive workload or memory pressure causing swapping.

Highly variable response times at stable throughput – could be lock contention or limited OS resources.

Memory continuously growing – likely a memory leak; use tools like valgrind to investigate.
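The last symptom, memory that keeps growing, can be flagged automatically from periodic RSS samples before reaching for a tool like valgrind. The sketch below fits a least-squares line through the samples and reports the growth per sample. The 1024 kB threshold and both sample series are assumptions chosen for illustration, and a positive slope is only a hint, not proof of a leak.

```python
def growth_per_sample(values):
    """Least-squares slope of the values against their sample index."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    numerator = sum((i - mean_x) * (v - mean_y) for i, v in enumerate(values))
    denominator = sum((i - mean_x) ** 2 for i in range(n))
    return numerator / denominator


def looks_like_leak(rss_kb, min_growth_kb=1024):
    """True when resident memory grows by at least min_growth_kb per sample on average."""
    return growth_per_sample(rss_kb) >= min_growth_kb


# Invented RSS samples in kB, taken at a fixed interval during a load test.
steady = [500000, 501000, 499500, 500200, 500100]
growing = [500000, 504000, 508500, 512000, 516200]
print(looks_like_leak(steady), looks_like_leak(growing))
```

A fitted slope is less sensitive to a single noisy sample than comparing only the first and last values.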

Example: Intelligent Suggestion Service

The service crashed under a traffic surge. The test goal was to determine the maximum QPS of each upstream module, assuming the downstream data service can sustain 3500 QPS.

Test preparation:

Test data: logs from the day the service failed.

QPS estimate: an estimated QPS figure to use as the test target.

Server configuration: identical to production hardware.

Load testing was performed with JMeter. The JMeter test plan included data file configuration, throughput shaping, HTTP sampler settings, and response assertions (see images for the configuration UI).

CPU usage was captured as a single snapshot with the command top -n 1 -b -p ${PID}. Load was obtained with uptime, memory details were read from /proc/${PID}/status, and disk I/O was monitored with iostat -x.
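These commands can be wrapped in a small collector so that every metric is sampled at the same moment during the JMeter run. The sketch below is an assumption about how that might be scripted, not the tooling used in the original test; the function names and the metric keys are hypothetical.

```python
import subprocess


def snapshot_commands(pid):
    """The monitoring commands named in the article, as argument lists for subprocess."""
    return {
        "cpu": ["top", "-n", "1", "-b", "-p", str(pid)],
        "load": ["uptime"],
        "memory": ["cat", f"/proc/{pid}/status"],
        "disk": ["iostat", "-x"],
    }


def take_snapshot(pid, run=subprocess.run):
    """Run each command once and return its standard output keyed by metric name."""
    results = {}
    for name, argv in snapshot_commands(pid).items():
        completed = run(argv, capture_output=True, text=True, check=False)
        results[name] = completed.stdout
    return results
```

Calling take_snapshot in a loop with a fixed sleep, and writing each result with a timestamp, yields process-level and server-level series that can be lined up against the JMeter throughput and latency data. The run parameter exists so that the function can be exercised without the real tools installed.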

Test Report Output

A complete performance report should contain:

Test conclusions – maximum QPS, latency, whether targets were met, and deployment recommendations.

Test environment description – performance requirements, server specs, data source, testing methodology.

Metric statistics – response‑time distribution, QPS, server‑level and process‑level metrics, preferably visualized with charts.

Conclusion

A single instance of the intelligent-suggestion service sustained about 300 QPS and the whole system about 1800 QPS, while the traffic spike exceeded 5000 QPS, which explains the outage. After the service was scaled out and traffic was throttled, stability was restored.
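The figures above lend themselves to a rough capacity estimate. The sketch below assumes that throughput scales linearly with the number of instances and that each instance should run at no more than 75% of its measured maximum; both are simplifying assumptions, not findings from the test. It also ignores the 3500 QPS limit of the downstream data service, which throttling would still have to respect.

```python
import math


def instances_needed(peak_qps, per_instance_qps, headroom=0.75):
    """Instances required so that each runs at no more than `headroom` of its measured maximum."""
    if per_instance_qps <= 0 or not 0 < headroom <= 1:
        raise ValueError("per_instance_qps must be positive and headroom in (0, 1]")
    usable_qps = per_instance_qps * headroom
    return math.ceil(peak_qps / usable_qps)


# Figures from the article: about 300 QPS per instance, spike of roughly 5000 QPS.
print(instances_needed(5000, 300, headroom=1.0))  # no safety margin
print(instances_needed(5000, 300))                # with the assumed 75% headroom
```

Real scaling is rarely perfectly linear, so an estimate like this is a starting point to confirm with another load test rather than a deployment plan.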
