
Master System Performance: Essential Tools & Techniques for Debugging Bottlenecks

This article consolidates practical knowledge on system performance optimization, covering key metrics, load‑testing utilities, Linux monitoring commands, and JVM profiling tricks to help engineers pinpoint and resolve throughput, latency, CPU, disk, and network bottlenecks.

Programmer DD

System Performance Definition

Throughput – number of requests the system can handle per second.

Latency – time taken to process a single request.

Utilization – share of each resource (CPU, memory, disk, network) that is in use.

Throughput and Latency Relationship

Higher throughput usually comes with higher latency: as the system approaches saturation, requests spend more time waiting in queues.

Lower latency allows higher throughput, because each request holds its resources for less time.

Asynchronous processing can raise throughput, but it does not guarantee a lower response time for an individual request.
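
The relationship can be made concrete with Little's Law (requests in flight = throughput × latency). The sketch below is an illustration, not part of the original toolset; the helper name max_throughput and the sample numbers are assumptions.

```shell
# Little's Law: in-flight requests = throughput (req/s) * latency (s).
# max_throughput is a hypothetical helper name, not a standard tool.
max_throughput() {
  # $1 = requests that can be in flight at once, $2 = mean latency in ms
  awk -v c="$1" -v ms="$2" 'BEGIN { printf "%.0f\n", c * 1000 / ms }'
}

max_throughput 100 50    # prints 2000
max_throughput 100 100   # prints 1000
```

With concurrency fixed at 100, doubling latency from 50 ms to 100 ms halves the achievable throughput from 2000 to 1000 requests per second.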

Common Load‑Testing Tools

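tcpdump – packet capture

-i : specify the interface
-s : snapshot length, the number of bytes captured per packet (old versions default to 68 bytes, recent versions to 262144; -s 0 selects the maximum)
-w : write captured packets to a file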

Examples

tcpdump -i eth1 host 10.1.1.1       # all packets on eth1 with source or destination 10.1.1.1
tcpdump -i eth1 src host 10.1.1.1   # only packets whose source address is 10.1.1.1
tcpdump -i eth1 dst host 10.1.1.1   # only packets whose destination address is 10.1.1.1

To analyze the capture in Wireshark, write it to a file with -w and add -s 0 so that packets are not truncated:

tcpdump -i eth0 -s 0 -w traffic.pcap tcp and port 80

tcpcopy – online traffic replay

tcpcopy copies live traffic to a test machine for realistic load testing without deploying new code.

a. Record with tcpdump

tcpdump -i eth0 -w online.pcap tcp and port 80

b. Replay traffic

tcpcopy -x 80-10.1.x.x:80 -i online.pcap        # offline replay of the capture recorded in step a
tcpcopy -x 80-10.1.x.x:80 -a 2 -i online.pcap   # offline replay at 2× speed

c. Online replay: sampling and amplification

tcpcopy -x 80-10.1.x.x:80 -r 20   # copy 20% of live traffic to the test server
tcpcopy -x 80-10.1.x.x:80 -n 3    # amplify the copied traffic 3×

wrk, ApacheBench, JMeter, webbench

wrk is lightweight and accurate; with Lua scripts it supports complex scenarios.

wrk -t4 -c1000 -d30s -T30s --latency http://www.example.com

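Here -t4 runs 4 threads, -c1000 keeps 1000 connections open, -d30s runs the test for 30 seconds, and -T30s sets a 30-second request timeout. The output reports the latency distribution (enabled by --latency), requests per second, and transfer rate.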

Locating Performance Bottlenecks

Consider four layers:

Application layer

System layer

JVM layer

Profiler tools

Application Layer

QPS

Response time (95th/99th percentile)

Success rate
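
Percentile response times can be computed from any file that holds one latency value per line. The sketch below uses the nearest-rank method; the helper name percentile and the input format are assumptions, not something the article prescribes.

```shell
# percentile is a hypothetical helper: nearest-rank percentile of the
# numbers read from stdin (one value per line).
percentile() {
  # $1 = integer percentile, for example 95 or 99
  sort -n | awk -v p="$1" '
    { v[NR] = $1 }
    END {
      if (NR == 0) exit 1
      rank = int((p * NR + 99) / 100)   # ceil(p / 100 * NR) for integer p
      if (rank < 1) rank = 1
      print v[rank]
    }'
}

seq 1 100 | percentile 95   # prints 95
seq 1 100 | percentile 99   # prints 99
```

Nearest-rank always returns a value that was actually observed, which is usually what is wanted when reporting tail latency.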

System Layer

Key resources: CPU, memory, disk, network. A concise command to view overall status:

dstat -lcdngy

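With these flags, dstat prints load average (-l), CPU (-c), disk (-d), network (-n), paging (-g), and system interrupts and context switches (-y) once per second; add -m for memory.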

CPU

Utilization = 1 – (idle CPU time / total CPU time).

User vs. kernel time: high user time points to a compute‑intensive workload, while high system time or I/O wait points to an I/O‑intensive one.

Load average is the average number of processes that are runnable or, on Linux, blocked in uninterruptible I/O; ideally it stays at or below the number of CPU cores.
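
On Linux, utilization can be derived from two samples of the aggregate cpu line in /proc/stat. The sketch below treats only the idle counter as idle time (counting iowait as idle too is a common variant); the helper name cpu_util and the sample counters are assumptions.

```shell
# cpu_util is a hypothetical helper: it takes two "cpu ..." lines sampled
# from /proc/stat and prints the non-idle share of CPU time between them.
cpu_util() {
  printf '%s\n%s\n' "$1" "$2" | awk '
    {
      total = 0
      for (i = 2; i <= 9; i++) total += $i   # user .. steal (guest is already in user)
      t[NR] = total
      idle[NR] = $5                          # 4th counter after "cpu" is idle time
    }
    END {
      dt = t[2] - t[1]
      if (dt <= 0) exit 1                    # samples identical or out of order
      printf "%.1f\n", 100 * (1 - (idle[2] - idle[1]) / dt)
    }'
}

# Example with made-up counters: 1000 ticks elapsed, 850 of them idle.
cpu_util "cpu 100 0 100 800 0 0 0 0" "cpu 200 0 150 1650 0 0 0 0"   # prints 15.0

# Live usage on a Linux host (one-second window):
# cpu_util "$(grep '^cpu ' /proc/stat)" "$(sleep 1; grep '^cpu ' /proc/stat)"
```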

Disk

Check space and permissions; insufficient space or rights can cause failures.

du -sh          # size of the current directory
df -hl          # usage of local filesystems

Clear large logs quickly:

sudo truncate -s 0 /var/log/*.log                           # empty the logs without deleting the files
sudo find /var/log/ -type f -mtime +30 -exec rm -f {} \;    # delete files last modified more than 30 days ago
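
Before deleting by age, it can help to preview which files match. The sketch below only lists them; the helper name old_logs is an assumption, and the directory and age are passed as arguments.

```shell
# old_logs is a hypothetical helper: list regular files under a directory
# whose last modification is more than N days old, without deleting them.
old_logs() {
  # $1 = directory, $2 = age in days
  find "$1" -type f -mtime +"$2" -print
}

# Example: old_logs /var/log 30
```

Once the list looks right, the same find expression can be rerun with -exec rm -f {} \; in place of -print.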

Test disk write speed:

dd if=/dev/zero of=output.file bs=10M count=1 conv=fdatasync   # conv=fdatasync flushes to disk before dd reports the rate

Identify I/O bottlenecks with iostat, iotop, and ps:

iostat -x 1
iotop -o
ps -eo state,pid,cmd | grep '^D'

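In the iostat output, %util shows how busy each disk is (a value that stays near 100% usually means the disk is saturated), while r/s and w/s show the read and write request rates.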
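
Busy disks can be picked out of the iostat output automatically. The sketch below assumes the extended report has a header line starting with Device and a column named %util, and it finds that column by name because its position differs between sysstat versions; the helper name busy_disks and the threshold are assumptions.

```shell
# busy_disks is a hypothetical helper: read "iostat -x" output on stdin and
# print the devices whose %util is at or above the given threshold.
busy_disks() {
  # $1 = %util threshold, for example 80
  awk -v limit="$1" '
    $1 ~ /^Device/ {
      for (i = 1; i <= NF; i++) if ($i == "%util") col = i
      next
    }
    col && NF >= col && $col + 0 >= limit { print $1, $col }'
}

# Example: iostat -x 1 3 | busy_disks 80
```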

Network

Common commands:

netstat -nt                  # show TCP connections and their send/receive queues
netstat -nap | grep <port>   # find the processes using a specific port
netstat -s                   # protocol statistics, useful for detecting retransmissions

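Watch the State and queue columns: many connections stuck in SYN_SENT suggest that the remote end is unreachable or not accepting connections, and persistently large Recv-Q or Send-Q values indicate that the application or its peer is not keeping up.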
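
A quick way to spot an unusual state distribution is to count connections per TCP state. The sketch below assumes the usual netstat -nt layout, where the state is the sixth column; the helper name tcp_states is an assumption.

```shell
# tcp_states is a hypothetical helper: read "netstat -nt" output on stdin
# and print the number of connections in each TCP state.
tcp_states() {
  awk '$1 ~ /^tcp/ { count[$6]++ }
       END { for (s in count) print s, count[s] }' | sort
}

# Example: netstat -nt | tcp_states
```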

JVM Layer

Thread Stack Analysis

Capture thread stacks:

ps -ef | grep java
sudo -u nobody jstack <pid> > /tmp/jstack.<pid>   # run jstack as the user that owns the JVM (nobody in this example)

Convert native thread ID (nid) from hex to decimal to match top -H -p <pid> output.

printf "%d\n" 0x1b40     # hex to decimal: 6976
printf "0x%x\n" 6976     # decimal to hex: 0x1b40

High CPU Diagnosis

ps -ef | grep java          # find the Java PID
top -H -p <pid>             # show the hottest threads of that process

Convert the ID of the hottest thread (the PID column of top -H) to hex, then search the jstack output for the matching nid to find its stack.
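
The lookup can be wrapped in a small function. The sketch below assumes the dump prints native thread IDs as nid=0x... in hex, as described above; the helper name thread_stack and the 10-line context are assumptions.

```shell
# thread_stack is a hypothetical helper: print the jstack entry for a thread,
# given its decimal ID from "top -H" and a saved jstack dump.
thread_stack() {
  # $1 = decimal thread ID, $2 = path to the jstack dump file
  nid=$(printf '0x%x' "$1")
  # -w avoids matching a longer nid that merely starts with the same digits
  grep -w -A 10 "nid=$nid" "$2"
}

# Example: thread_stack 6976 /tmp/jstack.<pid>
```

The number of context lines after -A can be raised when the stack is deeper than 10 frames.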

GC Cause Inspection

jstat -gccause <pid>

This displays garbage‑collection statistics together with the causes of the last and the current GC events.
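
To track full collections in a script, the FGC column can be pulled out of the jstat output. The sketch below assumes a single header line that contains a column named FGC and locates it by name; the helper name full_gc_count is an assumption.

```shell
# full_gc_count is a hypothetical helper: read "jstat -gccause" output on
# stdin and print the value of the FGC (full GC count) column.
full_gc_count() {
  awk 'NR == 1 { for (i = 1; i <= NF; i++) if ($i == "FGC") col = i; next }
       col     { print $col; exit }'
}

# Example: jstat -gccause <pid> | full_gc_count
```

A count that keeps growing between samples is a hint to look at the reported cause and at heap sizing.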

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.

Written by

Programmer DD

A tinkering programmer and author of "Spring Cloud Microservices in Action"
