Comprehensive Guide to Troubleshooting CPU, Disk, Memory, GC, and Network Issues in Java Applications

This article provides a step‑by‑step methodology for diagnosing and resolving common online failures in Java services, covering CPU bottlenecks, disk I/O problems, memory leaks, garbage‑collection inefficiencies, and network anomalies such as timeouts, TCP queue overflows, and RST packets.

Top Architect

Online service failures usually involve CPU, disk, memory, or network problems, and most incidents span more than one of these layers. A systematic inspection in the order CPU → Disk → Memory → Network is therefore recommended, using tools like jstack, jmap, jstat, top, vmstat, iostat, netstat, and ss.

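CPU : Identify the busy Java process with ps to get the PID, then run top -H -p <pid> to find its hot threads, convert the thread ID to hexadecimal (printf '%x\n' <tid>), and locate the stack in a jstack dump using jstack <pid> | grep '<nid>' -C5 --color, where <nid> is that hexadecimal value. A thread that is burning CPU is normally RUNNABLE; to judge the overall health of the dump, focus on WAITING, TIMED_WAITING, and BLOCKED states and use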

grep "java.lang.Thread.State" jstack.log | sort | uniq -c | sort -nr

to spot problematic threads.
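
The CPU steps above can be wrapped in a few small helpers. This is only a sketch: the function names and the default dump path jstack.log are hypothetical, and it assumes the dump prints nid in hexadecimal, as described above.

```shell
# Convert a decimal thread ID (as shown by `top -H`) into the
# hexadecimal nid form used in the jstack dump.
tid_to_nid() {
    printf '0x%x\n' "$1"
}

# Print the stack of one thread from a saved jstack dump.
# $1 = decimal thread ID, $2 = dump file (hypothetical default: jstack.log)
show_thread_stack() {
    grep -A 20 "nid=$(tid_to_nid "$1") " "${2:-jstack.log}"
}

# Count threads per state in a saved dump, most frequent state first.
# $1 = dump file (hypothetical default: jstack.log)
count_thread_states() {
    grep 'java.lang.Thread.State' "${1:-jstack.log}" \
        | awk '{print $2}' | sort | uniq -c | sort -rn
}
```

A typical session would save the dump once with jstack <pid> > jstack.log and then call show_thread_stack <tid> for each hot thread reported by top -H.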

Disk : Check filesystem space with df -hl and monitor I/O performance using iostat -d -k -x. Identify the busiest disks via the %util column and pinpoint the responsible process with iotop. iotop reports thread IDs by default, so convert a thread ID to a PID using readlink -f /proc/*/task/<tid>/../.. and then inspect the process's I/O counters via cat /proc/<pid>/io or its open files via lsof -p <pid>.
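
To avoid reading the %util column by eye, the iostat output can be filtered. The helper below is a sketch with a hypothetical name and a hypothetical default threshold of 80; it assumes that %util is the last column and that each report starts with a header row beginning with "Device".

```shell
# Read `iostat -d -k -x` output on stdin and print the devices whose
# %util (assumed to be the last column) is at or above a threshold.
# $1 = threshold in percent (hypothetical default: 80)
busy_disks() {
    awk -v limit="${1:-80}" '
        /^Device/ { in_table = 1; next }   # header row of each report
        NF == 0   { in_table = 0; next }   # a blank line ends the table
        in_table && $NF + 0 >= limit + 0 { print $1, $NF }
    '
}
```

For example, iostat -d -k -x 1 3 | busy_disks 90 would list only the devices that were at least 90% utilized in any of the three samples.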

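Memory : Start with free to view overall usage, then differentiate between heap OOM (java.lang.OutOfMemoryError: Java heap space), native thread OOM (java.lang.OutOfMemoryError: unable to create new native thread), and metaspace OOM (java.lang.OutOfMemoryError: Metaspace). Use jmap -histo:live <pid> for a quick class histogram (the :live option forces a full GC first) and export a heap dump with jmap -dump:format=b,file=heap.hprof <pid> for analysis in Eclipse MAT to locate leaks. For off‑heap leaks, monitor native memory with pmap -x <pid> | sort -rn -k3 | head -30, capture suspicious regions via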

gdb --batch --pid <pid> -ex "dump memory dump.bin 0x<addr> 0x<addr+size>"

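, and track the JVM's own native allocations with jcmd <pid> VM.native_memory summary, which only works when the JVM was started with -XX:NativeMemoryTracking=summary (or detail). Enable -XX:+HeapDumpOnOutOfMemoryError so that a heap dump is written automatically when an OOM occurs.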
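
Before choosing between heap, thread, and metaspace analysis, it helps to know which OOM messages the service actually logged. The sketch below counts them by detail message; the function name and the default log path app.log are hypothetical.

```shell
# Count java.lang.OutOfMemoryError occurrences in a log, grouped by the
# detail message that follows the class name, most frequent first.
# $1 = application log (hypothetical default: app.log)
count_oom_types() {
    grep -o 'java\.lang\.OutOfMemoryError: .*' "${1:-app.log}" \
        | sed 's/^java\.lang\.OutOfMemoryError: //' \
        | sort | uniq -c | sort -rn
}
```

A result dominated by "Java heap space" points toward the heap dump and MAT route, while "unable to create new native thread" or "Metaspace" suggests looking at thread counts or class loading instead.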

GC : Enable detailed GC logging with

-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintGCTimeStamps

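. These are JDK 8 flags; JDK 9 and later use unified logging instead, for example -Xlog:gc*. Use jstat -gc <pid> 1000 to sample the YGC and FGC counters every second and compare young versus full GC frequency. If young collections are too frequent, consider a larger young generation (-Xmn) or a different -XX:SurvivorRatio. With G1, avoid fixing the young size with -Xmn because it overrides the pause‑time goal, and tune -XX:G1ReservePercent and -XX:InitiatingHeapOccupancyPercent to mitigate frequent or long‑running collections. Dump the heap before and after a Full GC with jinfo -flag +HeapDumpBeforeFullGC <pid> and jinfo -flag +HeapDumpAfterFullGC <pid>.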
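
The jstat counters are cumulative, so GC frequency is easier to read as a per-sample difference. The following sketch (hypothetical function name) locates the YGC and FGC columns from the header row rather than by position, assuming the header starts with S0C as it does for jstat -gc.

```shell
# Read `jstat -gc <pid> <interval>` output on stdin and print, for each
# sample after the first, how many young and full collections occurred
# since the previous sample.
gc_deltas() {
    awk '
        # Header row: remember which columns hold the YGC and FGC counters.
        $1 == "S0C" {
            for (i = 1; i <= NF; i++) {
                if ($i == "YGC") y = i
                if ($i == "FGC") f = i
            }
            next
        }
        # Data rows: print the difference to the previous sample.
        y && f {
            if (seen) print "young+" ($y - py), "full+" ($f - pf)
            py = $y; pf = $f; seen = 1
        }
    '
}
```

Running jstat -gc <pid> 1000 | gc_deltas then shows one line per second; a steady stream of non-zero full counts is the signal to capture heap dumps around Full GC.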

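Network : Diagnose timeouts, TCP queue overflows, and RST packets. Use netstat -s | egrep "listen|LISTEN" to view the overflow counters of the accept and SYN queues, and ss -lnt to check backlog sizes (for a listening socket, Recv-Q is the current accept‑queue length and Send-Q is its maximum). To cope with large numbers of TIME_WAIT sockets, net.ipv4.tcp_tw_reuse=1 lets the kernel reuse them for new outgoing connections. Avoid net.ipv4.tcp_tw_recycle=1: it breaks clients behind NAT and was removed in Linux 4.12. Monitor connection states with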

netstat -n | awk '/^tcp/ {++S[$NF]} END {for(a in S) print a, S[a]}'

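or ss -ant. Capture traffic with tcpdump -i en0 tcp -w capture.cap (replace en0 with the actual interface name, for example eth0 on many Linux hosts; the filter 'tcp[tcpflags] & tcp-rst != 0' keeps only RST packets) and analyze the capture in Wireshark.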
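
The backlog check with ss -lnt can also be automated. This sketch (hypothetical function name) flags listening sockets whose accept queue has reached its limit, relying on the ss convention that Recv-Q is the current queue length and Send-Q the configured backlog for sockets in LISTEN state.

```shell
# Read `ss -lnt` output on stdin and print listening sockets whose
# accept queue (Recv-Q, column 2) has reached the backlog (Send-Q, column 3).
full_accept_queues() {
    awk '$1 == "LISTEN" && $3 + 0 > 0 && $2 + 0 >= $3 + 0 {
        print $4, "queued=" $2, "backlog=" $3
    }'
}
```

Used as ss -lnt | full_accept_queues, any output means new connections to that port are at risk of being dropped or reset, which is worth correlating with the overflow counters from netstat -s.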

By following these systematic checks and leveraging the listed commands, developers can quickly pinpoint the root cause of performance degradations and restore service stability.
