Master Java Server Troubleshooting: CPU, Memory, Disk, GC & Network Issues
This guide walks you through systematic troubleshooting of Java server incidents covering CPU, memory, disk, garbage collection, and network problems, offering step‑by‑step command‑line techniques, analysis of thread stacks, GC logs, native memory tracking, and TCP diagnostics to pinpoint root causes efficiently.
CPU
Start with CPU anomalies, which are usually the easiest to localize. Common causes include busy loops in business logic, frequent GC, and excessive context switching. Use ps to find the process ID, then top -H -p pid to identify the threads using the most CPU. Convert the thread ID (not the process ID) to hex with printf '%x\n' tid, and locate its stack with jstack pid | grep 'nid' -C5 --color, where nid is the hex value just computed. Pay particular attention to threads in the WAITING and TIMED_WAITING states. For a broader view, count the threads in each state:
cat jstack.log | grep "java.lang.Thread.State" | sort | uniq -c | sort -nr
Disk
Check disk space with df -hl. For performance issues, run iostat -d -k -x to see device utilization and read/write rates. Find the thread responsible for the I/O with iotop, which reports thread IDs, then map a thread ID to its process with readlink -f /proc/*/task/tid/../.. (the result is the /proc/pid directory of the owning process). Examine that process's I/O counters with cat /proc/pid/io and list its open files with lsof -p pid.
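The counters in /proc/pid/io are plain "key: value" lines, so they are easy to read from code when you want to sample them repeatedly. The following is a minimal sketch, assuming a Linux host; the class name ProcIoReader and its parse method are hypothetical names chosen for illustration.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: reads the I/O counters the kernel exposes in /proc/<pid>/io.
public class ProcIoReader {

    // Parses lines of the form "read_bytes: 4096" into an ordered map.
    public static Map<String, Long> parse(List<String> lines) {
        Map<String, Long> counters = new LinkedHashMap<>();
        for (String line : lines) {
            int colon = line.indexOf(':');
            if (colon < 0) {
                continue; // skip anything that is not "key: value"
            }
            String key = line.substring(0, colon).trim();
            String value = line.substring(colon + 1).trim();
            counters.put(key, Long.parseLong(value));
        }
        return counters;
    }

    public static void main(String[] args) throws IOException {
        // Default to the current process; pass a PID as the first argument to inspect another one.
        String pid = args.length > 0 ? args[0] : "self";
        Path io = Paths.get("/proc", pid, "io");
        if (!Files.isReadable(io)) {
            System.err.println("Cannot read " + io + " (Linux only; may require privileges)");
            return;
        }
        Map<String, Long> counters = parse(Files.readAllLines(io));
        // read_bytes and write_bytes count bytes that actually reached the storage layer.
        System.out.println("read_bytes  = " + counters.get("read_bytes"));
        System.out.println("write_bytes = " + counters.get("write_bytes"));
    }
}
```

Sampling these values twice and subtracting gives a per-process I/O rate that can be compared against what iostat reports for the device.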
Memory
Begin with free to assess overall memory usage. Common OOM errors include "unable to create new native thread", "Java heap space", and "Metaspace". For heap problems, take a dump with jmap -dump:format=b,file=heap.hprof pid and analyze it with MAT (Memory Analyzer Tool). To capture a dump automatically when the error occurs, start the JVM with -XX:+HeapDumpOnOutOfMemoryError. For native memory, start the JVM with -XX:NativeMemoryTracking=summary or -XX:NativeMemoryTracking=detail, record a baseline with jcmd pid VM.native_memory baseline, and later compare against it with jcmd pid VM.native_memory detail.diff. To watch allocations at the system-call level, use strace -f -e "brk,mmap,munmap" -p pid.
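A heap dump can also be triggered from inside the JVM, which is useful when jmap is not available on the host. The sketch below assumes a HotSpot-based JVM, since com.sun.management.HotSpotDiagnosticMXBean is not part of the portable java.lang.management API; the class name HeapInspector is a hypothetical name for illustration.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.io.IOException;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

// Hypothetical helper: reports heap usage and writes an .hprof dump on request.
public class HeapInspector {

    // Current heap usage of this JVM as seen by the memory MXBean.
    public static MemoryUsage heapUsage() {
        return ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
    }

    // Writes a heap dump; liveOnly=true dumps only reachable objects.
    // The target file must not already exist, otherwise an IOException is thrown.
    public static void dumpHeap(String file, boolean liveOnly) throws IOException {
        HotSpotDiagnosticMXBean bean =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        bean.dumpHeap(file, liveOnly);
    }

    public static void main(String[] args) throws IOException {
        MemoryUsage heap = heapUsage();
        // getMax() returns -1 when no maximum is defined.
        System.out.printf("heap used=%d bytes, committed=%d bytes, max=%d bytes%n",
                heap.getUsed(), heap.getCommitted(), heap.getMax());
        if (args.length > 0) {
            dumpHeap(args[0], true); // e.g. java HeapInspector heap.hprof
            System.out.println("heap dump written to " + args[0]);
        }
    }
}
```

The resulting file has the same format as a jmap dump and can be opened in MAT in the same way.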
GC Issues
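Use jstat -gc pid 1000 to sample the usage of each generation and the GC counts once per second. Enable detailed GC logging with -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintGCTimeStamps (on JDK 9 and later these flags are replaced by unified logging, for example -Xlog:gc*). Frequent young GC may call for tuning -Xmn or -XX:SurvivorRatio. Full GC triggers include concurrent mode failures, promotion failures, large-object allocation failures, and explicit System.gc() calls. To inspect the heap around a full GC, enable dumps at runtime with jinfo -flag +HeapDumpBeforeFullGC pid and jinfo -flag +HeapDumpAfterFullGC pid; -XX:HeapDumpPath controls where the dump files are written.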
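The counters that jstat prints are also available in-process through the standard GarbageCollectorMXBean API. The sketch below is one way to sample them; note that it observes the JVM it runs in rather than attaching to another process, and the class name GcStats is a hypothetical name for illustration.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical helper: prints per-collector GC counts and accumulated pause time.
public class GcStats {

    // Sums collection counts across all collectors.
    // getCollectionCount() returns -1 when the value is undefined, so negatives are ignored.
    public static long totalCollections() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            total += Math.max(0, gc.getCollectionCount());
        }
        return total;
    }

    public static void main(String[] args) throws InterruptedException {
        // Number of one-second samples to print; defaults to 5.
        int samples = args.length > 0 ? Integer.parseInt(args[0]) : 5;
        for (int i = 0; i < samples; i++) {
            for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
                System.out.printf("%-28s count=%d time=%dms%n",
                        gc.getName(), gc.getCollectionCount(), gc.getCollectionTime());
            }
            System.out.println("total collections so far: " + totalCollections());
            Thread.sleep(1000);
        }
    }
}
```

A count that climbs quickly between samples for the young collector points toward the -Xmn and -XX:SurvivorRatio tuning mentioned above, while growth in the old-generation collector suggests looking at the full GC triggers.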
Network
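Network problems often manifest as timeouts. Distinguish between connection timeouts and read/write timeouts, and ensure client timeouts are shorter than server timeouts. Diagnose TCP queue overflows with netstat -s | egrep "listen|LISTEN" and ss -lnt; for listening sockets, ss shows the current accept-queue length in Recv-Q and the configured backlog in Send-Q. Capture RST packets with tcpdump and analyze them in Wireshark. Watch the number of connections in the TIME_WAIT and CLOSE_WAIT states. Many CLOSE_WAIT sockets usually mean the application is not closing connections after the peer has closed, which has to be fixed in code. TIME_WAIT sockets can be reused via the sysctl setting net.ipv4.tcp_tw_reuse=1, and tcp_max_tw_buckets can be adjusted if necessary. The source also suggests net.ipv4.tcp_tw_recycle=1, but that option breaks clients behind NAT and was removed in Linux 4.12, so avoid it on modern kernels.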
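The difference between a connection timeout and a read timeout is easy to see with plain java.net sockets. The sketch below is a minimal illustration, not a production client: it opens a local listening socket that never writes, so the connect succeeds (the kernel completes the handshake and parks the connection in the accept queue) and only the read can time out. The class name TimeoutDemo is a hypothetical name for illustration.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

// Hypothetical demo: connect timeout and read timeout are configured separately.
public class TimeoutDemo {

    // Returns true if reading from a silent local server hits the read timeout.
    public static boolean readTimesOut(int connectTimeoutMs, int readTimeoutMs) throws IOException {
        try (ServerSocket server = new ServerSocket(0, 1, InetAddress.getLoopbackAddress());
             Socket client = new Socket()) {
            // Connect timeout: bounds how long the TCP handshake may take.
            client.connect(
                    new InetSocketAddress(server.getInetAddress(), server.getLocalPort()),
                    connectTimeoutMs);
            // Read timeout: bounds how long a single read may block.
            client.setSoTimeout(readTimeoutMs);
            try {
                client.getInputStream().read(); // blocks because the server never writes
                return false;
            } catch (SocketTimeoutException e) {
                return true;
            }
        }
    }

    public static void main(String[] args) throws IOException {
        long start = System.nanoTime();
        boolean timedOut = readTimesOut(1000, 300);
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println("read timed out: " + timedOut + " after about " + elapsedMs + " ms");
    }
}
```

When a timeout shows up in logs, the exception type and message usually tell you which of the two fired, which in turn tells you whether to look at connectivity and accept queues or at slow request handling on the server.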
Efficient Ops
This public account is maintained by Xiaotianguo and friends, regularly publishing widely-read original technical articles. We focus on operations transformation and accompany you throughout your operations career, growing together happily.