
Comprehensive Guide to Diagnosing Java Production Issues: CPU, Disk, Memory, GC, and Network

This article provides a step‑by‑step troubleshooting guide for Java production incidents, covering CPU, disk, memory, GC, and network problems with practical commands, analysis techniques, and tools such as jstack, jmap, iostat, netstat, and native memory tracking.

Top Architect

Online Java service failures often involve multiple layers such as CPU, disk, memory, and network, so a systematic inspection of each aspect is recommended.

CPU

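Find the busiest process with ps or top, then list its threads with top -H -p <pid>. Convert the hot thread's ID (the thread ID, not the process ID) to hexadecimal with printf '%x\n' <tid> and locate that thread in the jstack output with jstack <pid> | grep 'nid=0x<hex-tid>' -C5 --color. To see how many threads are in each state, paying particular attention to large WAITING and TIMED_WAITING counts, use

cat jstack.log | grep "java.lang.Thread.State" | sort -nr | uniq -c

where jstack.log is a thread dump saved beforehand, for example with jstack <pid> > jstack.log.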
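The thread lookup above can be sketched as two small shell helpers. The function names tid_to_nid and count_thread_states are made up for illustration, and the thread dump is read from standard input so the counting step can be tried on a saved file. Treat this as a sketch under those assumptions, not a fixed procedure.

```shell
# Hypothetical helpers; the names are illustrative, not part of any JDK tool.

# Convert a decimal thread ID (as shown by top -H) to the hexadecimal
# form used in the nid= field of a jstack thread dump.
# Check your JDK's dump format first: if nid already appears in decimal,
# skip the conversion.
tid_to_nid() {
  printf '0x%x\n' "$1"
}

# Count threads per state from a thread dump on stdin, largest count first.
# Only the state keyword is kept, so "WAITING (parking)" and
# "WAITING (on object monitor)" are counted together.
count_thread_states() {
  grep 'java.lang.Thread.State' \
    | awk '{ print $2 }' \
    | sort | uniq -c | sort -rn
}

# Typical use (<pid> and <tid> come from top -H -p <pid>):
#   jstack <pid> | grep -A 5 "nid=$(tid_to_nid <tid>)"
#   jstack <pid> | count_thread_states
```

Reading from standard input keeps the helpers independent of how the dump was captured (jstack, jcmd Thread.print, or a file saved during the incident).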

Frequent GC

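Check GC frequency with jstat -gc <pid> 1000, which prints one sample per second. Watch the young and full GC counts and cumulative times (YGC/YGCT, FGC/FGCT, GCT). These counters are totals since JVM start, so it is the rate of change between samples that shows whether GC tuning is needed.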
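As a rough illustration of reading those counters, the sketch below summarizes the last sample of jstat -gc output. The function name gc_summary is hypothetical, and the columns are looked up by header name because their positions differ between JDK versions.

```shell
# Hypothetical helper: summarize the last sample of `jstat -gc` output
# read from stdin. Averages are cumulative time divided by cumulative count.
gc_summary() {
  awk '
    NR == 1 { for (i = 1; i <= NF; i++) col[$i] = i; next }  # map header names to positions
    NF > 0  { ygc = $(col["YGC"]); ygct = $(col["YGCT"])
              fgc = $(col["FGC"]); fgct = $(col["FGCT"]) }   # keep the latest sample
    END {
      printf "young: count=%d avg=%.3fs\n", ygc, (ygc > 0 ? ygct / ygc : 0)
      printf "full: count=%d avg=%.3fs\n",  fgc, (fgc > 0 ? fgct / fgc : 0)
    }'
}

# Typical use: ten one-second samples, then a summary of the final one.
#   jstat -gc <pid> 1000 10 | gc_summary
```

A high young-GC count with a small average pause usually matters less than a full-GC count that keeps climbing between samples.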

Context Switches

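Use vmstat to view the cs column (context switches per second), or monitor a specific process with pidstat -w -p <pid>, which reports voluntary (cswch/s) and involuntary (nvcswch/s) context switches.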

Disk

Check filesystem space with df -hl and disk performance with iostat -d -k -x. Identify the thread responsible for the I/O with iotop, then map the thread ID it reports to a process with readlink -f /proc/*/task/<tid>/../.. (the result is the /proc/<pid> directory of the owning process). Inspect that process's I/O counters with cat /proc/<pid>/io and its open files with lsof -p <pid>.
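The disk-space check can be automated along the following lines. This is a minimal sketch: the function name full_filesystems and the default 90% threshold are assumptions for illustration, and the input is expected in the POSIX format produced by df -P, which keeps each filesystem on a single line.

```shell
# Hypothetical helper: list mount points at or above a usage threshold.
# Reads `df -P` output on stdin; the threshold (percent) defaults to 90.
# Mount points containing spaces are not handled by this sketch.
full_filesystems() {
  threshold="${1:-90}"
  awk -v limit="$threshold" '
    NR > 1 {
      use = $5                      # Capacity column, e.g. "95%"
      sub(/%/, "", use)
      if (use + 0 >= limit + 0) print $6, use "%"
    }'
}

# Typical use, local filesystems only:
#   df -P -l | full_filesystems 85
```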

Memory

Start with free to view overall memory usage. Common issues include OOM and StackOverflow:

Exception in thread "main" java.lang.OutOfMemoryError: unable to create new native thread

– often caused by thread‑pool leaks; fix the leak first, then reduce -Xss or raise the OS thread and process limits if needed.

Exception in thread "main" java.lang.OutOfMemoryError: Java heap space

– indicates heap exhaustion; look for leaks with jstack / jmap before raising -Xmx.

Caused by: java.lang.OutOfMemoryError: Metaspace

– Metaspace overflow; tune -XX:MaxMetaspaceSize.

Exception in thread "main" java.lang.StackOverflowError

– the thread stack is exhausted, usually by deep or unbounded recursion; fix the recursion first and raise -Xss only if the call depth is legitimate.

Generate a heap dump with jmap -dump:format=b,file=heap.hprof <pid> and analyze it in MAT (Memory Analyzer Tool). For a quicker look without a dump, jmap -histo:live <pid> prints a per-class histogram (the live option forces a full GC first). To track native memory, start the JVM with -XX:NativeMemoryTracking=summary or detail, capture a baseline via jcmd <pid> VM.native_memory baseline, then compare later with jcmd <pid> VM.native_memory summary.diff or detail.diff (detail.diff requires detail mode).
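The native memory tracking steps can be wrapped in a small helper such as the one below. The function name nmt_diff, its argument order, and the 60-second default wait are assumptions for illustration. It only works against a JVM that was started with -XX:NativeMemoryTracking=summary or detail.

```shell
# Hypothetical wrapper around the NMT baseline/diff steps.
# Usage: nmt_diff <pid> [wait_seconds] [summary|detail]
nmt_diff() {
  pid="$1"
  wait_seconds="${2:-60}"
  mode="${3:-summary}"                       # detail.diff needs detail tracking
  jcmd "$pid" VM.native_memory baseline      # record the starting point
  sleep "$wait_seconds"                      # let the suspected growth happen
  jcmd "$pid" VM.native_memory "$mode.diff"  # report changes since the baseline
}

# Typical use: compare native memory after ten minutes.
#   nmt_diff <pid> 600 detail
```

In the diff output, categories whose reserved or committed size keeps growing between runs are the ones worth investigating first.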

Off‑Heap Memory

Detect off‑heap growth using pmap -x <pid> | sort -rn -k3 | head -30. For suspicious regions, dump memory with

gdb --batch --pid <pid> -ex "dump memory dump.bin <addr> <addr+size>"

and inspect the result via hexdump -C dump.bin. If the growth comes from direct buffers, cap it with -XX:MaxDirectMemorySize.
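The gdb command needs a start and an end address, while pmap -x reports a start address and a size in kilobytes. The sketch below does that arithmetic. The function name region_range is hypothetical, and the address is assumed to be the hexadecimal value from pmap's Address column without a 0x prefix.

```shell
# Hypothetical helper: turn a pmap start address and size in KB into the
# start/end pair that gdb's "dump memory" expects.
# Usage: region_range <hex-address-without-0x> <kbytes>
region_range() {
  start=$((0x$1))                 # parse the hexadecimal start address
  end=$((start + $2 * 1024))      # pmap reports sizes in kilobytes
  printf '0x%x 0x%x\n' "$start" "$end"
}

# Typical use, with Address and Kbytes copied from one pmap -x row:
#   set -- $(region_range 00007f0000000000 65536)
#   gdb --batch --pid <pid> -ex "dump memory dump.bin $1 $2"
```

Attaching gdb pauses the target process while the dump runs, so keep the dumped region small on a live service.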

GC Issues

On JDK 8, enable detailed GC logging with

-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintGCTimeStamps

On JDK 9 and later these flags were replaced by unified logging, for example -Xlog:gc*. If Young GC is too frequent, adjust -Xmn and -XX:SurvivorRatio. For Full GC, identify the trigger: concurrent phase failures, promotion failures, large object allocation failures, or explicit System.gc() calls. Use jinfo -flag +HeapDumpBeforeFullGC <pid> and jinfo -flag +HeapDumpAfterFullGC <pid> to write heap dumps before and after a Full GC and compare them.

Network

Network problems are complex; common categories include timeouts, TCP queue overflow, RST packets, TIME_WAIT, and CLOSE_WAIT.

Timeouts

Distinguish between connection timeouts, read/write timeouts, and pool‑related timeouts (such as waiting to obtain a connection from a pool). As a rule, keep the client timeout shorter than the server timeout.

TCP Queue Overflow

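Monitor the SYN and accept queues with netstat -s | egrep "listen|LISTEN", which shows overflow and drop counters, and ss -lnt, where for listening sockets Recv-Q is the current accept‑queue length and Send-Q is its limit. If the queues overflow, raise the kernel parameters net.core.somaxconn and net.ipv4.tcp_max_syn_backlog together with the servlet container's backlog setting (acceptCount for Tomcat, acceptQueueSize for Jetty).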
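As an illustration of reading the ss -lnt output, the sketch below flags listeners whose accept queue has reached its backlog. The function name full_accept_queues is hypothetical, and it relies on Recv-Q and Send-Q being the second and third columns, as in the usual ss output.

```shell
# Hypothetical helper: flag listeners whose accept queue has reached its limit.
# Reads `ss -lnt` output on stdin. For sockets in LISTEN state, Recv-Q is the
# current accept-queue length and Send-Q is the configured backlog.
full_accept_queues() {
  awk '$1 == "LISTEN" && $3 > 0 && $2 >= $3 { print $4, $2 "/" $3 }'
}

# Typical use:
#   ss -lnt | full_accept_queues
```

Each output line shows the local address followed by the current queue length and the limit, so a line here points directly at the listener to tune.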

RST Packets

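An RST packet indicates abnormal connection termination, for example when a peer sends data to a closed port or to a socket that has already been closed. Capture traffic with tcpdump -i en0 tcp -w capture.cap (replace en0 with the interface on your host) and analyze the capture in Wireshark.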

TIME_WAIT & CLOSE_WAIT

Check counts via

netstat -n | awk '/^tcp/ {++S[$NF]} END {for(a in S) print a, S[a]}'

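or ss -ant. Setting net.ipv4.tcp_tw_reuse=1 lets the kernel reuse TIME_WAIT sockets for new outgoing connections and can reduce TIME_WAIT buildup. Avoid net.ipv4.tcp_tw_recycle=1: it breaks connections from clients behind NAT and was removed in Linux 4.12. A growing CLOSE_WAIT count means the application has not closed its sockets after the peer closed the connection; use thread dumps to find the code that is holding them.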
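The same per-state count can be taken from ss, which is usually faster than netstat on hosts with many sockets. This is a minimal sketch with a hypothetical function name, count_tcp_states. Note that ss spells the states with hyphens (TIME-WAIT, CLOSE-WAIT) where netstat uses underscores.

```shell
# Hypothetical helper: count TCP sockets per state from `ss -ant` output
# on stdin, largest count first. The header line is skipped.
count_tcp_states() {
  awk 'NR > 1 { count[$1]++ } END { for (s in count) print count[s], s }' \
    | sort -rn
}

# Typical use:
#   ss -ant | count_tcp_states
```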

Overall, systematic use of the above commands and analysis tools helps quickly locate and resolve production‑level Java issues.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.
