
Comprehensive Guide to Java Runtime Error Checking: CPU, Disk, Memory, GC, and Network Troubleshooting

This article provides a step‑by‑step guide for diagnosing Java production issues by systematically checking CPU usage, disk health, memory consumption, garbage‑collection behavior, and network problems using common Linux tools and JVM utilities such as ps, top, jstack, jstat, vmstat, iostat, free, jmap, and tcpdump.

Top Architect

Online Java service failures often involve CPU, disk, memory, and network layers; most incidents contain multiple symptoms, so a systematic four‑step investigation is recommended.

CPU

Start by locating the problematic process with ps or top, then identify high‑CPU threads using top -H -p <pid>. Convert the thread ID to hexadecimal and search it in a jstack dump.

printf '%x\n' <tid>
jstack <pid> | grep 'nid=0x<hex-tid>' -C5 --color
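
The conversion and lookup above can be wrapped into one helper. This is a minimal sketch, not a definitive tool: the function names tid_to_nid and show_hot_thread are hypothetical, and the dump is assumed to have been saved beforehand with jstack <pid> > jstack.log so the lookup can be repeated without attaching to the JVM again.

```shell
#!/bin/sh
# Convert a decimal thread ID (as shown by `top -H`) into the nid=0x... form
# that jstack prints for each thread.
tid_to_nid() {
    printf 'nid=0x%x\n' "$1"
}

# Print one thread's header and the lines that follow it in a saved dump.
# $1 = decimal thread ID, $2 = path to the dump file (hypothetical name below).
show_hot_thread() {
    # -w keeps nid=0x3039 from also matching nid=0x30391
    grep -w -A 15 "$(tid_to_nid "$1")" "$2"
}

# Example:
#   show_hot_thread 12345 jstack.log
```

For thread ID 12345 the helper searches for nid=0x3039.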

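Save a dump with jstack <pid> > jstack.log, then count threads per state; an unusually large number of WAITING or TIMED_WAITING threads deserves a closer look:

grep "java.lang.Thread.State" jstack.log | sort | uniq -c | sort -nr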
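
The one-liner above also keeps the detail in parentheses, so "TIMED_WAITING (sleeping)" and "TIMED_WAITING (parking)" are counted separately. A hedged variant that groups by the state name only might look like this; count_thread_states is a hypothetical name and the sample lines are made up for illustration.

```shell
#!/bin/sh
# Count threads per state in a jstack dump read from stdin.
# Output: "<count> <STATE>", highest count first.
count_thread_states() {
    awk '/java.lang.Thread.State:/ { n[$2]++ }
         END { for (s in n) print n[s], s }' | sort -k1,1nr -k2,2
}

# Illustrative input; in practice: count_thread_states < jstack.log
count_thread_states <<'EOF'
   java.lang.Thread.State: RUNNABLE
   java.lang.Thread.State: TIMED_WAITING (parking)
   java.lang.Thread.State: WAITING (on object monitor)
   java.lang.Thread.State: TIMED_WAITING (sleeping)
EOF
```

On the sample input this prints "2 TIMED_WAITING", then "1 RUNNABLE" and "1 WAITING".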

Frequent GC

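Monitor GC activity with jstat -gc <pid> 1000, which prints one sample every 1000 ms. Watch the capacity and usage columns for the survivor spaces (S0C/S1C, S0U/S1U), eden (EC/EU), old generation (OC/OU), and metaspace (MC/MU), together with the young and full GC counts and cumulative times (YGC/YGCT, FGC/FGCT).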
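
Reading the counters by eye gets tedious over many samples. The sketch below, under the assumption that the input is plain jstat -gc output with a header line, reports how much YGC and FGC grew between the first and last sample. The function name gc_count_delta is hypothetical and the numbers are made up, with the column set trimmed for brevity.

```shell
#!/bin/sh
# Read `jstat -gc <pid> <interval>` output on stdin and print how much the
# YGC and FGC counters grew between the first and the last sample.
# Columns are located by header name, not by position.
gc_count_delta() {
    awk '
        # Header line (non-numeric first field): remember column positions.
        $1 !~ /^[0-9.]+$/ { for (i = 1; i <= NF; i++) col[$i] = i; next }
        !seen { y0 = $(col["YGC"]); f0 = $(col["FGC"]); seen = 1 }
              { y1 = $(col["YGC"]); f1 = $(col["FGC"]) }
        END   { if (seen) printf "YGC +%d FGC +%d\n", y1 - y0, f1 - f0 }
    '
}

# Illustrative samples; in practice: jstat -gc <pid> 1000 10 | gc_count_delta
gc_count_delta <<'EOF'
  EC       EU       OC       OU     YGC   YGCT  FGC   FGCT    GCT
8192.0  4096.0  20480.0  10240.0    10   0.050    1  0.030  0.080
8192.0  1024.0  20480.0  15360.0    14   0.070    3  0.090  0.160
EOF
```

The sample prints "YGC +4 FGC +2". A steadily growing FGC value over a short window is the signal to move on to GC logs or a heap dump.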

Context Switch

Use vmstat to view the cs column (context switches per second). For per‑process details, run pidstat -w -p <pid> and compare the cswch/s (voluntary) and nvcswch/s (involuntary) columns.
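
When pidstat is not installed, Linux also exposes cumulative switch counters in /proc/<pid>/status (and per thread in /proc/<pid>/task/<tid>/status). The following is a small sketch that reads them; ctxt_switches is a hypothetical helper name, and the values are totals since the task started rather than per-second rates.

```shell
#!/bin/sh
# Print voluntary and involuntary context-switch totals for one task.
# $1 = path to a status file such as /proc/<pid>/status.
ctxt_switches() {
    awk -F':' '
        $1 == "voluntary_ctxt_switches"    { v = $2 + 0 }
        $1 == "nonvoluntary_ctxt_switches" { n = $2 + 0 }
        END { printf "voluntary=%d involuntary=%d\n", v, n }
    ' "$1"
}

# Example (Linux only, inspects the current shell):
#   ctxt_switches /proc/$$/status
```

Sampling the helper twice a few seconds apart and subtracting gives a rough rate. A high involuntary share suggests threads are competing for CPU time rather than blocking on their own.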

Disk

Check disk space with df -hl. For performance issues, run iostat -d -k -x and examine %util (the share of time the device was busy) along with rrqm/s and wrqm/s (merged read and write requests per second) to pinpoint overloaded devices.
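
On hosts with many devices it can help to filter the iostat output. The sketch below prints only devices whose %util is at or above a threshold. busy_devices is a hypothetical name, the default of 80 is an arbitrary cut-off chosen for illustration, and the sample rows are made up.

```shell
#!/bin/sh
# Read `iostat -d -k -x` output on stdin and print devices whose %util
# is at or above a threshold. $1 = threshold in percent (default 80).
busy_devices() {
    awk -v limit="${1:-80}" '
        # Header row: find which column holds %util.
        $1 ~ /^Device/ { for (i = 1; i <= NF; i++) if ($i == "%util") u = i; next }
        u && NF >= u && $u + 0 >= limit + 0 { print $1, $u }
    '
}

# Illustrative input; in practice: iostat -d -k -x | busy_devices 80
busy_devices <<'EOF'
Device:  rrqm/s  wrqm/s    r/s     w/s   rkB/s     wkB/s  await  %util
sda        0.00    1.20   0.50    3.00    8.00     40.00   2.10   0.28
sdb        0.00  350.00   2.00  900.00   16.00  52000.00  50.10  95.10
EOF
```

The sample prints "sdb 95.10". The %util column is looked up by name because the column layout differs between sysstat versions.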

Identify the process performing I/O using iotop, then translate a thread ID to a PID:

readlink -f /proc/*/task/<tid>/../..

Inspect the process’s I/O counters:

cat /proc/<pid>/io

Determine open files with lsof -p <pid>.
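
The /proc/<pid>/io file lists several counters; read_bytes and write_bytes are the ones that reflect traffic that actually reached the storage layer. A minimal sketch for summarizing them follows, with io_summary as a hypothetical helper name.

```shell
#!/bin/sh
# Summarize the storage-level byte counters from a /proc/<pid>/io file.
# $1 = path to the io file. Sizes are printed in MiB.
io_summary() {
    awk -F': *' '
        $1 == "read_bytes"  { r = $2 }
        $1 == "write_bytes" { w = $2 }
        END { printf "read=%.1fMiB write=%.1fMiB\n", r / 1048576, w / 1048576 }
    ' "$1"
}

# Example (Linux only; other users' processes may require root):
#   io_summary /proc/<pid>/io
```

The counters are cumulative since process start, so two readings taken a known interval apart are needed to estimate throughput.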

Memory

Begin with free to view overall memory status. For OOM and stack‑overflow problems, dump the heap and analyze the dump with MAT (Memory Analyzer Tool):

jmap -dump:format=b,file=heap.hprof <pid>

Typical OOM messages include:

Exception in thread "main" java.lang.OutOfMemoryError: unable to create new native thread
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
Exception in thread "main" java.lang.OutOfMemoryError: Metaspace

Enable automatic heap dumps with -XX:+HeapDumpOnOutOfMemoryError. For off‑heap leaks, track native memory using -XX:NativeMemoryTracking=summary (or detail) and query with jcmd <pid> VM.native_memory.
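
The native memory summary is long, so ranking categories by committed size can make an off‑heap leak easier to spot. This sketch assumes the JDK 8 style layout, where each category line starts with a dash and sizes are printed in KB; other JDK versions or scale options may format the report differently. nmt_by_committed is a hypothetical name and the sample figures are made up.

```shell
#!/bin/sh
# Read `jcmd <pid> VM.native_memory summary` output on stdin and list the
# categories by committed size (KB), largest first.
nmt_by_committed() {
    awk '
        /^-/ {
            name = $0
            sub(/^-[ \t]*/, "", name)      # drop the leading dash and padding
            sub(/[ \t]*\(.*$/, "", name)   # drop "(reserved=..., committed=...)"
            kb = $0
            sub(/.*committed=/, "", kb)
            sub(/KB.*$/, "", kb)
            print kb, name
        }
    ' | sort -k1,1nr
}

# Illustrative input; in practice:
#   jcmd <pid> VM.native_memory summary | nmt_by_committed
nmt_by_committed <<'EOF'
Total: reserved=3166519KB, committed=2117495KB
-                 Java Heap (reserved=2097152KB, committed=2097152KB)
                            (mmap: reserved=2097152KB, committed=2097152KB)
-                     Class (reserved=1059947KB, committed=10347KB)
-                    Thread (reserved=9420KB, committed=9420KB)
EOF
```

The sample prints Java Heap first, followed by Class and Thread. Comparing two such listings taken some time apart shows which category is growing.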

GC Issues

Activate detailed GC logging. On JDK 9 and later, unified logging (for example -Xlog:gc*) replaces these options; on JDK 8 use:

-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintGCTimeStamps

Analyze young‑generation GC frequency and duration; if too high, consider adjusting -Xmn, -XX:SurvivorRatio, or using MAT on heap dumps. For full GCs, check for concurrent‑mark failures, promotion failures, or large‑object allocation issues, and tune parameters such as -XX:G1ReservePercent, -XX:InitiatingHeapOccupancyPercent, or -XX:G1HeapRegionSize. Dump heap before/after full GC with:

jinfo -flag +HeapDumpBeforeFullGC <pid>
jinfo -flag +HeapDumpAfterFullGC <pid>
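
For a first look at the young and full GC frequency and duration discussed above, the GC log can be summarized from the shell. This is a rough sketch that assumes JDK 8 style log lines, where the pause appears as "<seconds> secs]" before the [Times: ...] block; other collectors and unified logging use different formats. gc_pause_summary is a hypothetical name and the log lines are made up.

```shell
#!/bin/sh
# Read a JDK 8 style GC log on stdin and print count and total pause time
# for young and full collections.
gc_pause_summary() {
    awk '
        match($0, /[0-9]+\.[0-9]+ secs\]/) {
            pause = substr($0, RSTART, RLENGTH) + 0   # first "N secs]" is the pause
            kind = ($0 ~ /\[Full GC/) ? "full" : "young"
            n[kind]++; t[kind] += pause
        }
        END {
            printf "young count=%d total=%.3fs\n", n["young"], t["young"]
            printf "full count=%d total=%.3fs\n", n["full"], t["full"]
        }
    '
}

# Illustrative input; in practice: gc_pause_summary < gc.log
gc_pause_summary <<'EOF'
2020-01-01T10:00:00.100+0000: 1.000: [GC (Allocation Failure) [PSYoungGen: 65536K->8192K(76288K)] 65536K->8200K(251392K), 0.0100000 secs] [Times: user=0.02 sys=0.00, real=0.01 secs]
2020-01-01T10:00:02.100+0000: 3.000: [GC (Allocation Failure) [PSYoungGen: 73728K->8192K(76288K)] 73736K->16400K(251392K), 0.0200000 secs] [Times: user=0.03 sys=0.00, real=0.02 secs]
2020-01-01T10:00:09.100+0000: 10.000: [Full GC (Ergonomics) [PSYoungGen: 8192K->0K(76288K)] [ParOldGen: 170000K->90000K(175104K)] 178192K->90000K(251392K), [Metaspace: 20000K->20000K(1067008K)], 0.2500000 secs] [Times: user=0.60 sys=0.00, real=0.25 secs]
EOF
```

The sample prints "young count=2 total=0.030s" and "full count=1 total=0.250s".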

Network

Network‑related faults are complex. Distinguish connection timeout, read/write timeout, and connection‑pool timeouts. Keep client‑side timeouts shorter than server‑side values.

TCP Queue Overflow

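The kernel keeps two queues per listening socket: the SYN queue for half‑open connections and the accept queue for fully established connections waiting for accept(). When the accept queue is full, the kernel drops the final ACK of the handshake, or sends an RST if net.ipv4.tcp_abort_on_overflow is set to 1. Inspect overflow counters with:

netstat -s | egrep "listen|LISTEN"

Check current queue lengths with ss -lnt; for listening sockets, Recv-Q is the current accept‑queue length and Send-Q is its maximum. Adjust the backlog (acceptCount in Tomcat, acceptQueueSize in Jetty) and the OS parameters net.core.somaxconn and net.ipv4.tcp_max_syn_backlog.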
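
A quick way to spot a saturated listener is to compare the two queue columns of ss -lnt, where for listening sockets Recv-Q is the current accept‑queue length and Send-Q is the configured backlog. The sketch below is illustrative only; full_accept_queues is a hypothetical name and the sample sockets are made up.

```shell
#!/bin/sh
# Read `ss -lnt` output on stdin and print listeners whose accept queue
# (Recv-Q) has reached the configured backlog (Send-Q).
full_accept_queues() {
    awk '$1 == "LISTEN" && $3 + 0 > 0 && $2 + 0 >= $3 + 0 {
             print $4, "queue=" $2 "/" $3
         }'
}

# Illustrative input; in practice: ss -lnt | full_accept_queues
full_accept_queues <<'EOF'
State   Recv-Q  Send-Q  Local Address:Port  Peer Address:Port
LISTEN  0       128     0.0.0.0:22          0.0.0.0:*
LISTEN  101     100     0.0.0.0:8080        0.0.0.0:*
EOF
```

The sample prints "0.0.0.0:8080 queue=101/100". Because the queue drains quickly, a single clean reading does not rule out overflow; the cumulative netstat -s counters remain the more reliable evidence.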

RST Packets

RST indicates abnormal connection termination. Capture packets with tcpdump -i en0 tcp -w capture.cap (replace en0 with the actual interface name, such as eth0 on many Linux hosts) and analyze the capture in Wireshark.

TIME_WAIT and CLOSE_WAIT

View counts with:

netstat -n | awk '/^tcp/ {++S[$NF]} END {for(a in S) print a, S[a]}'
ss -ant | awk 'NR>1 {++S[$1]} END {for(a in S) print a, S[a]}'

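Reduce excessive TIME_WAIT on hosts that open many outgoing connections by enabling net.ipv4.tcp_tw_reuse=1. Avoid net.ipv4.tcp_tw_recycle: it breaks clients behind NAT and was removed in Linux 4.12. CLOSE_WAIT means the peer has closed the connection but the local application has not; investigate blocked threads (often stuck in I/O or latch.await) using jstack.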
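
To see which local service is accumulating CLOSE_WAIT sockets before turning to jstack, the ss output can be grouped by local address and port. A minimal sketch, assuming the state names that ss prints (CLOSE-WAIT, TIME-WAIT, ESTAB); close_wait_by_local is a hypothetical name and the sample connections are made up.

```shell
#!/bin/sh
# Read `ss -ant` output on stdin and count CLOSE-WAIT sockets per local
# address:port, highest count first.
close_wait_by_local() {
    awk '$1 == "CLOSE-WAIT" { n[$4]++ }
         END { for (a in n) print n[a], a }' | sort -k1,1nr -k2,2
}

# Illustrative input; in practice: ss -ant | close_wait_by_local
close_wait_by_local <<'EOF'
State       Recv-Q  Send-Q  Local Address:Port  Peer Address:Port
ESTAB       0       0       10.0.0.5:8080       10.0.0.9:51000
CLOSE-WAIT  1       0       10.0.0.5:8080       10.0.0.9:51001
CLOSE-WAIT  1       0       10.0.0.5:8080       10.0.0.7:40222
CLOSE-WAIT  0       0       10.0.0.5:9090       10.0.0.7:40300
TIME-WAIT   0       0       10.0.0.5:8080       10.0.0.8:39000
EOF
```

The sample prints "2 10.0.0.5:8080" followed by "1 10.0.0.5:9090", pointing at the service on port 8080 as the first one to inspect.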

Overall, combine system‑level commands (ps, top, vmstat, iostat, netstat, ss, tcpdump) with JVM tools (jstack, jmap, jstat, jcmd, MAT) to pinpoint the root cause of Java runtime anomalies.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.
