How to Diagnose and Resolve Common Java Server Performance Issues
This guide walks through systematic troubleshooting of Java server problems—including CPU spikes, memory leaks, disk bottlenecks, GC pauses, and network anomalies—by using tools such as jstack, jmap, jstat, vmstat, iostat, netstat, and ss to pinpoint root causes and apply targeted fixes.
Overview
Online incidents often involve CPU, disk, memory, and network problems; most issues span multiple layers, so a systematic four‑step investigation (CPU → Disk → Memory → Network) is recommended.
CPU
Start by checking CPU usage, since CPU anomalies are usually the easiest to locate. Common causes include busy loops in business logic, frequent GC, and excessive context switches; the most frequent culprit is problematic business or framework logic, which can be examined with jstack.
Using jstack to analyze CPU problems
Find the process PID with ps (or use top to see which process consumes the most CPU). Then run:

top -H -p pid

to identify high-CPU threads. Convert the thread ID to hexadecimal:
printf '%x\n' pid

The resulting nid is used to search the jstack output:
jstack pid | grep 'nid' -C5 --color

Pay particular attention to threads in WAITING or TIMED_WAITING states; BLOCKED threads are less common but worth noting. For a quick overview of thread states, run:

grep "java.lang.Thread.State" jstack.log | sort | uniq -c | sort -nr

Frequent GC
Use

jstat -gc pid 1000

to monitor GC generation changes (sampling interval 1000 ms). Columns S0C/S1C, S0U/S1U, EC/EU, OC/OU, and MC/MU represent the Survivor, Eden, Old, and Metaspace capacities and usage. YGC/YGCT, FGC/FGCT, and GCT show Young GC and Full GC counts and cumulative times. If GC appears too frequent, investigate further with a heap dump analysis.
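As a quick sanity check on the column layout, the Eden figures can be pulled out of a captured jstat line with awk. The sample values below are fabricated for illustration and assume the JDK 8 column order (S0C S1C S0U S1U EC EU OC OU MC MU CCSC CCSU YGC YGCT FGC FGCT GCT):

```shell
# Fabricated `jstat -gc <pid> 1000` data line (values in KB); JDK 8 column order assumed.
gcline='1024.0 1024.0 0.0 512.0 8192.0 4096.0 20480.0 10240.0 4480.0 4352.0 512.0 448.0 120 1.532 3 0.721 2.253'

# EC (Eden capacity) is field 5, EU (Eden used) is field 6:
echo "$gcline" | awk '{ printf "Eden utilisation: %.0f%%\n", $6 / $5 * 100 }'
```

If Eden utilisation climbs back toward 100% within a second or two of each Young GC, the allocation rate (or a too-small young generation) is the first thing to investigate.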
Context Switches
Inspect context switches with vmstat; the cs column shows the number of switches. To monitor a specific PID, use:

pidstat -w pid

The cswch/s and nvcswch/s columns indicate voluntary and involuntary context switches.
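To see whether switch rates are trending up, the cs column of a vmstat run can be averaged with awk. The two data lines below are fabricated for illustration; in vmstat's default layout cs is the 12th field:

```shell
# Fabricated two-sample `vmstat 1` capture; cs (context switches/s) is field 12.
vm='procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 1  0      0 812345  20480 409600    0    0     5    10  200 1500  3  1 95  1  0
 2  0      0 812300  20480 409610    0    0     0     8  220 1800  4  2 93  1  0'

# Skip the two header lines and average the context-switch column:
echo "$vm" | awk 'NR > 2 { sum += $12; n++ } END { print sum / n }'
```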
Disk
Disk issues are also fundamental. Check disk space with:

df -hl

Performance problems can be diagnosed with:
iostat -d -k -x

The %util column shows how busy each device is, while rkB/s and wkB/s give the read/write throughput (rrqm/s and wrqm/s count merged read/write requests, not speeds), helping locate the problematic disk. Identify the responsible process with
iotop, or by converting a thread ID to its PID via

readlink -f /proc/*/task/tid/../..

then inspect its I/O with:

cat /proc/pid/io

List open files with lsof -p pid.
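The counters in /proc/<pid>/io are cumulative bytes; the two that matter for disk debugging are read_bytes and write_bytes, which reflect what actually hit the block layer (rchar/wchar include cached I/O). The file contents below are fabricated for illustration:

```shell
# Fabricated /proc/<pid>/io contents; counters are cumulative bytes.
io='rchar: 3002112
wchar: 1024000
syscr: 120
syscw: 85
read_bytes: 409600
write_bytes: 819200
cancelled_write_bytes: 0'

# read_bytes/write_bytes are what actually reached the block device:
echo "$io" | awk '/^(read_bytes|write_bytes)/ { print $1, $2 }'
```

Sampling this twice and diffing the counters gives a per-process I/O rate.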
Memory
Memory issues are more complex and include OOM, GC problems, and off-heap memory. Start with free to view overall memory status.
Heap Memory OOM
Typical OOM messages:
Exception in thread "main" java.lang.OutOfMemoryError: unable to create new native thread – insufficient native memory for thread stacks; check thread pools with jstack/jmap, or raise the OS limits.

Exception in thread "main" java.lang.OutOfMemoryError: Java heap space – the heap reached the -Xmx limit; look for leaks with jstack/jmap, then consider increasing -Xmx.

Exception in thread "main" java.lang.OutOfMemoryError: Metaspace – metaspace reached -XX:MaxMetaspaceSize; adjust it (use -XX:MaxPermSize on pre-Java-8 JVMs).
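For the "unable to create new native thread" case, a few Linux-side limits are worth checking before touching the JVM; this is a sketch of where to look, not an exhaustive list:

```shell
# OS-level limits that commonly cap thread creation on Linux
# (also check `ulimit -u`, the per-user process/thread limit):
cat /proc/sys/kernel/threads-max   # system-wide thread limit
cat /proc/sys/kernel/pid_max       # PID space, shared by threads
cat /proc/sys/vm/max_map_count     # each thread stack consumes mmap entries
```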
Stack Overflow
Indicates a thread's stack usage exceeded -Xss. Investigate the offending (often recursive) code path, or increase -Xss.
Using JMAP to locate memory leaks
Export a heap dump:

jmap -dump:format=b,file=filename pid

Analyze the dump with MAT (Memory Analyzer Tool), focusing on "Leak Suspects" or "Top Consumers".
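Before a full MAT session, jmap -histo:live pid gives a quick class-level ranking (note it triggers a full GC). The excerpt below is fabricated for illustration, showing how to pull the heaviest class out of a captured histogram:

```shell
# Fabricated `jmap -histo:live <pid>` excerpt: instances and shallow bytes per class.
histo=' num     #instances         #bytes  class name
   1:        250000       12000000  [C
   2:        240000        5760000  java.lang.String
   3:         10000        4880000  [B'

# Rank classes by shallow size (field 3 = bytes, field 4 = class name):
echo "$histo" | awk 'NR > 1 { print $4, $3 }' | sort -k2,2 -rn | head -n 1
```

A char array ([C) or byte array ([B) at the top usually points back to whatever Strings or buffers retain it, which is what MAT's dominator tree resolves.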
GC Issues
GC problems can cause both CPU load and memory pressure. Enable detailed GC logging with

-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintGCTimeStamps

Analyze Young GC frequency with jstat; if it is too frequent, consider increasing -Xmn or tuning -XX:SurvivorRatio. For long GC pauses, examine G1 log phases such as Root Scanning, Object Copy, and Ref Proc.
Full GC triggers include concurrent-mark failure, promotion failure, large-object allocation failure, or an explicit System.gc() call. To capture the heap around a Full GC, set -XX:HeapDumpPath and toggle the dump flags at runtime with jinfo (or take dumps directly with jmap):

jinfo -flag +HeapDumpBeforeFullGC pid
jinfo -flag +HeapDumpAfterFullGC pid

Network
Network problems are complex and often the hardest to diagnose.
Timeouts
Distinguish between connection timeout and read/write timeout. Keep client timeout smaller than server timeout to avoid hanging connections.
TCP Queue Overflow
Two queues exist: the SYN (half-open) queue and the accept (fully established) queue. If the accept queue is full at the third step of the handshake, the server either drops the ACK or sends an RST, depending on tcp_abort_on_overflow. Monitor overflows with:

netstat -s | egrep "listen|LISTEN"

Check queue sizes with ss -lnt; the queues themselves are tuned via net.core.somaxconn and net.ipv4.tcp_max_syn_backlog. For heavy TIME_WAIT pressure, net.ipv4.tcp_tw_reuse, net.ipv4.tcp_tw_recycle, or tcp_max_tw_buckets can be adjusted (note that tcp_tw_recycle misbehaves behind NAT and was removed in Linux 4.12):

# enable reuse of TIME-WAIT sockets for new outgoing connections
net.ipv4.tcp_tw_reuse = 1
# fast recycle of TIME-WAIT sockets (unsafe with NAT; removed in Linux 4.12)
net.ipv4.tcp_tw_recycle = 1

RST Packets
RST indicates an abnormal connection reset, often caused by sending data to a closed socket or by queue overflows. Use tcpdump and Wireshark to capture and analyze RST packets:

tcpdump -i en0 tcp -w capture.cap

TIME_WAIT and CLOSE_WAIT
TIME_WAIT ensures delayed packets are handled and prevents premature RSTs; excessive TIME_WAIT can be mitigated by enabling reuse/recycle as above. CLOSE_WAIT usually results from applications not closing sockets properly; investigate with jstack to find threads stuck in I/O or waiting on latches.
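A per-state connection count is the usual first step for both symptoms. The netstat lines below are fabricated for illustration; the last field of each line is the TCP state:

```shell
# Fabricated `netstat -ant` lines; the last field is the TCP state.
conns='tcp 0 0 10.0.0.5:8080 10.0.0.9:53210 TIME_WAIT
tcp 0 0 10.0.0.5:8080 10.0.0.7:49112 CLOSE_WAIT
tcp 0 0 10.0.0.5:8080 10.0.0.8:50011 TIME_WAIT
tcp 0 0 10.0.0.5:8080 10.0.0.6:51234 ESTABLISHED'

# Count connections per state, busiest state first:
echo "$conns" | awk '{ print $NF }' | sort | uniq -c | sort -rn
```

A large CLOSE_WAIT count pinned to one local port is a strong hint that the service on that port is leaking sockets.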
Efficient Ops
This public account is maintained by Xiaotianguo and friends and regularly publishes widely read original technical articles. We focus on operations transformation and will accompany you throughout your operations career.