
How to Diagnose and Resolve Common Java Server Performance Issues

This guide walks through systematic troubleshooting of Java server problems—including CPU spikes, memory leaks, disk bottlenecks, GC pauses, and network anomalies—by using tools such as jstack, jmap, jstat, vmstat, iostat, netstat, and ss to pinpoint root causes and apply targeted fixes.


Overview

Online incidents often involve CPU, disk, memory, and network problems; most issues span multiple layers, so a systematic four‑step investigation (CPU → Disk → Memory → Network) is recommended.

CPU

Start by checking CPU usage; CPU anomalies are usually the easiest to locate. Common causes include business-logic loops, frequent GC, and excessive context switches. The most frequent culprit is problematic business or framework logic, which can be examined with jstack.

Using jstack to analyze CPU problems

Find the process PID with ps (or use top to see which process consumes the most CPU). Then run:

top -H -p pid

to identify the high-CPU threads. Convert the offending thread ID to hexadecimal:

printf '%x\n' tid

The resulting value matches the nid field in the jstack output:

jstack pid | grep 'nid' -C5 --color

When scanning a full dump, pay particular attention to threads in WAITING or TIMED_WAITING states, and of course to any BLOCKED ones. For a quick overview of thread states, run:

grep "java.lang.Thread.State" jstack.log | sort | uniq -c | sort -nr
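As a cross-check on jstack's output, the same per-state counts can be read in-process via the standard ThreadMXBean API; a minimal sketch (the class name ThreadStateOverview is illustrative):

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.EnumMap;
import java.util.Map;

public class ThreadStateOverview {
    public static void main(String[] args) {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        Map<Thread.State, Integer> counts = new EnumMap<>(Thread.State.class);
        // dumpAllThreads(lockedMonitors, lockedSynchronizers) - both off for speed
        for (ThreadInfo info : mx.dumpAllThreads(false, false)) {
            counts.merge(info.getThreadState(), 1, Integer::sum);
        }
        counts.forEach((state, n) -> System.out.println(n + " " + state));
        // The decimal thread id maps to jstack's hex nid the same way printf '%x' does:
        System.out.println("main thread id in hex: 0x"
                + Long.toHexString(Thread.currentThread().getId()));
    }
}
```

This is handy when you can add an admin endpoint to a service but cannot shell into the host.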

Frequent GC

Use

jstat -gc pid 1000

to monitor GC generation changes (sampling interval 1000 ms). Columns S0C/S1C, S0U/S1U, EC/EU, OC/OU, MC/MU represent Survivor, Eden, Old, and Metaspace capacities and usage. YGC/YGT, FGC/FGCT, GCT show Young GC, Full GC counts and times. If GC appears too frequent, investigate further with dump analysis.
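The counters jstat surfaces as YGC/YGCT and FGC/FGCT can also be read in-process through the GarbageCollectorMXBean API, one bean per collector; a minimal sketch (class name GcStats is illustrative):

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

public class GcStats {
    public static void main(String[] args) {
        // Each bean covers one collector (typically one young, one old),
        // reporting its cumulative collection count and total pause time.
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.printf("%s: collections=%d totalMs=%d%n",
                    gc.getName(), gc.getCollectionCount(), gc.getCollectionTime());
        }
    }
}
```

Sampling these values periodically and diffing them gives the same per-interval rates jstat prints.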

Context Switches

Inspect context switches with vmstat; the cs column shows the number of switches per second. To monitor a specific PID, use:

pidstat -w pid

The cswch/s and nvcswch/s columns show voluntary and involuntary switches.

Disk

Disk issues are also fundamental. Check disk space with:

df -hl

Performance problems can be diagnosed with:

iostat -d -k -x

The %util column shows how busy the device is (the percentage of time it spends servicing I/O); rrqm/s and wrqm/s count merged read/write requests per second, and rkB/s and wkB/s give actual read/write throughput. Together these help locate the problematic disk. Identify the responsible process with iotop, or map a high-I/O thread ID back to its PID via:

readlink -f /proc/*/task/tid/../..

Then inspect that process's I/O counters with:

cat /proc/pid/io

List its open files with lsof -p pid.
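Free space can also be checked from inside the application, as a rough programmatic analogue of df -hl; a minimal sketch (class name DiskSpace is illustrative):

```java
import java.io.File;

public class DiskSpace {
    public static void main(String[] args) {
        // Query the filesystem holding "/" (on Windows this resolves to
        // the current drive's root). Shift by 30 converts bytes to GiB.
        File root = new File("/");
        System.out.println("total=" + (root.getTotalSpace() >> 30) + "G"
                + " free=" + (root.getUsableSpace() >> 30) + "G");
    }
}
```

Useful for alerting before log or temp directories fill the disk.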

Memory

Memory issues are more complex, spanning OOM, GC problems, and off-heap memory. Start with free to view overall memory status.

Heap Memory OOM

Typical OOM messages:

Exception in thread "main" java.lang.OutOfMemoryError: unable to create new native thread – insufficient native memory for thread stacks; check thread-pool usage with jstack/jmap, or raise the OS thread limits.

Exception in thread "main" java.lang.OutOfMemoryError: Java heap space – the heap reached the -Xmx limit; look for leaks with jstack/jmap before simply increasing -Xmx.

Exception in thread "main" java.lang.OutOfMemoryError: Metaspace – metaspace reached -XX:MaxMetaspaceSize; raise it (use -XX:MaxPermSize on pre-Java-8 JVMs).
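The usual fix for "unable to create new native thread" is bounding thread creation rather than raising OS limits. A minimal sketch of a fixed pool with a bounded queue (all names illustrative):

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class BoundedPool {
    static int result = 0;

    public static void main(String[] args) throws Exception {
        // A fixed pool with a bounded queue caps native thread creation:
        // task volume can no longer translate into unbounded OS threads.
        ExecutorService pool = new ThreadPoolExecutor(
                4, 4,                               // fixed worker count
                0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<>(100),      // bounded task queue
                new ThreadPoolExecutor.CallerRunsPolicy()); // back-pressure on overflow
        result = pool.submit(() -> 21 * 2).get();
        System.out.println("result=" + result);
        pool.shutdown();
    }
}
```

CallerRunsPolicy makes the submitting thread execute overflow tasks itself, throttling producers instead of either dropping work or growing the thread count.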

Stack Overflow

A java.lang.StackOverflowError means a thread's stack exceeded the -Xss limit. Increase -Xss, or, better, find and fix the deep or unbounded recursion in the code.
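A quick way to see the stack limit in action is to count frames until overflow; a minimal sketch (class name StackDepth is illustrative):

```java
public class StackDepth {
    static int depth = 0;

    static void recurse() {
        depth++;
        recurse();  // no base case: overflows deliberately
    }

    public static void main(String[] args) {
        // Measures how deep recursion goes before the stack (sized by -Xss)
        // overflows; the real fix is usually iterative code, not a bigger stack.
        try {
            recurse();
        } catch (StackOverflowError e) {
            System.out.println("StackOverflowError at depth " + depth);
        }
    }
}
```

Running this with different -Xss values (e.g. -Xss256k vs -Xss2m) shows the depth scaling with stack size.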

Using JMAP to locate memory leaks

Export a heap dump:

jmap -dump:format=b,file=filename pid

Analyze the dump with MAT (Memory Analyzer Tool), focusing on “Leak Suspects” or “Top Consumers”.

GC Issues

GC problems can cause both CPU load and memory pressure. Enable detailed GC logging with:

-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintGCTimeStamps

Analyze Young GC frequency with jstat; if collections are too frequent, consider increasing -Xmn or adjusting -XX:SurvivorRatio. For long GC pauses, examine G1 log phases such as Root Scanning, Object Copy, and Ref Proc.

Full GC triggers include concurrent-mark failure, promotion failure, large-object allocation failure, and explicit System.gc() calls. To capture heap dumps around a Full GC, set -XX:HeapDumpPath and toggle the dump flags at runtime with jinfo, then analyze the dumps with jmap or MAT:

jinfo -flag +HeapDumpBeforeFullGC pid
jinfo -flag +HeapDumpAfterFullGC pid

Network

Network problems are complex and often the hardest to diagnose.

Timeouts

Distinguish between connection timeout and read/write timeout. Keep client timeout smaller than server timeout to avoid hanging connections.

TCP Queue Overflow

Two queues exist: the SYN (half-open) queue and the accept (fully established) queue. If the accept queue is full when the final ACK of the three-way handshake arrives, the server either silently drops the ACK or sends an RST, depending on tcp_abort_on_overflow. Monitor overflow counters with:

netstat -s | egrep "listen|LISTEN"

Check listen-queue sizes with ss -lnt, and raise net.core.somaxconn or the application's listen backlog if the accept queue overflows. For TIME_WAIT pressure, adjust net.ipv4.tcp_tw_reuse, net.ipv4.tcp_tw_recycle, or tcp_max_tw_buckets as needed.

# enable reuse of TIME-WAIT sockets for new outbound connections
net.ipv4.tcp_tw_reuse = 1
# fast recycle of TIME-WAIT sockets; unsafe behind NAT and removed in Linux 4.12+
net.ipv4.tcp_tw_recycle = 1
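On the application side, the accept-queue length that ss -lnt reports comes from the backlog the server passes when it binds; a minimal sketch (class name BacklogDemo is illustrative):

```java
import java.net.ServerSocket;

public class BacklogDemo {
    static int boundPort = 0;

    public static void main(String[] args) throws Exception {
        // The second constructor argument is the accept-queue hint that
        // `ss -lnt` shows in the Send-Q column for listening sockets;
        // the kernel caps the effective value at net.core.somaxconn.
        try (ServerSocket server = new ServerSocket(0, 128)) {  // port 0 = ephemeral
            boundPort = server.getLocalPort();
            System.out.println("listening on port " + boundPort + " with backlog 128");
        }
    }
}
```

If the application's backlog is small, raising somaxconn alone does nothing; both must be increased.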

RST Packets

RST indicates an abnormal connection reset, often caused by sending data to a closed socket or by queue overflows. Capture the traffic with tcpdump and analyze the RST packets in Wireshark:

tcpdump -i en0 tcp -w capture.cap

TIME_WAIT and CLOSE_WAIT

TIME_WAIT ensures delayed packets are handled and prevents premature RSTs; excessive TIME_WAIT can be mitigated by enabling reuse/recycle as above. CLOSE_WAIT usually results from applications not closing sockets properly; investigate with

jstack

to find threads stuck in I/O or waiting on latches.
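The standard guard against CLOSE_WAIT leaks is try-with-resources, which closes the socket on every code path; a minimal local sketch (class name CloseProperly is illustrative):

```java
import java.io.IOException;
import java.net.ServerSocket;
import java.net.Socket;

public class CloseProperly {
    static boolean connected = false;

    public static void main(String[] args) throws IOException {
        // Sockets left unclosed after the peer disconnects are what pile up
        // in CLOSE_WAIT; try-with-resources guarantees close() even on exceptions.
        try (ServerSocket server = new ServerSocket(0);
             Socket client = new Socket("127.0.0.1", server.getLocalPort());
             Socket accepted = server.accept()) {
            connected = client.isConnected();
            System.out.println("connected: " + connected);
        } // all three sockets are closed here, in reverse order of creation
    }
}
```

The same pattern applies to streams, channels, and pooled connections borrowed from a client library.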

Tags: Java, Monitoring, Performance, Operations, Troubleshooting
Written by Efficient Ops

This public account is maintained by Xiaotianguo and friends, regularly publishing widely read original technical articles. We focus on operations transformation and accompany you throughout your operations career, growing together.
