Java Performance Tuning in Practice (Part 6): End-to-End Case Study from Full GC to 200 ms Response
The article walks through a real e‑commerce order service where afternoon spikes caused response times to jump to 2‑3 seconds, Full GC to run every few minutes, and Metaspace to hit 99.9%, then demonstrates step‑by‑step analysis using monitoring, GC logs, jstat, jmap, and heap dumps, fixes a custom class‑loader leak, adjusts JVM flags, and achieves sub‑200 ms latency with no Full GC.
Case Background
An e‑commerce order service (four instances, each 4 CPU × 8 GB) sits behind a gateway and uses MySQL and Redis.
Problem Symptoms
Afternoon peak (3‑5 pm) latency rises from ~50 ms to 2‑3 seconds.
Some requests time out, increasing user complaints.
Monitoring alerts show CPU > 80 % and frequent Full GC.
Step 1 – Monitoring Signals
Key metrics compared with normal thresholds:
P99 response time: normal < 100 ms, observed 2.3 s.
GC pause: normal < 50 ms, observed 800 ms‑2 s.
Full GC frequency: normal < 1 time/day, observed 5‑8 times/hour.
Old‑generation usage: normal < 70 %, observed > 95 %.
Step 2 – GC Log Inspection
# 查看 GC 日志
cat gc.log | grep "Full GC"
2025-06-15T14:30:01.123+0800: 1234.567: [Full GC (Metadata GC Threshold)
2025-06-15T14:30:01.123+0800: 1234.567: [CMS: 8192K->8192K(20480K), 0.8567 secs]
15360K->8192K(26368K), [Metaspace: 1024K->1024K(2048K)], 0.8570 secs]
[Times: user=0.85 sys=0.01, real=0.86 secs]Key observations:
Full GC triggered by Metadata GC Threshold (Metaspace shortage).
Each Full GC takes 0.85‑0.91 s.
Heap memory is reclaimed normally, but Metaspace stays at 1024 KB.
Step 3 – jstat Verification
jstat -gcutil 12345 1000Sample output shows Metaspace (M) usage at 99.9 % while the old generation (O) is only ~12 %.
Step 4 – Class‑Loader Leak Detection
jmap -clstats 12345Output snippet:
class_loader classes bytes parent_loader alive? type
0x00000007f3a4c00 1234 2,456,789 ... true sun.misc.Launcher$AppClassLoader
...Many class_loader entries are marked alive? false, indicating they were not reclaimed; this is a classic class‑loader leak.
Root Cause
The application creates a custom class loader to load configuration classes dynamically, but the loader is never closed after the task finishes, preventing the loaded classes from being garbage‑collected and causing Metaspace to fill up, which in turn triggers frequent Full GC.
Step 5 – Heap Dump Confirmation
# Export heap dump
jmap -dump:format=b,file=heap.hprof 12345
Problem Suspect 1:
The class "com.example.DynamicConfig" is loaded by
"com.example.CustomClassLoader@0x7f3a4c00" which is not GC‑eligible.
It occupies 245 MB (78.5 %) of the heap.The heap analysis (MAT) confirms the massive number of DynamicConfig instances held by the non‑GC‑eligible custom class loader.
Solution
Code Fix
// ❌ Problematic code
public class ConfigLoader {
public void loadConfig(String path) {
URLClassLoader loader = new URLClassLoader(new URL[]{new URL(path)});
// load config classes …
// loader never closed
}
}
// ✅ Fixed code
public class ConfigLoader {
public void loadConfig(String path) {
try (URLClassLoader loader = new URLClassLoader(new URL[]{new URL(path)})) {
// load config classes …
} catch (IOException e) {
// handle exception
}
}
}JVM Parameter Adjustments
# Before tuning
-Xms4g -Xmx4g -XX:+UseConcMarkSweepGC
# After tuning
-Xms8g -Xmx8g # increase heap size
-XX:+UseG1GC # switch to G1
-XX:MaxGCPauseMillis=100 # target pause ≤ 100 ms
-XX:MetaspaceSize=256m # initial Metaspace
-XX:MaxMetaspaceSize=512m # max Metaspace with headroom
-XX:ConcGCThreads=4 # 4 concurrent GC threadsResulting Improvements
Full GC frequency: 5‑8 times/hour → 0 times.
P99 response time: 2.3 s → 186 ms (≈ 92 % reduction).
CPU usage: > 80 % → 45‑55 %.
Metaspace usage: 99.9 % → 45‑60 %.
Full Tuning Process Recap
1. Monitoring alert → problem discovered
2. GC log → frequent Full GC identified
3. jstat → Metaspace near full
4. jmap -clstats → class‑loader leak detected
5. MAT heap dump → pinpoint custom class loader
6. Fix code + adjust JVM flags
7. Gradual rollout → full production deploymentThe series concludes with a summary of the six parts covering JVM memory, CPU spikes, memory leaks, thread‑pool tuning, and GC optimization.
Signed-in readers can open the original source through BestHub's protected redirect.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.
Coder Trainee
Experienced in Java and Python, we share and learn together. For submissions or collaborations, DM us.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
