
Diagnosing Full GC and High CPU Issues in Java Services with Arthas, async‑profiler, VisualVM and GCEasy

This article demonstrates how to quickly locate and resolve frequent full GC and CPU spikes in Java backend services by combining Arthas‑integrated async‑profiler flame graphs with VisualVM, GCEasy analysis, and practical step‑by‑step deployment procedures.

JD Retail Technology

In daily operations, Java backend services often run into full GC, CPU spikes, and memory pressure, and resolving them quickly depends on pinpointing the problematic code. This article introduces a practical workflow that uses Alibaba's Arthas (which integrates async‑profiler flame graphs), VisualVM, and GCEasy to diagnose such issues.

1. Background

In an order‑domain task system, both the master and slave machines suffered frequent full GC and intermittent CPU spikes, with young GC occurring about ten times per minute and thread counts rising to about 1,500. The application uses the CMS collector on 4‑core, 8 GB machines with 4 GB allocated to the heap; heap and non‑heap memory usage otherwise appeared normal.
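The symptoms described above (GC counts and thread counts) can also be read from inside the JVM through the standard java.lang.management API. The following is a minimal sketch, not the monitoring the authors used; the class name JvmSnapshot is hypothetical. On a HotSpot JVM running CMS, the collector beans are typically named ParNew and ConcurrentMarkSweep.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: prints cumulative GC counts and the live thread count.
public class JvmSnapshot {

    /** Returns the cumulative collection count for each collector, keyed by collector name. */
    public static Map<String, Long> gcCounts() {
        Map<String, Long> counts = new LinkedHashMap<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            // getCollectionCount() returns -1 when the count is undefined for a collector.
            counts.put(gc.getName(), gc.getCollectionCount());
        }
        return counts;
    }

    /** Returns the current number of live threads, daemon and non-daemon. */
    public static int liveThreadCount() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        return threads.getThreadCount();
    }

    public static void main(String[] args) {
        gcCounts().forEach((name, count) ->
                System.out.println(name + " collections: " + count));
        System.out.println("live threads: " + liveThreadCount());
    }
}
```

Sampling these values twice, a minute apart, and subtracting gives a per-minute GC rate comparable to the figures quoted above.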

Monitoring charts showed the GC interval shrinking from about every 20 minutes to every 3‑5 minutes, and CPU usage exceeding 75% on some nodes.

2. Tool Selection and Practice

Four tools were evaluated: Arthas + async‑profiler flame graphs, VisualVM (cross‑time heap dump comparison), GCEasy, and traditional GC logs. Each was applied to the problematic environment.

2.1 Arthas analysis

Flame‑graph inspection revealed that deserialization of product‑domain channel configuration objects on the master node consumed significant CPU.
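Before reading a flame graph, it can help to know which threads are burning CPU at all. Arthas offers this through its thread command (for example, thread -n 3). The sketch below approximates the same idea with the standard ThreadMXBean API. It is an illustration under stated assumptions, not what Arthas does internally: the class name BusyThreads is hypothetical, and it ranks threads by cumulative CPU time since thread start rather than by usage over a sampling interval.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical helper: ranks live threads by cumulative CPU time.
public class BusyThreads {

    /** A thread name paired with its cumulative CPU time in nanoseconds. */
    public static final class Usage {
        public final String name;
        public final long cpuNanos;

        Usage(String name, long cpuNanos) {
            this.name = name;
            this.cpuNanos = cpuNanos;
        }
    }

    /** Returns up to {@code limit} threads, busiest first; empty if CPU timing is unavailable. */
    public static List<Usage> top(int limit) {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        List<Usage> usages = new ArrayList<>();
        if (!mx.isThreadCpuTimeSupported() || !mx.isThreadCpuTimeEnabled()) {
            return usages;
        }
        for (long id : mx.getAllThreadIds()) {
            long cpu = mx.getThreadCpuTime(id);      // -1 if the thread has already died
            ThreadInfo info = mx.getThreadInfo(id);  // null if the thread has already died
            if (cpu >= 0 && info != null) {
                usages.add(new Usage(info.getThreadName(), cpu));
            }
        }
        usages.sort(Comparator.comparingLong((Usage u) -> u.cpuNanos).reversed());
        return new ArrayList<>(usages.subList(0, Math.min(limit, usages.size())));
    }

    public static void main(String[] args) {
        for (Usage u : top(5)) {
            System.out.println(u.name + ": " + (u.cpuNanos / 1_000_000) + " ms CPU");
        }
    }
}
```

A ranking like this only names the hot threads; the flame graph is still needed to see which call stacks, such as the deserialization path found here, account for the time.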

2.2 VisualVM analysis

Two heap dumps taken a day apart were compared. The largest memory consumers were ES query results; the deserialized objects were not the largest, but were still identifiable. The difference from the Arthas result is expected: Arthas focuses on CPU and thread usage, while heap dumps reflect memory.

Commands used to launch VisualVM:

cd /Library/Java/JavaVirtualMachines/jdk1.8.0_191.jdk/Contents/Home/bin
jvisualvm

The VisualVM configuration (visualvm.conf) was adjusted to increase its memory allocation.

2.3 GCEasy analysis

The JVM was started with the following options to capture GC logs:

-XX:+PrintGCDetails -XX:+PrintGCTimeStamps -Xloggc:./gc.log

Comparing the logs online in GCEasy showed no memory leak, but did not pinpoint the cause of the GC activity.

2.4 async‑profiler flame‑graph

The flame‑graph results highlighted two major hotspots: the reverse‑lookup task on the master/slave node and the ES query deserialization, matching the Arthas findings. Downloading the memory flame graph confirmed these hotspots and allowed cross‑validation with the VisualVM dumps.

3. Fix and Release

Based on the analysis, the code was modified to address the identified bottlenecks, and the updated version was deployed. Post‑deployment monitoring showed stable system behavior.

4. Usage Steps

1) Request a bastion host with root access.
2) Download the latest Arthas boot JAR: wget https://alibaba.github.io/arthas/arthas-boot.jar
3) Start Arthas with admin rights and select the target Java process when prompted: java -jar arthas-boot.jar
4) Use the dashboard command (press q to quit) and the thread command to view CPU usage and thread information.
5) Start profiling: run profiler start to begin sampling.
6) Generate the flame graph: run profiler stop --format html and download the result.
7) Explore multiple dimensions (lock, allocation, CPU) through the generated HTML flame graphs.
8) Additional features include decompiling code from a JAR (the jad command) and measuring method execution times (for example, the trace command).

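The gc.log file produced by the logging options in section 2.3 can also be checked locally before uploading it to GCEasy. The sketch below extracts the timestamps of "Full GC" entries and computes the gaps between them, which is one way to confirm an interval shrinking from 20 minutes to a few minutes. It assumes the JDK 8 log layout produced by -XX:+PrintGCTimeStamps (a seconds-since-start prefix such as "1234.567: [Full GC"); the class name FullGcIntervals and the sample lines are hypothetical. With CMS, old-generation cycles are logged as CMS phases rather than as "Full GC" entries, so this count may be lower than what a monitoring dashboard reports.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: measures the time between "Full GC" entries in a JDK 8 style GC log.
public class FullGcIntervals {

    // Captures the uptime stamp (in seconds) that precedes "[Full GC".
    private static final Pattern FULL_GC = Pattern.compile("(\\d+\\.\\d+): \\[Full GC");

    /** Returns the uptime stamps, in seconds, of every Full GC line. */
    public static List<Double> fullGcTimestamps(List<String> lines) {
        List<Double> stamps = new ArrayList<>();
        for (String line : lines) {
            Matcher m = FULL_GC.matcher(line);
            if (m.find()) {
                stamps.add(Double.parseDouble(m.group(1)));
            }
        }
        return stamps;
    }

    /** Returns the gaps, in seconds, between consecutive timestamps. */
    public static List<Double> intervals(List<Double> stamps) {
        List<Double> gaps = new ArrayList<>();
        for (int i = 1; i < stamps.size(); i++) {
            gaps.add(stamps.get(i) - stamps.get(i - 1));
        }
        return gaps;
    }

    public static void main(String[] args) throws IOException {
        // Pass a log path as the first argument, or fall back to illustrative sample lines.
        List<String> lines = args.length > 0
                ? Files.readAllLines(Paths.get(args[0]))
                : Arrays.asList(
                        "100.000: [Full GC (Allocation Failure) sample]",
                        "130.500: [GC (Allocation Failure) sample]",
                        "400.000: [Full GC (Allocation Failure) sample]");
        System.out.println("Full GC intervals (s): " + intervals(fullGcTimestamps(lines)));
    }
}
```

Running it as java FullGcIntervals ./gc.log prints the gaps in seconds; a run of values near 200-300 would correspond to the 3‑5 minute interval described in the background section.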
The article concludes that integrating Arthas with async‑profiler provides the most intuitive and efficient way to locate performance problems, and encourages colleagues to explore these tools for similar scenarios.
