How to Use Arthas Flamegraph for Java Performance Profiling and Optimization

Learn how to leverage the Arthas flamegraph tool to profile Java applications, interpret CPU usage visualizations, and apply practical optimization techniques illustrated through real-world case studies that reduced CPU consumption by up to 6% and improved system stability during high‑traffic events.

Alibaba Cloud Developer

This article shares practical experience using the Arthas flamegraph tool to analyze and optimize Java application performance.

Arthas Flamegraph Usage

Official documentation: https://arthas.aliyun.com/doc/profiler.html

Start profiling

$ profiler start
Started [cpu] profiling

Stop profiling

$ profiler stop --format flamegraph
profiler output file: /tmp/test/arthas-output/20211207-111550.html
OK

The command generates an HTML flamegraph file; refer to the official docs for additional output options.

Understanding the Flamegraph

The horizontal axis represents CPU time: the wider a bar, the more CPU samples include that method. The left-to-right order is not chronological. The vertical axis shows call-stack depth: taller flames indicate deeper stacks. Colors encode code origins: green for Java, yellow for JVM C++, orange for kernel-mode C, and red for user-mode C.
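To make the width rule concrete, the sketch below computes each frame's share of samples from collapsed-stack text, where every line reads `root;child;leaf count`. It assumes the profile was exported in that text format rather than as HTML; the class and method names are hypothetical. A frame's share here is summed over every call path it appears in, which corresponds to the combined width of all bars carrying that name.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper, not part of Arthas.
final class FrameWidths {

    /** Returns, per frame name, the fraction of all samples whose stack contains that frame. */
    static Map<String, Double> inclusiveShare(List<String> collapsedLines) {
        Map<String, Long> samplesByFrame = new LinkedHashMap<>();
        long total = 0;
        for (String line : collapsedLines) {
            String trimmed = line.trim();
            int split = trimmed.lastIndexOf(' ');
            if (split < 0) {
                continue; // skip lines without a sample count
            }
            long count = Long.parseLong(trimmed.substring(split + 1));
            total += count;
            // A set, so a recursive frame is counted once per stack.
            Set<String> frames =
                    new LinkedHashSet<>(Arrays.asList(trimmed.substring(0, split).split(";")));
            for (String frame : frames) {
                samplesByFrame.merge(frame, count, Long::sum);
            }
        }
        Map<String, Double> share = new LinkedHashMap<>();
        for (Map.Entry<String, Long> entry : samplesByFrame.entrySet()) {
            share.put(entry.getKey(), total == 0 ? 0.0 : (double) entry.getValue() / total);
        }
        return share;
    }
}
```

With the three stacks `main;handle;mask 30`, `main;handle;send 10`, and `main;idle 60`, the frame `handle` appears in 40 of 100 samples, so its bar would span 40% of the graph's width.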

[Figure: Flamegraph example]

Practical Analysis (Case Studies)

Scenario: the application processes shipment messages in both automatic and batch modes. During peak traffic, its CPU usage spikes above 80%.

Case 1

[Figure: Case 1 flamegraph]

Wide, flat tops in the flamegraph point to methods that themselves consume a lot of CPU. In the figure, the red box marks business-logic execution, and the blue box isolates a costly operation in the metaq consumer that consistently takes about 3-4% of CPU. Unexpectedly, a data-masking tool consumed up to 9.3% of CPU because it ran heavy regular-expression processing over large order objects. Disabling the tool reduced overall CPU usage.
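The article does not show the masking tool itself, so the following is only a sketch of how such a tool could be switched off at runtime so that neither payload serialization nor regex matching runs. `LogMasker`, its phone-number pattern, and the `OMITTED` placeholder are all hypothetical names chosen for illustration.

```java
import java.util.function.Supplier;
import java.util.regex.Pattern;

// Hypothetical masking helper; the real tool in the case study is not shown in the article.
final class LogMasker {

    static final String OMITTED = "[payload omitted]";

    // Compiled once and reused; recompiling a Pattern on every call adds avoidable CPU cost.
    private static final Pattern PHONE = Pattern.compile("(\\d{3})\\d{4}(\\d{4})");

    private volatile boolean enabled;

    LogMasker(boolean enabled) {
        this.enabled = enabled;
    }

    void setEnabled(boolean enabled) {
        this.enabled = enabled;
    }

    /**
     * Masks the middle four digits of 11-digit numbers in the payload.
     * When masking is disabled, the supplier is never invoked, so a large
     * object is neither serialized nor scanned.
     */
    String maskForLog(Supplier<String> payload) {
        if (!enabled) {
            return OMITTED;
        }
        return PHONE.matcher(payload.get()).replaceAll("$1****$2");
    }
}
```

Passing a `Supplier` rather than a ready-made string is the design choice that matters here: the expensive work of turning a large order object into text only happens when masking is enabled.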

Case 2

[Figure: Case 2 flamegraph part 1]
[Figure: Case 2 flamegraph part 2]
[Figure: Case 2 flamegraph part 3]

A global search operation was found to consume nearly 6% CPU under normal traffic. The overhead originated from obtaining Java call stacks during HSF calls, which proved more expensive than the HSF call itself. Refactoring the logging to use an HSF filter eliminated the stack‑capture logic.
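The contrast between the two logging approaches can be sketched as follows. HSF's actual filter API is not shown in the article, so `Invocation`, `Invoker`, and `CallLogFilter` are hypothetical stand-ins for whatever metadata and chaining the framework provides; only `Throwable.getStackTrace()` is a real JDK call.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; these types do not mirror the real HSF filter API.
final class RpcLogSketch {

    /** Stand-in for the call metadata a framework hands to a filter. */
    static final class Invocation {
        final String service;
        final String method;

        Invocation(String service, String method) {
            this.service = service;
            this.method = method;
        }
    }

    /** Stand-in for "the rest of the call chain". */
    interface Invoker {
        Object invoke(Invocation invocation);
    }

    /** Costly variant: captures the whole stack on every call just to name the caller. */
    static String callerFromStack() {
        StackTraceElement[] stack = new Throwable().getStackTrace();
        // stack[0] is this method; stack[1] is whoever called it.
        return stack.length > 1
                ? stack[1].getClassName() + "." + stack[1].getMethodName()
                : "unknown";
    }

    /** Filter variant: logs from the metadata it receives, with no stack capture. */
    static final class CallLogFilter {
        final List<String> log = new ArrayList<>();

        Object invoke(Invoker next, Invocation invocation) {
            long start = System.nanoTime();
            try {
                return next.invoke(invocation);
            } finally {
                long micros = (System.nanoTime() - start) / 1_000;
                log.add(invocation.service + "#" + invocation.method + " took " + micros + "us");
            }
        }
    }
}
```

A call such as `new RpcLogSketch.CallLogFilter().invoke(inv -> "ok", new RpcLogSketch.Invocation("OrderService", "query"))` records one log entry for `OrderService#query` without ever walking the stack.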

Reading the flamegraph from the top down makes this issue hard to spot, because the cost is split across many call paths. Starting at the main traffic entry point and reading from the bottom up made it possible to locate the costly HSF calls spread across the system.
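When one expensive frame is scattered over many call paths, grouping its samples by direct caller shows how the cost is distributed. The sketch below does this over collapsed-stack lines of the form `root;child;leaf count`, assuming the profile is available in that text format. The class name, method name, and the frame names in the usage note are hypothetical.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, not part of Arthas.
final class CallerBreakdown {

    /**
     * Sums the samples of every stack that contains a frame whose name includes
     * {@code needle}, keyed by the frame that directly calls it.
     */
    static Map<String, Long> samplesByCaller(List<String> collapsedLines, String needle) {
        Map<String, Long> byCaller = new TreeMap<>();
        for (String line : collapsedLines) {
            String trimmed = line.trim();
            int split = trimmed.lastIndexOf(' ');
            if (split < 0) {
                continue; // skip lines without a sample count
            }
            long count = Long.parseLong(trimmed.substring(split + 1));
            String[] frames = trimmed.substring(0, split).split(";");
            for (int i = 0; i < frames.length; i++) {
                if (frames[i].contains(needle)) {
                    String caller = i == 0 ? "<root>" : frames[i - 1];
                    byCaller.merge(caller, count, Long::sum);
                    break; // outermost match only, so each stack is counted once
                }
            }
        }
        return byCaller;
    }
}
```

For example, given stacks in which `Rpc.invoke` is reached from both `OrderService.ship` and `StockService.lock`, the result maps each of those callers to its own sample total, so the individually small bars can be added up and compared.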

[Figure: HSF call distribution]

Optimization Results

After applying the changes, overall CPU usage dropped by about 5-6 percentage points (user-mode CPU fell from 26% to 21%). During batch operations, peak CPU stayed below 60% with 27 machines in service (down from 30). The improvements are expected to be more pronounced under high-traffic events such as the upcoming Double-Eleven stress test.

[Figure: Optimization effect]