How to Use Arthas Flamegraph for Java Performance Profiling and Optimization
Learn how to profile Java applications with the Arthas flamegraph tool, read the resulting CPU visualizations, and apply practical optimizations. The real-world case studies described here reduced CPU consumption by up to 6% and improved system stability during high-traffic events.
This article shares practical experience using the Arthas flamegraph tool to analyze and optimize Java application performance.
Arthas Flamegraph Usage
Official documentation: https://arthas.aliyun.com/doc/profiler.html
Start profiling
$ profiler start
Started [cpu] profiling
Stop profiling
$ profiler stop --format flamegraph
profiler output file: /tmp/test/arthas-output/20211207-111550.html
OK
The command generates an HTML flamegraph file; refer to the official docs for additional output options.
Understanding the Flamegraph
The horizontal axis represents the share of CPU samples, not the passage of time: the wider a bar, the more CPU the method and everything it calls consumed. The vertical axis shows call-stack depth, so taller flames indicate deeper stacks. Colors encode code origins: green for Java, yellow for JVM C++, orange for kernel-mode C, and red for user-mode C.
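To make the width rule concrete, here is a minimal sketch of how sampled stacks could be counted. It is not how Arthas or async-profiler is implemented: the class name `FlameWidths`, its methods, and the sample data are all hypothetical, and it aggregates by frame name, whereas a real flamegraph merges frames by full call path.

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;

// Illustrative only: aggregates by frame name, whereas a real flamegraph
// merges frames by their full call path.
class FlameWidths {

    // Each sample is one captured stack, root first, e.g. [main, consume, process].
    // A frame's total is the number of samples it appears in (its bar width).
    static Map<String, Integer> totalSamples(List<List<String>> samples) {
        Map<String, Integer> totals = new LinkedHashMap<>();
        for (List<String> stack : samples) {
            // De-duplicate so a recursive frame is counted once per sample.
            for (String frame : new LinkedHashSet<>(stack)) {
                totals.merge(frame, 1, Integer::sum);
            }
        }
        return totals;
    }

    // A frame's self count is the number of samples where it is the top of the stack.
    static Map<String, Integer> selfSamples(List<List<String>> samples) {
        Map<String, Integer> self = new LinkedHashMap<>();
        for (List<String> stack : samples) {
            if (!stack.isEmpty()) {
                self.merge(stack.get(stack.size() - 1), 1, Integer::sum);
            }
        }
        return self;
    }

    public static void main(String[] args) {
        // Hypothetical samples; the method names are made up for the example.
        List<List<String>> samples = List.of(
                List.of("main", "consume", "process"),
                List.of("main", "consume", "process"),
                List.of("main", "consume", "mask"),
                List.of("main", "idle"));
        Map<String, Integer> totals = totalSamples(samples);
        Map<String, Integer> self = selfSamples(samples);
        for (Map.Entry<String, Integer> e : totals.entrySet()) {
            double widthPercent = 100.0 * e.getValue() / samples.size();
            System.out.println(e.getKey() + " width=" + widthPercent
                    + "% self=" + self.getOrDefault(e.getKey(), 0));
        }
    }
}
```

In this made-up data, `main` appears in every sample and so spans the full width. `process` is on top of the stack in two samples, which is what would show up as a flat top.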
Practical Analysis (Case Studies)
Scenario: processing shipment messages with both automatic and batch modes. During peak traffic, CPU spikes above 80%.
Case 1
Large flat tops in the flamegraph highlight time-consuming methods. The red box marks business-logic execution; the blue box isolates a costly operation in the MetaQ consumer that consistently takes about 3-4% of CPU. More surprisingly, a data-masking tool consumed up to 9.3% of CPU because it ran heavy regex processing over large order objects. Disabling the tool reduced overall CPU usage.
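The fix described above was simply to disable the masking tool. As a hedged illustration of why regex masking over a large object is costly, the sketch below contrasts scanning a whole serialized payload with masking one known field. The class `MaskingSketch`, the phone-number pattern, and the payload shape are assumptions for the example, not the tool from the case study.

```java
import java.util.regex.Pattern;

// Hypothetical sketch; not the masking tool from the case study.
class MaskingSketch {

    // Assumed rule: an 11-digit number keeps its first 3 and last 4 digits.
    private static final Pattern PHONE =
            Pattern.compile("\\b(\\d{3})\\d{4}(\\d{4})\\b");

    // Costly on large objects: the regex engine has to scan every character
    // of the serialized payload, even though only a few fields are sensitive.
    static String maskWholePayload(String payload) {
        return PHONE.matcher(payload).replaceAll("$1****$2");
    }

    // Cheaper alternative: mask only the field known to hold a phone number,
    // using plain substring operations instead of a regex scan.
    static String maskPhoneField(String phone) {
        if (phone == null || phone.length() != 11) {
            return phone;
        }
        return phone.substring(0, 3) + "****" + phone.substring(7);
    }

    public static void main(String[] args) {
        String payload = "{\"phone\":\"13812345678\",\"id\":42}";
        System.out.println(maskWholePayload(payload));
        System.out.println(maskPhoneField("13812345678"));
    }
}
```

The work in `maskWholePayload` grows with the size of the payload, while `maskPhoneField` depends only on the length of one field. That difference is one plausible reason a masking step can dominate CPU when order objects are large.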
Case 2
A global search operation was found to consume nearly 6% CPU under normal traffic. The overhead originated from obtaining Java call stacks during HSF calls, which proved more expensive than the HSF call itself. Refactoring the logging to use an HSF filter eliminated the stack‑capture logic.
The issue is hard to spot when scanning the flamegraph from top to bottom, because the cost is spread across HSF calls throughout the system. Reading from the main traffic entry point upward made the costly calls much easier to locate.
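The pattern in Case 2 can be sketched as follows. Capturing a stack trace to find out who is calling forces the JVM to walk and materialize the whole stack, while an interceptor already receives the call details as arguments. The `Invocation` interface below is a hypothetical stand-in and is not the real HSF filter API, which the article does not show.

```java
// Hypothetical sketch; Invocation is NOT the real HSF filter API.
class CallerLookupSketch {

    // Expensive approach: every call builds a full stack trace
    // just to read a single frame from it.
    static String callerViaStackTrace() {
        StackTraceElement[] stack = new Throwable().getStackTrace();
        // stack[0] is this method; stack[1] is whoever called it.
        if (stack.length > 1) {
            return stack[1].getClassName() + "#" + stack[1].getMethodName();
        }
        return "unknown";
    }

    // Assumed shape of an intercepted remote call.
    interface Invocation {
        String serviceName();
        String methodName();
        Object proceed();
    }

    // Filter-style logging: the call details arrive as plain data,
    // so no stack capture is needed.
    static Object loggingFilter(Invocation invocation, StringBuilder log) {
        log.append(invocation.serviceName())
           .append('#')
           .append(invocation.methodName())
           .append('\n');
        return invocation.proceed();
    }

    public static void main(String[] args) {
        System.out.println(callerViaStackTrace());

        StringBuilder log = new StringBuilder();
        Object result = loggingFilter(new Invocation() {
            public String serviceName() { return "OrderService"; }
            public String methodName() { return "query"; }
            public Object proceed() { return "ok"; }
        }, log);
        System.out.print(log);
        System.out.println(result);
    }
}
```

The cost of `callerViaStackTrace` grows with stack depth, which tends to be large in framework-heavy services. The filter version does a fixed amount of work per call.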
Optimization Results
After applying the changes, overall CPU usage dropped by about 5-6 percentage points (user-mode CPU went from 26% to 21%). During batch operations, peak CPU stayed below 60% with 27 machines, down from 30. The improvements are expected to be more pronounced under high-traffic events such as the upcoming Double-Eleven stress test.
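The drop from 26% to 21% is 5 percentage points of CPU, which is roughly a 19% reduction relative to the starting load. The small helper below works through that arithmetic; the class and method names are hypothetical.

```java
// Hypothetical helper for reading before/after CPU figures.
class CpuSavings {

    // Absolute change, in percentage points of CPU.
    static double pointsSaved(double beforePercent, double afterPercent) {
        return beforePercent - afterPercent;
    }

    // Change relative to the starting utilization, as a percentage.
    static double relativeReductionPercent(double beforePercent, double afterPercent) {
        return 100.0 * (beforePercent - afterPercent) / beforePercent;
    }

    public static void main(String[] args) {
        // Figures from the article: user-mode CPU went from 26% to 21%.
        System.out.println("points saved: " + pointsSaved(26, 21));
        System.out.println("relative reduction: " + relativeReductionPercent(26, 21));
    }
}
```

Keeping the two measures apart matters when comparing results across services, since the same 5-point saving is a much larger relative gain on a lightly loaded host than on a busy one.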
Alibaba Cloud Developer
Alibaba's official tech channel, featuring all of its technology innovations.
