How to Use Arthas Flamegraph for Java Performance Profiling and Optimization

Learn how to leverage the Arthas flamegraph tool to profile Java applications, interpret CPU usage visualizations, and apply practical optimization techniques illustrated through real-world case studies that reduced CPU consumption by up to 6% and improved system stability during high‑traffic events.

Alibaba Cloud Developer

Oct 18, 2024

This article shares practical experience using the Arthas flamegraph tool to analyze and optimize Java application performance.

Arthas Flamegraph Usage

Official documentation: https://arthas.aliyun.com/doc/profiler.html

Start profiling

$ profiler start
Started [cpu] profiling

Stop profiling

$ profiler stop --format flamegraph
profiler output file: /tmp/test/arthas-output/20211207-111550.html
OK

The command generates an HTML flamegraph file; refer to the official docs for additional output options.

Understanding the Flamegraph

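The width of each bar is proportional to the number of CPU samples in which that frame appears, so wider bars indicate higher CPU consumption. The left-to-right order of bars does not represent the passage of time. The vertical axis shows call-stack depth: each bar sits on top of its caller, so taller flames indicate deeper stacks. Colors encode where the code comes from: green for Java, yellow for JVM C++, orange for kernel-mode C, and red for user-mode C.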
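To make the relationship between samples and bar width concrete, here is a minimal Java sketch. It assumes the profile is available as folded-stack text (one line per unique stack, frames separated by `;`, followed by a sample count), which is the usual input for flamegraph renderers. The class name, method names, and sample data are hypothetical and not part of Arthas.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Sketch: derive flamegraph bar widths from folded stacks (hypothetical helper). */
final class FlameWidths {

    /**
     * Sums samples per call path. Each key is a path from the root
     * (for example "main;handleOrder"), which corresponds to one bar.
     */
    static Map<String, Long> samplesPerBar(List<String> foldedLines) {
        Map<String, Long> bars = new LinkedHashMap<>();
        for (String raw : foldedLines) {
            String line = raw.trim();
            int split = line.lastIndexOf(' ');
            if (split < 0) {
                continue; // no sample count on this line, skip it
            }
            long count = Long.parseLong(line.substring(split + 1));
            StringBuilder path = new StringBuilder();
            for (String frame : line.substring(0, split).split(";")) {
                if (path.length() > 0) {
                    path.append(';');
                }
                path.append(frame);
                // Every ancestor bar also covers the samples of this stack.
                bars.merge(path.toString(), count, Long::sum);
            }
        }
        return bars;
    }

    /** Width of one bar as a percentage of all samples in the profile. */
    static double widthPercent(List<String> foldedLines, String path) {
        Map<String, Long> bars = samplesPerBar(foldedLines);
        long total = 0;
        for (Map.Entry<String, Long> e : bars.entrySet()) {
            if (!e.getKey().contains(";")) {
                total += e.getValue(); // root bars together span the full width
            }
        }
        long own = bars.getOrDefault(path, 0L);
        return total == 0 ? 0.0 : 100.0 * own / total;
    }
}
```

With three made-up stacks of 30, 10, and 60 samples, the bar for `main;handleOrder;maskData` (30 samples) is 30% of the graph width, regardless of when those samples were taken.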

Practical Analysis (Case Studies)

Scenario: the application processes shipment messages in both automatic and batch modes. During peak traffic, CPU usage spikes above 80%.

Case 1

Large flat tops in the flamegraph highlight time-consuming methods. In the flamegraph for this case, the red box marks business-logic execution, and the blue box isolates a costly operation in the MetaQ consumer that consistently takes about 3-4% CPU. A data-masking tool unexpectedly consumed up to 9.3% CPU because it ran heavy regex processing over large order objects. Disabling this tool reduced overall CPU usage.
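The fix in this case was to disable the masking tool. For readers who cannot do that, the sketch below illustrates two general ways regex masking cost is often reduced: reusing a compiled `Pattern`, and masking a single known field instead of scanning a whole serialized object. This is not the author's fix. The class, the phone-number rule, and the field names are assumptions made for illustration only.

```java
import java.util.regex.Pattern;

/** Sketch: cheaper regex masking (hypothetical rule, not the tool from the case study). */
final class MaskingSketch {

    // Hypothetical rule: hide the middle four digits of an 11-digit number.
    private static final String PHONE_REGEX = "(\\d{3})\\d{4}(\\d{4})";

    // Compiled once and reused; Pattern instances are immutable and thread-safe.
    private static final Pattern PHONE = Pattern.compile(PHONE_REGEX);

    /** String.replaceAll compiles the regex again on every call. */
    static String maskEveryCall(String text) {
        return text.replaceAll(PHONE_REGEX, "$1****$2");
    }

    /** Same result, but reuses the precompiled pattern. */
    static String maskPrecompiled(String text) {
        return PHONE.matcher(text).replaceAll("$1****$2");
    }

    /** Masks one known field rather than scanning a large serialized order. */
    static String maskPhoneField(String phone) {
        return phone == null ? null : maskPrecompiled(phone);
    }
}
```

Both variants return the same text. The difference is how much work is repeated per message, which is the kind of cost that shows up as a wide flat top in a flamegraph.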

Case 2

A global search operation was found to consume nearly 6% CPU under normal traffic. The overhead originated from obtaining Java call stacks during HSF calls, which proved more expensive than the HSF call itself. Refactoring the logging to use an HSF filter eliminated the stack‑capture logic.
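The pattern described here can be sketched in plain Java. The HSF filter API is not shown in the source, so the example below uses a hypothetical wrapper method in its place. It only contrasts logging that captures a stack trace on every call with logging that uses information already at hand at the interception point.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

/** Sketch: per-call stack capture versus filter-style logging (hypothetical names). */
final class CallLoggingSketch {

    // In-memory log used only to keep the sketch self-contained.
    static final List<String> LOG = new ArrayList<>();

    /** Costly variant: builds a full stack trace on every call to find the caller. */
    static <T> T callWithStackCapture(String service, Supplier<T> call) {
        StackTraceElement[] stack = new Throwable().getStackTrace();
        // Element 0 is normally this method and element 1 its caller,
        // but the JVM may omit frames, so guard the index.
        String caller = stack.length > 1 ? stack[1].getMethodName() : "unknown";
        LOG.add(caller + " -> " + service);
        return call.get();
    }

    /** Filter-style variant: logs the service name it is given, with no stack walk. */
    static <T> T callThroughFilter(String service, Supplier<T> call) {
        LOG.add("call " + service);
        return call.get();
    }
}
```

The stack-capture variant pays a cost that grows with stack depth on every remote call, which is consistent with the article's finding that capturing the stack cost more than the call itself.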

Scanning the flamegraph from top to bottom makes this issue hard to spot, because the cost is split across many narrow bars. Starting at the main traffic entry point and reading from the bottom up helped locate the costly HSF calls spread across the system.
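A cost that is scattered across many call paths can be totalled mechanically. The following sketch assumes folded-stack text (frames separated by `;`, then a sample count) and sums the samples of every stack that contains a given frame name. The class name, frame names, and numbers are made up for illustration.

```java
import java.util.List;

/** Sketch: total share of a frame that is scattered across many stacks. */
final class ScatteredFrames {

    /** Percentage of all samples whose stack contains the given keyword. */
    static double sharePercent(List<String> foldedLines, String keyword) {
        long total = 0;
        long matching = 0;
        for (String raw : foldedLines) {
            String line = raw.trim();
            int split = line.lastIndexOf(' ');
            if (split < 0) {
                continue; // no sample count on this line, skip it
            }
            long count = Long.parseLong(line.substring(split + 1));
            total += count;
            if (line.substring(0, split).contains(keyword)) {
                matching += count;
            }
        }
        return total == 0 ? 0.0 : 100.0 * matching / total;
    }
}
```

For example, three stacks of 2 samples each that end in a hypothetical `captureStack` frame, alongside 94 samples of other work, look negligible individually but add up to 6% of the profile.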

Optimization Results

After applying the changes, overall CPU usage dropped by about 5 to 6 percentage points (user-mode CPU fell from 26% to 21%). During batch operations, peak CPU stayed below 60% with 27 machines, down from 30. The improvements are expected to be more pronounced under high-traffic events such as the upcoming Double-Eleven stress test.
