How ARMS Continuous Profiler Enables Production‑Level Performance Analysis
This article explains the background of continuous performance profiling, demonstrates two real‑world scenarios using ARMS Continuous Profiler to locate CPU and memory hotspots, describes the tool’s design and core components, and shows how the fixes improve application responsiveness and resource usage.
Introduction
Traditional performance troubleshooting relied on logs and manual instrumentation, which are invasive and often miss critical information. Early profiling tools incurred high overhead and could not run continuously in production, so problems had to be reproduced in a test environment, which is often difficult.
ARMS Continuous Profiler Overview
Developed jointly by Alibaba Cloud ARMS and the Dragonwell team, ARMS Continuous Profiler brings mature profiling techniques to production environments. It adds a time dimension to standard profiling and covers three stages: collecting data in production, storing the profiling files, and visualizing the results. With it, developers can:
Locate performance problems at any moment (e.g., high CPU or memory usage).
Compare two time periods to see performance evolution.
Inspect call stacks for deeper code understanding.
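Comparing two time periods comes down to comparing two aggregated profiles. The following is a minimal sketch, assuming each period has already been reduced to a map from method name to sample count. The class and method names are hypothetical and not part of ARMS, and the counts in main are illustrative, not measurements. It ranks methods by how much their share of samples grew:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch: compares two flat profiles (method -> sample count),
// one per time window. Not the ARMS implementation.
class ProfileDiff {

    // Share of all samples, in percent, that a method had in each window.
    record Delta(String method, double beforePct, double afterPct) {
        double change() {
            return afterPct - beforePct;
        }
    }

    static List<Delta> compare(Map<String, Long> before, Map<String, Long> after) {
        // Normalize by each window's total so windows of different length compare fairly.
        double beforeTotal = Math.max(1, before.values().stream().mapToLong(Long::longValue).sum());
        double afterTotal = Math.max(1, after.values().stream().mapToLong(Long::longValue).sum());
        Set<String> methods = new TreeSet<>(before.keySet());
        methods.addAll(after.keySet());
        List<Delta> rows = new ArrayList<>();
        for (String m : methods) {
            rows.add(new Delta(m,
                    100.0 * before.getOrDefault(m, 0L) / beforeTotal,
                    100.0 * after.getOrDefault(m, 0L) / afterTotal));
        }
        // Largest growth first: the most likely sources of a regression.
        rows.sort(Comparator.comparingDouble(Delta::change).reversed());
        return rows;
    }

    public static void main(String[] args) {
        // Illustrative counts, not measurements.
        Map<String, Long> earlier = Map.of("LinkedList.node", 10L, "BookService.find", 90L);
        Map<String, Long> later = Map.of("LinkedList.node", 85L, "BookService.find", 15L);
        for (Delta d : compare(earlier, later)) {
            System.out.printf("%-18s %5.1f%% -> %5.1f%%%n", d.method(), d.beforePct(), d.afterPct());
        }
    }
}
```

Normalizing to percentages rather than raw counts keeps the comparison meaningful when the two windows differ in length or load.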
Scenario 1: CPU Hotspot Analysis
Problem: A library service’s Java process consumes excessive CPU, and response times exceed ten seconds.
Steps:
Navigate: ARMS Console → Application Home → Application Diagnosis → CPU & Memory Diagnosis.
Select the CPU Time profiling type.
Examine the flame graph.
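The flame graph shows java.util.LinkedList.node(int) consuming 85% of CPU time, called from DemoController.countAllBookPages(List), which traverses a large collection inefficiently. LinkedList has no random access: each get(i) calls node(i), which walks the list link by link from the nearer end, so reading a large LinkedList by index inside a loop costs O(n²) in total.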
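The cost behind that hotspot can be worked out directly. Assuming countAllBookPages reads a list of n elements with an indexed loop (the enhanced-for fix suggests it does), each get(i) calls node(i), which in OpenJDK starts from the head when i < n/2 and from the tail otherwise. The number of link hops is then:

```latex
\mathrm{hops}(i) =
\begin{cases}
i, & i < n/2 \\
n - 1 - i, & i \ge n/2
\end{cases}
\qquad
\sum_{i=0}^{n-1} \mathrm{hops}(i) = \frac{n}{2}\left(\frac{n}{2} - 1\right) = \frac{n^2}{4} - \frac{n}{2} \approx \frac{n^2}{4} \quad (n \text{ even})
```

For an illustrative n = 100,000 (an assumed size; the article does not give one), that is about 2.5 × 10⁹ link hops, whereas an iterator or an ArrayList needs only 10⁵ element visits.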
Fixes:
Replace LinkedList with ArrayList, which supports constant-time random access.
Replace the indexed loop with an enhanced for loop, which traverses the list through its iterator.
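Both fixes can be illustrated with a hypothetical reconstruction of the method. Only the name countAllBookPages comes from the article; the Book record, the method bodies, and the list size in main are assumptions. The indexed version reproduces the LinkedList.node(int) hotspot, while the enhanced for loop stays linear whichever List implementation is passed in:

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

// Hypothetical reconstruction of the Scenario 1 hotspot and its fix.
// Book and the method bodies are assumptions; only the name
// countAllBookPages comes from the article.
class BookPages {

    record Book(String title, int pages) {}

    // Before: indexed access. On a LinkedList every get(i) walks the list.
    static long countAllBookPagesIndexed(List<Book> books) {
        long total = 0;
        for (int i = 0; i < books.size(); i++) {
            total += books.get(i).pages(); // O(i) per call on a LinkedList
        }
        return total;
    }

    // After: the enhanced for loop uses the list's iterator, O(1) per element.
    static long countAllBookPages(List<Book> books) {
        long total = 0;
        for (Book book : books) {
            total += book.pages();
        }
        return total;
    }

    public static void main(String[] args) {
        List<Book> linked = new LinkedList<>();
        for (int i = 0; i < 20_000; i++) {
            linked.add(new Book("book-" + i, 100));
        }
        // First fix: an ArrayList gives O(1) random access.
        List<Book> array = new ArrayList<>(linked);

        long t0 = System.nanoTime();
        long slow = countAllBookPagesIndexed(linked);
        long t1 = System.nanoTime();
        long fast = countAllBookPages(array);
        long t2 = System.nanoTime();

        System.out.printf("indexed LinkedList: %d pages in %d ms%n", slow, (t1 - t0) / 1_000_000);
        System.out.printf("iterator ArrayList: %d pages in %d ms%n", fast, (t2 - t1) / 1_000_000);
    }
}
```

Either change alone removes the quadratic cost; applying both, as the article does, also keeps any remaining indexed access paths cheap.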
Verification: After redeploying, load tests show a dramatic drop in response time and CPU utilization.
Scenario 2: Memory Allocation Hotspot
Problem: The same service shows high CPU usage caused by frequent garbage collection (GC), a sign of memory pressure.
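GC pressure can also be cross-checked in-process with the standard java.lang.management API before opening an allocation profile. This sketch is not part of ARMS, and the byte-array loop is an arbitrary stand-in workload. It compares the collectors' approximate accumulated collection time with elapsed wall-clock time:

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Sketch: reports GC activity through the standard JMX beans.
// A quick cross-check for "high CPU caused by GC", not part of ARMS.
class GcPressure {

    // Approximate accumulated collection time across all collectors, in ms.
    static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime(); // -1 if the collector does not report it
            if (t > 0) {
                total += t;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        long start = System.nanoTime();
        long gcBefore = totalGcMillis();

        // Allocation-heavy stand-in for the real request path.
        long checksum = 0;
        for (int i = 0; i < 2_000_000; i++) {
            byte[] chunk = new byte[1024];
            checksum += chunk.length;
        }

        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        long gcMs = totalGcMillis() - gcBefore;
        System.out.printf("checksum=%d, elapsed=%d ms, GC=%d ms (%.1f%%)%n",
                checksum, elapsedMs, gcMs, elapsedMs == 0 ? 0.0 : 100.0 * gcMs / elapsedMs);
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.printf("%s: %d collections%n", gc.getName(), gc.getCollectionCount());
        }
    }
}
```

A GC share that stays high under load points to allocation pressure; the allocation profile is what shows which code is responsible.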
Steps:
Select CPU Time again to confirm the CPU hotspot.
Switch to the memory allocation profiling type.
The memory flame graph reveals that DemoController.queryAllBooks accounts for 99% of allocations, creating 20,000 large objects stored in a List.
Fix: Paginate the database query so that records are fetched one page at a time instead of loading every record into memory at once.
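A minimal sketch of the pagination fix, assuming a hypothetical BookRepository whose findPage(offset, limit) would map to a paged SQL query (for example ORDER BY id LIMIT ? OFFSET ?). An in-memory list stands in for the database so the example runs on its own; the caller requests at most one page of Book objects per query instead of all 20,000 at once:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical sketch of the Scenario 2 fix. BookRepository and findPage are
// assumptions standing in for a paged SQL query.
class PagedQuery {

    record Book(long id, String title) {}

    // Assumed data-access interface, not an ARMS or framework API.
    interface BookRepository {
        List<Book> findPage(int offset, int limit);
    }

    // Visits every book, requesting at most pageSize rows per query.
    // Returns the number of books visited.
    static long forEachBook(BookRepository repo, int pageSize, Consumer<Book> action) {
        if (pageSize <= 0) {
            throw new IllegalArgumentException("pageSize must be positive");
        }
        long visited = 0;
        int offset = 0;
        while (true) {
            List<Book> page = repo.findPage(offset, pageSize);
            page.forEach(action);
            visited += page.size();
            if (page.size() < pageSize) {
                return visited; // a short or empty page means there are no more rows
            }
            offset += pageSize;
        }
    }

    public static void main(String[] args) {
        // In-memory stand-in for the books table.
        List<Book> table = new ArrayList<>();
        for (long id = 1; id <= 20_000; id++) {
            table.add(new Book(id, "book-" + id));
        }
        BookRepository repo = (offset, limit) -> table.subList(
                Math.min(offset, table.size()), Math.min(offset + limit, table.size()));

        long visited = forEachBook(repo, 500, book -> { /* process one book */ });
        System.out.println("visited " + visited + " books in pages of 500");
    }
}
```

Offset pagination makes the database skip offset rows for every page, which slows down deep pages; keyset pagination (WHERE id > ? ORDER BY id LIMIT ?) avoids that cost when the sort column is indexed.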
Verification: Post‑fix load tests confirm reduced response latency and lower CPU usage.
Design and Implementation
The profiler consists of three parts:
Data Collection: Uses Java Flight Recorder (JFR) or async‑profiler, chosen automatically based on the Java version, to sample call stacks without requiring threads to reach a safepoint, which avoids safepoint bias.
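The kind of data this collection step relies on can be produced with the JDK's own jdk.jfr API. The event names below are standard JDK events; the 10 ms sampling period is an illustrative value, not an ARMS default, and the article does not describe how ARMS configures its agent. jdk.ObjectAllocationSample requires JDK 16 or later:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.time.Duration;
import jdk.jfr.Recording;

// Sketch only: records the JDK's standard CPU-sample and allocation-sample
// events around a workload and writes them to a temporary .jfr file.
class JfrCapture {

    static Path capture(Runnable workload) {
        try (Recording recording = new Recording()) {
            // Periodic stack samples of threads running Java code (CPU hotspots).
            recording.enable("jdk.ExecutionSample").withPeriod(Duration.ofMillis(10));
            // Sampled object allocations with stack traces (allocation hotspots, JDK 16+).
            recording.enable("jdk.ObjectAllocationSample").withStackTrace();
            recording.start();
            workload.run();
            recording.stop();
            Path out = Files.createTempFile("profile", ".jfr");
            recording.dump(out);
            return out;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path file = capture(() -> {
            // Arbitrary workload that both burns CPU and allocates.
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < 5_000_000; i++) {
                sb.append(i % 10);
                if (sb.length() > 10_000) {
                    sb.setLength(0);
                }
            }
        });
        System.out.println("wrote " + file + " (" + file.toFile().length() + " bytes)");
    }
}
```

The resulting file can be opened in JDK Mission Control or with the jfr command-line tool.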
Storage and Analysis: The JFR Analyzer reads and parses JFR files, aggregates the samples, and produces intermediate results for querying.
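The parse-and-aggregate idea can be sketched with the JDK's jdk.jfr.consumer API. This example streams a .jfr file and counts CPU samples by their top frame (self samples), the flat view behind a table of hotspots. It illustrates the idea only; the class name is hypothetical and the internals of the real JFR Analyzer are not described in the article:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Map;
import jdk.jfr.consumer.RecordedEvent;
import jdk.jfr.consumer.RecordedMethod;
import jdk.jfr.consumer.RecordedStackTrace;
import jdk.jfr.consumer.RecordingFile;

// Sketch of a parse-and-aggregate pass: counts jdk.ExecutionSample events
// per top frame. Not the ARMS JFR Analyzer.
class JfrTopFrames {

    static Map<String, Long> selfSamples(Path jfrFile) {
        Map<String, Long> counts = new HashMap<>();
        // Stream events one at a time instead of loading the whole file.
        try (RecordingFile file = new RecordingFile(jfrFile)) {
            while (file.hasMoreEvents()) {
                RecordedEvent event = file.readEvent();
                if (!event.getEventType().getName().equals("jdk.ExecutionSample")) {
                    continue;
                }
                RecordedStackTrace stack = event.getStackTrace();
                if (stack == null || stack.getFrames().isEmpty()) {
                    continue;
                }
                // Frame 0 is the method that was executing when the sample was taken.
                RecordedMethod method = stack.getFrames().get(0).getMethod();
                counts.merge(method.getType().getName() + "." + method.getName(), 1L, Long::sum);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return counts;
    }

    public static void main(String[] args) {
        Map<String, Long> counts = selfSamples(Path.of(args[0]));
        counts.entrySet().stream()
                .sorted(Map.Entry.<String, Long>comparingByValue().reversed())
                .limit(10)
                .forEach(e -> System.out.println(e.getValue() + "\t" + e.getKey()));
    }
}
```

Passing the path of any .jfr file that contains jdk.ExecutionSample events prints the ten methods with the most self samples.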
Visualization: Results are displayed as tables or flame graphs, with comparison capabilities.
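Flame graphs are usually drawn from aggregated stacks. As an illustration, and not a description of how ARMS renders its graphs, this sketch converts sampled stacks into the folded-stack text format consumed by Brendan Gregg's flamegraph.pl: each line is a root-first stack joined with semicolons, followed by a space and a sample count. The frame names in main are illustrative:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Sketch: folds sampled stacks into "root;child;leaf count" lines.
// Not the ARMS rendering pipeline.
class FoldedStacks {

    // Each sample is a stack listed leaf-first, as JFR and most profilers report it.
    static List<String> fold(List<List<String>> samplesLeafFirst) {
        Map<String, Long> counts = new TreeMap<>(); // sorted output is easier to diff
        for (List<String> stack : samplesLeafFirst) {
            List<String> rootFirst = new ArrayList<>(stack);
            Collections.reverse(rootFirst);
            counts.merge(String.join(";", rootFirst), 1L, Long::sum);
        }
        List<String> lines = new ArrayList<>();
        counts.forEach((path, n) -> lines.add(path + " " + n));
        return lines;
    }

    public static void main(String[] args) {
        List<List<String>> samples = List.of(
                List.of("LinkedList.node", "LinkedList.get", "DemoController.countAllBookPages"),
                List.of("LinkedList.node", "LinkedList.get", "DemoController.countAllBookPages"),
                List.of("DemoController.countAllBookPages"));
        fold(samples).forEach(System.out::println);
    }
}
```

In a flame graph built from these lines, a frame's width is proportional to the samples of every stack that passes through it.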
Key Technologies:
Java Flight Recorder – low‑overhead profiling built into OpenJDK.
async‑profiler – C++‑based profiler that can generate JFR‑compatible files, used when JFR is unavailable or too costly.
JFR File Analyzer – converts JFR files into a time‑range‑queryable tree structure supporting multiple profiling dimensions.
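One way to make a call tree queryable by time range is to keep per-bucket sample counts on every node. The sketch below is hypothetical: the bucket size, class name, and layout are assumptions for illustration, not the JFR File Analyzer's actual design. Queries are bucket-granular, so a requested window is widened to whole buckets:

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of a time-range-queryable call tree. Bucket size and
// layout are assumptions, not the JFR File Analyzer's actual design.
class TimeRangeCallTree {

    static final long BUCKET_MILLIS = 10_000; // assumed 10-second buckets

    final String frame;
    final Map<String, TimeRangeCallTree> children = new TreeMap<>();
    // bucket index -> number of samples that passed through this node
    final TreeMap<Long, Long> countsByBucket = new TreeMap<>();

    TimeRangeCallTree(String frame) {
        this.frame = frame;
    }

    // Records one sample; the stack is listed root-first.
    void addSample(List<String> rootFirstStack, long timestampMillis) {
        long bucket = Math.floorDiv(timestampMillis, BUCKET_MILLIS);
        TimeRangeCallTree node = this;
        node.countsByBucket.merge(bucket, 1L, Long::sum);
        for (String f : rootFirstStack) {
            node = node.children.computeIfAbsent(f, TimeRangeCallTree::new);
            node.countsByBucket.merge(bucket, 1L, Long::sum);
        }
    }

    // Follows a root-first path of frame names; returns null if absent.
    TimeRangeCallTree find(String... rootFirstPath) {
        TimeRangeCallTree node = this;
        for (String f : rootFirstPath) {
            node = node.children.get(f);
            if (node == null) {
                return null;
            }
        }
        return node;
    }

    // Samples through this node in every bucket overlapping [fromMillis, toMillis).
    long samplesBetween(long fromMillis, long toMillis) {
        if (toMillis <= fromMillis) {
            return 0;
        }
        long first = Math.floorDiv(fromMillis, BUCKET_MILLIS);
        long last = Math.floorDiv(toMillis - 1, BUCKET_MILLIS);
        long total = 0;
        for (long n : countsByBucket.subMap(first, true, last, true).values()) {
            total += n;
        }
        return total;
    }

    public static void main(String[] args) {
        TimeRangeCallTree root = new TimeRangeCallTree("root");
        List<String> hot = List.of("DemoController.countAllBookPages", "LinkedList.get", "LinkedList.node");
        root.addSample(hot, 5_000);
        root.addSample(hot, 65_000);
        root.addSample(List.of("DemoController.queryAllBooks"), 65_000);

        TimeRangeCallTree node = root.find("DemoController.countAllBookPages", "LinkedList.get");
        System.out.println("first minute:  " + node.samplesBetween(0, 60_000));
        System.out.println("second minute: " + node.samplesBetween(60_000, 120_000));
    }
}
```

Coarser buckets use less memory at the cost of time precision, and comparing two periods becomes two samplesBetween calls on the same node.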
Conclusion
ARMS Continuous Profiler provides a production‑ready, low‑overhead solution for continuous performance analysis, enabling developers to pinpoint CPU and memory hotspots, apply targeted optimizations, and verify improvements through repeatable load testing.
Alibaba Cloud Native