Implementation Principles of JVM CPU Profiler and Dynamic Attach Mechanism

The article explains how JVM CPU profilers work by using Java or native agents—via JVMTI, JMX, or the AsyncGetCallTrace hack—to sample or instrument stack traces, generate flame graphs, and employ the HotSpot Attach API for live, zero‑restart agent loading and diagnostics.

Meituan Technology Team

Oct 10, 2019

When developers encounter alerts or need to optimize system performance, they often have to analyze program execution behavior and performance bottlenecks. Profiling is a dynamic analysis technique that collects runtime information; CPU profiling is the most widely used type.

Various JVM profilers exist, such as the commercial JProfiler, the open‑source JVM‑Profiler, and the built‑in profiler of IntelliJ IDEA. In IDEA you can add a CPU Profiler in the Preferences → Build, Execution, Deployment → Java Profiler page, run the application with “Run with Profiler”, and after a recommended 5 minutes you can view the results, including flame graphs and call‑tree visualizations.

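Flame graphs are generated from sampled call‑stack data: each box is a method, its width is proportional to the number of samples in which it appears, and wide, flat plateaus at the top mark the methods that were running on the CPU most often. The call tree is another view of the same sample set.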

A JVM Agent is a special library that is loaded at JVM start‑up via -agentlib, -agentpath or -javaagent (or, as described later, attached to a running JVM). It runs in the same process as the target JVM and can access JVM internals through either the native JVMTI interface (from C, C++ or Rust) or the Java Instrumentation API (java.lang.instrument).

<code style="line-height: 18px; font-size: 14px; font-family: Consolas, Inconsolata, Courier, monospace; color: rgb(169, 183, 198); padding: 0.5em; display: -webkit-box !important">    -agentlib:<library_name>[=<options>]
    -agentpath:<full_path>[=<options>]
    -javaagent:<jar_path>[=<options>]
</code>

JVMTI (JVM Tool Interface) provides a standard C/C++ programming interface for building debuggers, profilers, monitors, and thread analysers. A native agent typically implements the entry function Agent_OnLoad:

<code style="line-height: 18px; font-size: 14px; font-family: Consolas, Inconsolata, Courier, monospace; color: rgb(169, 183, 198); padding: 0.5em; display: -webkit-box !important">JNIEXPORT jint JNICALL Agent_OnLoad(JavaVM *vm, char *options, void *reserved);
</code>

A Java Agent is written in Java and declared in the JAR’s META‑INF/MANIFEST.MF with Premain-Class (or Agent-Class for dynamic attach, whose entry method is agentmain with the same parameters). The premain entry method looks like:

<code style="line-height: 18px; font-size: 14px; font-family: Consolas, Inconsolata, Courier, monospace; color: rgb(169, 183, 198); padding: 0.5em; display: -webkit-box !important">public static void premain(String args, Instrumentation ins) {
    // implement
}
</code>
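As a rough illustration of the two entry points, the sketch below registers a do-nothing ClassFileTransformer from both premain and agentmain. The class name SketchAgent and its fields are hypothetical, and the class is package-private only to keep the sketch in one file; a real agent class would normally be public and named in the manifest.

```java
import java.lang.instrument.ClassFileTransformer;
import java.lang.instrument.Instrumentation;
import java.security.ProtectionDomain;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical agent class; names are illustrative only.
final class SketchAgent {
    static final AtomicLong classesSeen = new AtomicLong();
    static volatile String entryPoint = "none";

    // Invoked before main() when the JVM starts with -javaagent (manifest: Premain-Class).
    public static void premain(String args, Instrumentation ins) {
        install("premain", ins);
    }

    // Invoked when the agent is attached to a running JVM (manifest: Agent-Class).
    public static void agentmain(String args, Instrumentation ins) {
        install("agentmain", ins);
    }

    private static void install(String via, Instrumentation ins) {
        entryPoint = via;
        ins.addTransformer(new ClassFileTransformer() {
            @Override
            public byte[] transform(ClassLoader loader, String className,
                                    Class<?> classBeingRedefined,
                                    ProtectionDomain protectionDomain,
                                    byte[] classfileBuffer) {
                classesSeen.incrementAndGet(); // count every class offered to the agent
                return null;                   // null means "leave the class bytes unchanged"
            }
        });
    }
}
```

An instrumenting profiler would return rewritten bytecode from transform instead of null; a sampling profiler would typically start its sampling thread from install.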

CPU profilers can be built using two main techniques:

Sampling: a background thread periodically (e.g., every few milliseconds) dumps the stack traces of all threads via JMX or JVMTI. This approach has low overhead, but it is statistical: methods that run only briefly can fall between samples, and because these APIs capture stacks only at safe points the distribution of samples can be skewed.

Instrumentation: bytecode is rewritten to insert probes at method entry and exit that count invocations and measure elapsed time. This yields exact call counts but adds noticeable runtime overhead, especially for small, frequently called methods.
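To make the instrumentation approach concrete, the sketch below writes by hand what injected probes would do. In a real profiler the enter/exit calls would be woven into the bytecode by a transformer; the MethodProbe and InstrumentedExample classes are hypothetical names used only for illustration.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical probe target: an instrumenting agent would inject calls to
// enter()/exit() at the start and end of every profiled method.
final class MethodProbe {
    private static final Map<String, LongAdder> CALLS = new ConcurrentHashMap<>();
    private static final Map<String, LongAdder> NANOS = new ConcurrentHashMap<>();

    // Called at method entry; counts the invocation and returns the start timestamp.
    static long enter(String method) {
        CALLS.computeIfAbsent(method, k -> new LongAdder()).increment();
        return System.nanoTime();
    }

    // Called at method exit; accumulates the elapsed time.
    static void exit(String method, long startNanos) {
        NANOS.computeIfAbsent(method, k -> new LongAdder()).add(System.nanoTime() - startNanos);
    }

    static long calls(String method) {
        LongAdder adder = CALLS.get(method);
        return adder == null ? 0 : adder.sum();
    }

    static long totalNanos(String method) {
        LongAdder adder = NANOS.get(method);
        return adder == null ? 0 : adder.sum();
    }
}

final class InstrumentedExample {
    // What a method looks like once probes have been inserted around its body.
    static int work(int n) {
        long start = MethodProbe.enter("InstrumentedExample.work");
        try {
            int sum = 0;
            for (int i = 0; i < n; i++) sum += i;
            return sum;
        } finally {
            // finally ensures the exit probe also runs on exceptional exit
            MethodProbe.exit("InstrumentedExample.work", start);
        }
    }
}
```

The two map updates and two System.nanoTime() calls per invocation hint at why this technique is expensive for small, frequently called methods.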

Implementation example – Java Agent + JMX (sampling):

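<code style="line-height: 18px; font-size: 14px; font-family: Consolas, Inconsolata, Courier, monospace; color: rgb(169, 183, 198); padding: 0.5em; display: -webkit-box !important">// Requires java.lang.management.ManagementFactory, ThreadMXBean and ThreadInfo
ThreadMXBean threadMXBean = ManagementFactory.getThreadMXBean();
ThreadInfo[] threadInfos = threadMXBean.dumpAllThreads(false, false); // no locked-monitor or synchronizer details
// iterate ThreadInfo, extract StackTraceElement[] and aggregate samples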
</code>
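A minimal sketch of the aggregation step might look like the following. It assumes that only RUNNABLE threads are of interest for CPU profiling and that counting the top frame is enough; the JmxSampler class and its method names are hypothetical.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sampler: counts how often each method is seen on top of a RUNNABLE thread's stack.
final class JmxSampler {
    private final ThreadMXBean threadMXBean = ManagementFactory.getThreadMXBean();
    private final Map<String, Integer> hits = new HashMap<>();

    // Takes one sample of all live threads.
    void sampleOnce() {
        for (ThreadInfo info : threadMXBean.dumpAllThreads(false, false)) {
            if (info == null || info.getThreadState() != Thread.State.RUNNABLE) continue;
            StackTraceElement[] stack = info.getStackTrace();
            if (stack.length == 0) continue; // no Java frames to attribute
            String top = stack[0].getClassName() + "." + stack[0].getMethodName();
            hits.merge(top, 1, Integer::sum);
        }
    }

    // Samples `count` times, sleeping `intervalMillis` between samples.
    Map<String, Integer> run(int count, long intervalMillis) {
        for (int i = 0; i < count; i++) {
            sampleOnce();
            try {
                Thread.sleep(intervalMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt and stop early
                break;
            }
        }
        return hits;
    }
}
```

In an agent, run would typically be started on a daemon thread from premain, for example new JmxSampler().run(1000, 10). Note that the sampling thread itself shows up in the results, so a real profiler would usually filter it out.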

Implementation example – JVMTI + GetStackTrace (native sampling):

<code style="line-height: 18px; font-size: 14px; font-family: Consolas, Inconsolata, Courier, monospace; color: rgb(169, 183, 198); padding: 0.5em; display: -webkit-box !important">jvmtiError GetAllThreads(jvmtiEnv *env, jint *threads_count_ptr, jthread **threads_ptr);
jvmtiError GetThreadInfo(jvmtiEnv *env, jthread thread, jvmtiThreadInfo *info_ptr);
jvmtiError GetStackTrace(jvmtiEnv *env, jthread thread, jint start_depth, jint max_frame_count, jvmtiFrameInfo *frame_buffer, jint *count_ptr);
</code>

Safe‑point bias is a fundamental limitation of sampling profilers: because GetStackTrace (and JMX) can only capture a thread when it is at a safe point, some hot code paths may never be sampled, leading to inaccurate hot‑spot detection.

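AsyncGetCallTrace (AGCT) is an undocumented function exported by HotSpot; it is not part of the official JVMTI specification. It can capture a stack trace asynchronously, even when the target thread is not at a safe point. Because it is not declared in any public header, a profiler declares the types itself. Its prototype is: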

<code style="line-height: 18px; font-size: 14px; font-family: Consolas, Inconsolata, Courier, monospace; color: rgb(169, 183, 198); padding: 0.5em; display: -webkit-box !important">typedef struct { jint lineno; jmethodID method_id; } AGCT_CallFrame;
typedef struct { JNIEnv *env; jint num_frames; AGCT_CallFrame *frames; } AGCT_CallTrace;
void AsyncGetCallTrace(AGCT_CallTrace *trace, jint depth, void *ucontext);
</code>

Using AGCT, a profiler registers a SIGPROF handler, has the signal delivered to threads at regular intervals, and calls AsyncGetCallTrace inside the handler. Because the handler runs on the interrupted thread wherever it happens to be executing, sampling has low overhead and is not tied to safe points. Open‑source projects such as Async‑Profiler and Honest‑Profiler adopt this technique.

After collecting stack‑trace samples, the data can be converted to the FlameGraph text format (semicolon‑separated method list followed by a sample count) and fed to Brendan Gregg’s flamegraph.pl script to generate an SVG flame graph.

<code style="line-height: 18px; font-size: 14px; font-family: Consolas, Inconsolata, Courier, monospace; color: rgb(169, 183, 198); padding: 0.5em; display: -webkit-box !important">base_func;func1;func2;func3 10
base_func;funca;funcb 15
</code>
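The conversion from raw samples to this text format can be sketched as follows. The sketch assumes each sample is a list of frame names ordered like Throwable.getStackTrace(), innermost frame first; the FoldedStacks class is a hypothetical helper.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper that collapses identical stacks into "frame;frame;frame count" lines.
final class FoldedStacks {
    static List<String> fold(List<List<String>> samples) {
        Map<String, Integer> counts = new TreeMap<>(); // sorted for stable output
        for (List<String> sample : samples) {
            List<String> rootFirst = new ArrayList<>(sample);
            Collections.reverse(rootFirst);            // the folded format lists the root frame first
            counts.merge(String.join(";", rootFirst), 1, Integer::sum);
        }
        List<String> lines = new ArrayList<>();
        counts.forEach((stack, n) -> lines.add(stack + " " + n));
        return lines;
    }
}
```

Writing the returned lines to a file, say stacks.txt (a placeholder name), and running flamegraph.pl stacks.txt > flame.svg would then produce the SVG.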

Dynamic Attach Mechanism: HotSpot provides an Attach API that allows a tool to load an agent into a running JVM without restarting it. On Linux the process works as follows:

1. Create a file /tmp/.attach_pid<pid> (or a file of the same name in the target process’s current working directory).

2. Send SIGQUIT to the target JVM process.

3. The JVM’s SIGQUIT handler detects the .attach_pid<pid> file and starts an attach‑listener thread that opens a Unix domain socket at /tmp/.java_pid<pid>.

4. The external tool connects to this socket, sends a request in a simple protocol (protocol version, command, and up to three arguments, each NUL‑terminated), and reads the response.

<code style="line-height: 18px; font-size: 14px; font-family: Consolas, Inconsolata, Courier, monospace; color: rgb(169, 183, 198); padding: 0.5em; display: -webkit-box !important">// Example of creating the trigger and sending SIGQUIT
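// Headers needed: fcntl.h, limits.h, signal.h, stdio.h, unistd.h; pid is the target JVM's process ID
char path[PATH_MAX];
snprintf(path, sizeof(path), "/tmp/.attach_pid%d", pid);
int fd = creat(path, 0660);   // an empty file is enough; only its existence matters
if (fd != -1) {
    close(fd);
    kill(pid, SIGQUIT);       // signal only if the trigger file exists, otherwise the JVM just prints a thread dump
}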
</code>

After the socket is ready, the tool writes the command (e.g., load to load a native JVMTI agent) and reads the response. The protocol looks like:

<code style="line-height: 18px; font-size: 14px; font-family: Consolas, Inconsolata, Courier, monospace; color: rgb(169, 183, 198); padding: 0.5em; display: -webkit-box !important">1\0load\0<absolute_path_to_agent.so>\0true\0\0
</code>
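As a sketch of how a tool might build such a request, the code below encodes the protocol version, the command, and exactly three argument slots, each NUL-terminated, with unused slots sent as empty strings. The AttachRequest class is hypothetical, and the use of UTF-8 and of "1" as the version string are assumptions based on the layout described above. Writing the bytes to the Unix domain socket and reading the reply are left out.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical encoder for an attach request: version, command, three argument slots.
final class AttachRequest {
    static byte[] encode(String command, String... args) {
        if (args.length > 3) {
            throw new IllegalArgumentException("at most three arguments are supported");
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeString(out, "1");        // protocol version
        writeString(out, command);    // e.g. "load" or "threaddump"
        for (int i = 0; i < 3; i++) { // missing arguments become empty strings
            writeString(out, i < args.length ? args[i] : "");
        }
        return out.toByteArray();
    }

    // Writes the string followed by a single NUL byte.
    private static void writeString(ByteArrayOutputStream out, String s) {
        byte[] bytes = s.getBytes(StandardCharsets.UTF_8);
        out.write(bytes, 0, bytes.length);
        out.write(0);
    }
}
```

For example, AttachRequest.encode("load", "/path/to/libagent.so", "true") yields the load request with an empty options argument; the path here is a placeholder.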

Supported attach commands include load, threaddump, dumpheap, properties, etc., allowing tools such as Arthas to perform live diagnostics.

In summary, understanding the underlying mechanisms of profilers—whether they rely on JVM agents, JVMTI, JMX, or the AsyncGetCallTrace hack—helps developers choose the right tool for a given scenario and avoid common pitfalls such as safe‑point bias or excessive instrumentation overhead.
