
Boost Java Performance: Optimize JFR Analysis with Flame Graphs and Async‑Profiler

This article traces the evolution of continuous performance profiling, explains why traditional tracing falls short, and details a series of optimizations (batch processing, object-reference serialization, aggregated insertion, and multi-chunk handling) that substantially reduce memory usage and speed up Java Flight Recorder (JFR) analysis with async-profiler and flame graphs.

SQB Blog

Background

In 2010, Google published the seminal paper "Google-Wide Profiling: A Continuous Profiling Infrastructure for Data Centers", which introduced a low-overhead, continuous profiling infrastructure for large-scale services. Since then, many commercial and open-source solutions have emerged, such as Google Cloud Profiler and Pyroscope, making continuous profiling a core pillar of observability.

Why Performance Profiling?

The traditional observability pillars (tracing, metrics, and logs) depend on predefined instrumentation and often miss bottlenecks in code that was never instrumented. Trace-based profiling (e.g., SkyWalking Trace Profiling) and sampling-based profilers (e.g., Elastic APM) close some of these gaps, but they still cannot capture details such as object allocation or lock contention.

Architecture Design

JFR: Java Flight Recorder

A JFR file is a collection of event records. Each event describes a snapshot of some aspect of JVM activity, such as a sampled thread stack, an object allocation, or a lock wait.
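To make the event model concrete, here is a minimal sketch that streams a recording with the JDK's `jdk.jfr.consumer.RecordingFile` API (JDK 11+) and tallies events per type. The class and method names (`JfrEventCensus`, `countByType`) are hypothetical and do not come from the article.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.util.Map;
import java.util.TreeMap;
import jdk.jfr.consumer.RecordedEvent;
import jdk.jfr.consumer.RecordingFile;

// Hypothetical helper: counts how many events of each type a JFR file contains.
class JfrEventCensus {
    static Map<String, Long> countByType(Path jfrFile) throws IOException {
        Map<String, Long> counts = new TreeMap<>();
        // RecordingFile hands out events one at a time via hasMoreEvents()/readEvent().
        try (RecordingFile recording = new RecordingFile(jfrFile)) {
            while (recording.hasMoreEvents()) {
                RecordedEvent event = recording.readEvent();
                // Type names look like "jdk.ExecutionSample" or "jdk.ObjectAllocationSample".
                counts.merge(event.getEventType().getName(), 1L, Long::sum);
            }
        }
        return counts;
    }
}
```

Which event types appear, and how many of each, depends entirely on what the recording was configured to capture.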

Java: async‑profiler

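async-profiler is a low-overhead sampling profiler for Java that uses HotSpot-specific APIs to collect stack traces and memory allocation data. It can write its output as JFR files or as flame graphs (SVG in older releases, HTML in current ones). It is loaded into the JVM as a native agent (for example via `-agentpath`) or attached to a running JVM.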

Flame Graph

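A flame graph visualizes sampled stack traces. Each block is a stack frame, its width is proportional to the number of samples in which that frame appears, and callees are stacked on top of their callers. Wide blocks at the top of a tower, where the frame itself rather than its callees accounts for the samples, usually indicate performance hotspots.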
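A flame graph is rendered from a tree in which every node accumulates the samples of all stacks that pass through it. The sketch below builds such a tree from root-first stacks. `FlameNode` and its methods are illustrative names, not an existing library API.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// One flame-graph node: a frame name, the samples passing through it (its width),
// and its callees keyed by frame name.
class FlameNode {
    final String name;
    long samples;
    final Map<String, FlameNode> children = new LinkedHashMap<>();

    FlameNode(String name) {
        this.name = name;
    }

    // Adds one stack, ordered from the outermost caller to the leaf frame,
    // with the given sample count.
    void addStack(List<String> framesRootFirst, long count) {
        samples += count;
        FlameNode node = this;
        for (String frame : framesRootFirst) {
            node = node.children.computeIfAbsent(frame, FlameNode::new);
            node.samples += count;
        }
    }

    // Renders the tree as indented "name (samples)" lines.
    void print(StringBuilder out, int depth) {
        out.append("  ".repeat(depth)).append(name).append(" (").append(samples).append(")\n");
        for (FlameNode child : children.values()) {
            child.print(out, depth + 1);
        }
    }
}
```

For example, adding `main;handle;parse` three times and `main;handle;render` and `main;gc` once each gives `main` a width of 5 and `handle` a width of 4. `parse`, with a width of 3, is then the widest block near the top.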

Performance Profiling in Practice: Optimizing JFR File Analysis

Initial Implementation: Native JFR Reader

The original approach used the JDK's built-in JFR parser (`jdk.jfr.consumer`) to load all events into memory at once. This was acceptable for small files. However, processing a 60 MB JFR file with over two million events caused high memory usage and slow analysis, driven by GC pressure and by inserting every event into the tree separately.
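For reference, the baseline can be sketched as follows. The sketch assumes the baseline used `RecordingFile.readAllEvents` and turned every stack into a folded `caller;callee` string before inserting it into the tree. The article does not show the original code, so `NaiveJfrAnalyzer` is a hypothetical reconstruction.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import jdk.jfr.consumer.RecordedEvent;
import jdk.jfr.consumer.RecordedFrame;
import jdk.jfr.consumer.RecordedMethod;
import jdk.jfr.consumer.RecordedStackTrace;
import jdk.jfr.consumer.RecordingFile;

// Hypothetical reconstruction of the naive approach: load everything, then build
// one folded stack string per event (each string is later inserted into the tree).
class NaiveJfrAnalyzer {
    static List<String> foldedStacks(Path jfrFile) throws IOException {
        // readAllEvents materializes every event in the file as a List up front.
        List<RecordedEvent> events = RecordingFile.readAllEvents(jfrFile);
        List<String> folded = new ArrayList<>();
        for (RecordedEvent event : events) {
            RecordedStackTrace stack = event.getStackTrace();
            if (stack == null) {
                continue; // not every event type carries a stack trace
            }
            List<RecordedFrame> frames = stack.getFrames(); // most recent frame first
            StringBuilder key = new StringBuilder();
            for (int i = frames.size() - 1; i >= 0; i--) { // walk from root to leaf
                RecordedMethod method = frames.get(i).getMethod();
                if (key.length() > 0) {
                    key.append(';');
                }
                key.append(method.getType().getName()).append('.').append(method.getName());
            }
            // Identical stacks still produce separate strings, one per event.
            folded.add(key.toString());
        }
        return folded;
    }
}
```

Both the full event list and one string per event stay reachable until the loop ends. That is consistent with the memory and GC pressure described above.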

Optimization Attempt: Batch Processing

Reading events in fixed-size batches (e.g., 1,000 events at a time) reduced peak memory but did not noticeably improve speed, because the large file still had to stay open for the whole run.
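A minimal sketch of batched reading, assuming the same `jdk.jfr.consumer.RecordingFile` API. `BatchedJfrReader` and the handler callback are hypothetical names. A fresh list is allocated per batch, so the handler may keep a batch without it being mutated afterwards.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import jdk.jfr.consumer.RecordedEvent;
import jdk.jfr.consumer.RecordingFile;

// Hypothetical helper: streams events and hands them to a handler in fixed-size batches.
class BatchedJfrReader {
    // Returns the number of batches delivered to the handler.
    static int readInBatches(Path jfrFile, int batchSize, Consumer<List<RecordedEvent>> handler)
            throws IOException {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        int batches = 0;
        List<RecordedEvent> batch = new ArrayList<>(batchSize);
        // The RecordingFile stays open for the entire loop.
        try (RecordingFile recording = new RecordingFile(jfrFile)) {
            while (recording.hasMoreEvents()) {
                batch.add(recording.readEvent());
                if (batch.size() == batchSize) {
                    handler.accept(batch);
                    batches++;
                    // Start a new list so the previous batch can be garbage-collected.
                    batch = new ArrayList<>(batchSize);
                }
            }
        }
        if (!batch.isEmpty()) { // deliver the final partial batch
            handler.accept(batch);
            batches++;
        }
        return batches;
    }
}
```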

Optimization: Object References & Lazy Serialization

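async-profiler's reader keeps each unique stack trace once, in a dictionary, and each event refers to it by `stackTraceId`. A stack is therefore not serialized again for every event that shares it, and it is converted into readable frame strings only when actually needed. This reduces memory dramatically: for example, 2.4 million events may share only about 30,000 unique stack traces.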

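public class StackTrace {
    // method IDs (keys into the method dictionary)
    public final long[] methods;
    // frame type for each method (INTERPRETED, JIT_COMPILED, ...)
    public final byte[] types;
    // line number / bci for each method
    public final int[] locations;

    public StackTrace(long[] methods, byte[] types, int[] locations) {
        this.methods = methods;
        this.types = types;
        this.locations = locations;
    }
}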
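In a JFR file, stack traces already live in a constant pool and events refer to them by ID, which async-profiler's reader exposes as `stackTraceId`. Code that has to build such a dictionary itself could intern stacks roughly as follows. `StackTraceTable` is a hypothetical helper. Keying on method IDs alone (ignoring `types` and `locations`) is a simplifying assumption.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical interning table: each distinct stack (an array of method IDs) is
// stored once, and events keep only the small integer ID returned by intern().
class StackTraceTable {
    // Wraps long[] so that HashMap compares array contents, not array identity.
    private static final class StackKey {
        final long[] methods;

        StackKey(long[] methods) {
            this.methods = methods;
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof StackKey && Arrays.equals(methods, ((StackKey) o).methods);
        }

        @Override
        public int hashCode() {
            return Arrays.hashCode(methods);
        }
    }

    private final Map<StackKey, Integer> idsByStack = new HashMap<>();
    private final List<long[]> stacksById = new ArrayList<>();

    // Returns the ID of an identical stack seen earlier, or assigns the next free ID.
    // Assumes callers do not mutate the array after interning it.
    int intern(long[] methods) {
        return idsByStack.computeIfAbsent(new StackKey(methods), key -> {
            stacksById.add(key.methods);
            return stacksById.size() - 1;
        });
    }

    // Resolves an ID back to its stack, e.g. only when frame names are finally needed.
    long[] lookup(int id) {
        return stacksById.get(id);
    }

    int uniqueStacks() {
        return stacksById.size();
    }
}
```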

Optimization: Aggregated Insertion

Instead of inserting each event's stack trace string into the tree, we first accumulate a count per `stackTraceId`. We then insert each unique stack trace once, weighted by its cumulative count. This reduces insert operations from millions to tens of thousands.
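The two-phase idea can be sketched as a counter keyed by `stackTraceId` that is flushed once at the end. `StackAggregator` and the `insertIntoTree` callback are hypothetical names. The callback stands in for resolving the stack and inserting it into the flame-graph tree with its accumulated count.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.ObjLongConsumer;

// Hypothetical two-phase aggregator: count per stackTraceId, then insert each unique stack once.
class StackAggregator {
    private final Map<Integer, Long> countsById = new HashMap<>();

    // Phase 1: called once per event; only bumps a counter (weight is 1 for a plain sample).
    void onEvent(int stackTraceId, long weight) {
        countsById.merge(stackTraceId, weight, Long::sum);
    }

    // Phase 2: invokes the callback once per unique stack with its total weight,
    // then resets the counters. Returns the number of insertions performed.
    int flush(ObjLongConsumer<Integer> insertIntoTree) {
        countsById.forEach((id, total) -> insertIntoTree.accept(id, total));
        int inserted = countsById.size();
        countsById.clear();
        return inserted;
    }
}
```

With 2.4 million events spread over roughly 30,000 unique stacks, phase 2 calls the callback about 30,000 times instead of 2.4 million.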

Optimization: Multi‑Chunk Processing

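For very large JFR files, we process the file chunk by chunk. A JFR recording is a sequence of self-contained chunks, each carrying its own metadata and constant pools, so chunks can be parsed independently. Because async-profiler's JfrReader did not support chunked reading, we contributed a pull request to add this capability.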
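Chunk boundaries can be found without parsing any events, because every chunk starts with a fixed header. The sketch below assumes the JFR chunk header layout used by the JDK: the magic `FLR\0`, 16-bit major and minor versions, then the chunk size in bytes (including the header) as a big-endian 64-bit value at offset 8. It also assumes the whole file fits in a `ByteBuffer`. `JfrChunkScanner` is a hypothetical name and is not the API added by the pull request mentioned above.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

// Hypothetical scanner that locates chunk boundaries in an in-memory JFR file.
class JfrChunkScanner {
    static final int MAGIC = 0x464C5200; // 'F' 'L' 'R' '\0'
    static final int MIN_HEADER = 16;    // magic + versions + chunk size

    // Returns {offset, size} pairs, one per chunk, in file order.
    static List<long[]> findChunks(ByteBuffer file) {
        List<long[]> chunks = new ArrayList<>();
        int offset = 0;
        while (offset < file.limit()) {
            if (offset + MIN_HEADER > file.limit() || file.getInt(offset) != MAGIC) {
                throw new IllegalStateException("No chunk header at offset " + offset);
            }
            long size = file.getLong(offset + 8); // ByteBuffer defaults to big-endian
            if (size < MIN_HEADER || offset + size > file.limit()) {
                throw new IllegalStateException("Bad chunk size " + size + " at offset " + offset);
            }
            chunks.add(new long[] {offset, size});
            offset += (int) size;
        }
        return chunks;
    }
}
```

Each `{offset, size}` pair could then be handed to a separate worker, since a chunk's metadata and constant pools do not depend on other chunks.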

Client vs Server Analysis

Two analysis modes are supported:

Client analysis (HTML): for simple scenarios; results are rendered directly in the browser.

Server analysis (JFR): for complex scenarios; files are uploaded to the backend, stored in Elasticsearch, and visualized with a flame-graph component.

Server‑side processing reads JFR files, builds a tree structure, stores it in Elasticsearch, and serves it to the frontend for rendering.
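The article does not describe the stored document format. One plausible sketch serializes the tree into the nested `name`/`value`/`children` JSON shape that flame-graph components such as d3-flame-graph consume, so the same document can be indexed and later returned to the frontend. `FlameJson` is a hypothetical class, and its hand-rolled escaping covers only quotes, backslashes, and control characters.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical node shape for shipping a flame-graph tree to storage and the frontend.
class FlameJson {
    final String name;
    final long value;
    final List<FlameJson> children = new ArrayList<>();

    FlameJson(String name, long value) {
        this.name = name;
        this.value = value;
    }

    // Appends a child and returns this node, to allow chained construction.
    FlameJson add(FlameJson child) {
        children.add(child);
        return this;
    }

    // Serializes as {"name":...,"value":...,"children":[...]}.
    String toJson() {
        StringBuilder sb = new StringBuilder();
        write(sb);
        return sb.toString();
    }

    private void write(StringBuilder sb) {
        sb.append("{\"name\":\"");
        escape(name, sb);
        sb.append("\",\"value\":").append(value).append(",\"children\":[");
        for (int i = 0; i < children.size(); i++) {
            if (i > 0) {
                sb.append(',');
            }
            children.get(i).write(sb);
        }
        sb.append("]}");
    }

    // Minimal JSON string escaping: quotes, backslashes, and control characters.
    private static void escape(String s, StringBuilder sb) {
        for (char c : s.toCharArray()) {
            if (c == '"' || c == '\\') {
                sb.append('\\').append(c);
            } else if (c < 0x20) {
                sb.append(String.format("\\u%04x", (int) c));
            } else {
                sb.append(c);
            }
        }
    }
}
```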

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.

Tags: Java, observability, performance profiling, Flame Graph, async-profiler, JFR