
Detect Java Microservice Bottlenecks with ARMS Code Hotspots

During high‑traffic load tests, e‑commerce services often hit performance ceilings that show up as low success rates and high latency. By combining tracing data, CPU flame‑graphs, and features of Alibaba Cloud's ARMS 3.x JavaAgent such as Code Hotspots and Adaptive Overload Protection, teams can automatically locate bottlenecks, mitigate traffic spikes, and improve stability without code changes.

Alibaba Cloud Native

Background

Load testing of a critical e‑commerce service showed that once traffic reached a certain threshold, the overall success rate dropped to 9.89% and response times rose sharply. CPU usage of the application pods approached saturation, indicating that the service could not sustain the target TPS.

Key Challenges

How to locate performance bottlenecks in a complex business system?

How to protect services against unpredictable traffic spikes?

Typical Open‑Source Approach

Collect tracing data with the OpenTelemetry SDK or agent.

Generate CPU flame‑graphs using Async Profiler.

Apply traffic governance with Sentinel.
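To make the flame‑graph step concrete: flame graphs such as those produced by Async Profiler are typically rendered from "folded" stacks, one line per distinct call stack in the form root;child;leaf count, where count is the number of samples that ended in that exact stack. The sketch below shows only that aggregation step in plain Java; the sample stacks are hypothetical, and a real profiler collects them by periodic sampling rather than receiving a ready‑made list.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class FoldedStacks {

    /**
     * Collapses sampled stacks (root frame first) into the folded format
     * used by flame-graph renderers: "root;child;leaf" -> sample count.
     */
    public static Map<String, Integer> fold(List<List<String>> samples) {
        Map<String, Integer> folded = new TreeMap<>(); // sorted for stable output
        for (List<String> stack : samples) {
            folded.merge(String.join(";", stack), 1, Integer::sum);
        }
        return folded;
    }

    public static void main(String[] args) {
        // Hypothetical samples taken while a request handler was running.
        List<List<String>> samples = List.of(
            List.of("runBusiness", "readFile", "fromJson"),
            List.of("runBusiness", "readFile", "fromJson"),
            List.of("runBusiness", "readFile"),
            List.of("runBusiness", "invokeAPI"));
        fold(samples).forEach((stack, count) -> System.out.println(stack + " " + count));
    }
}
```

The width of each frame in the rendered graph is proportional to the summed counts of all lines that pass through it, which is why a wide readFile frame points straight at the expensive code.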

These approaches require code instrumentation and additional infrastructure, and they incur extra cost.

ARMS 3.x JavaAgent Features

The Alibaba Cloud ARMS JavaAgent 3.x provides two core capabilities, neither of which requires code changes:

Code Hotspots: automatically correlates tracing spans with CPU flame‑graphs, revealing code paths that lack instrumentation as well as methods with a high CPU cost.

Adaptive Overload Protection: dynamically throttles traffic when CPU usage crosses a threshold, keeping the system stable during sudden load spikes.

Code Hotspots Example

The following sample Java service parses a JSON file and then calls a downstream HTTP API:

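import com.google.gson.Gson;
import com.google.gson.reflect.TypeToken;
import org.springframework.web.client.RestTemplate;

import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.LinkedList;
import java.util.Objects;

// AbsAction (the demo's base class) and Movie (a type with a rating() accessor)
// come from the sample project and are not shown here.
public class HotSpotAction extends AbsAction {
    private static final Gson GSON = new Gson();
    private final RestTemplate restTemplate = new RestTemplate();

    @Override
    public void runBusiness() {
        readFile();   // local CPU work with no span of its own
        invokeAPI();  // outbound HTTP call, which tracing records as a span
    }

    private void invokeAPI() {
        String url = "https://httpbin.org/get";
        restTemplate.getForObject(url, String.class);
    }

    private double readFile() {
        try (Reader reader = new InputStreamReader(
                Objects.requireNonNull(ClassLoader.getSystemResourceAsStream("data/xxx.json"),
                    "data/xxx.json not found on the classpath"),
                StandardCharsets.UTF_8)) {
            LinkedList<Movie> movieList = GSON.fromJson(reader,
                new TypeToken<LinkedList<Movie>>() {}.getType());
            double totalCount = 0;
            // get(i) on a LinkedList walks the list from one end, so this loop is O(n^2).
            for (int i = 0; i < movieList.size(); i++) {
                totalCount += movieList.get(i).rating();
            }
            return totalCount;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}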

The trace shows a slow call chain of 2649 ms with a roughly 2 s blind spot between the first and second spans: a period in which no span was recorded, so tracing alone cannot show what the code was doing.
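A blind spot of this kind is the part of a parent span's duration that no child span covers. The sketch below computes those uncovered intervals from span start and end offsets; the Span record and the timings in main are hypothetical, chosen only to resemble the 2649 ms trace described here, and are not ARMS data structures.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class SpanGaps {

    /** Hypothetical span: offsets in milliseconds relative to the start of the trace. */
    public record Span(String name, long startMs, long endMs) {}

    /** Returns the intervals inside the parent span that no child span covers. */
    public static List<long[]> uncoveredIntervals(Span parent, List<Span> children) {
        List<Span> sorted = new ArrayList<>(children);
        sorted.sort(Comparator.comparingLong(Span::startMs));
        List<long[]> gaps = new ArrayList<>();
        long cursor = parent.startMs(); // end of the time already covered by child spans
        for (Span child : sorted) {
            if (child.startMs() > cursor) {
                gaps.add(new long[] {cursor, child.startMs()});
            }
            cursor = Math.max(cursor, child.endMs());
        }
        if (parent.endMs() > cursor) {
            gaps.add(new long[] {cursor, parent.endMs()});
        }
        return gaps;
    }

    public static void main(String[] args) {
        // Hypothetical trace: a 2649 ms entry span whose HTTP child span starts about 2 s in.
        Span root = new Span("runBusiness", 0, 2649);
        List<Span> children = List.of(new Span("GET httpbin.org/get", 2010, 2640));
        for (long[] gap : uncoveredIntervals(root, children)) {
            System.out.printf("blind spot %d-%d ms (%d ms)%n", gap[0], gap[1], gap[1] - gap[0]);
        }
    }
}
```

Finding the gap is as far as tracing data can go; explaining it requires knowing which methods were running on that thread during the interval.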

After Code Hotspots is enabled, the UI displays a flame‑graph for the slow trace that highlights HotSpotAction.readFile() consuming 1.91 s, directly revealing the root cause.
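Conceptually, correlating a span with a flame‑graph means keeping only the stack samples taken on the span's thread within the span's time window, then multiplying each method's sample count by the sampling interval. The sketch below illustrates that idea; the Sample record, the 10 ms interval, and the generated samples are hypothetical assumptions, not a description of how ARMS stores or correlates profiling data. With those assumptions, 191 samples that include readFile() translate to 1910 ms, matching the 1.91 s figure.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class SpanHotspots {

    /** Hypothetical stack sample: when it was taken, on which thread, and its frames. */
    public record Sample(long timestampMs, long threadId, List<String> frames) {}

    /**
     * Estimates inclusive time per method inside one span: samples from the span's
     * thread and time window are counted, then scaled by the sampling interval.
     */
    public static Map<String, Long> inclusiveTimeMs(List<Sample> samples, long threadId,
                                                    long spanStartMs, long spanEndMs,
                                                    long intervalMs) {
        Map<String, Long> timeMs = new HashMap<>();
        for (Sample s : samples) {
            boolean inSpan = s.threadId() == threadId
                && s.timestampMs() >= spanStartMs && s.timestampMs() < spanEndMs;
            if (!inSpan) {
                continue;
            }
            // Count each method once per sample, even if it appears in several frames.
            Set<String> methods = new HashSet<>(s.frames());
            for (String method : methods) {
                timeMs.merge(method, intervalMs, Long::sum);
            }
        }
        return timeMs;
    }

    public static void main(String[] args) {
        List<Sample> samples = new ArrayList<>();
        // Hypothetical: 191 samples, 10 ms apart, all inside readFile() on thread 1.
        for (int i = 0; i < 191; i++) {
            samples.add(new Sample(i * 10L, 1,
                List.of("HotSpotAction.runBusiness", "HotSpotAction.readFile")));
        }
        Map<String, Long> hot = inclusiveTimeMs(samples, 1, 0, 2649, 10);
        System.out.println("readFile ~ " + hot.get("HotSpotAction.readFile") + " ms");
    }
}
```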

Adaptive Overload Protection

In load‑test scenarios, CPU‑based auto‑scaling alone may not prevent the success rate from dropping: new instances take seconds to become ready, and in the meantime the existing pods are overloaded. Adaptive Overload Protection in Microservice Engine (MSE) monitors CPU usage and, once the threshold is reached, automatically applies a proportional rate limit, keeping latency low and the success rate higher.
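A proportional rate limit can be illustrated with a small admission controller: below the CPU threshold every request is admitted; above it, only a fraction threshold / cpuUsage of requests is admitted, so the harder the CPU is pushed, the more traffic is shed. This is a simplified sketch under stated assumptions (the CPU reading is injected as a DoubleSupplier, and the threshold / cpuUsage formula is an illustrative choice), not MSE's actual algorithm.

```java
import java.util.function.DoubleSupplier;

public class CpuProportionalLimiter {

    private final double cpuThreshold;      // e.g. 0.8 for 80 % CPU (illustrative value)
    private final DoubleSupplier cpuUsage;  // current CPU usage in [0, 1]
    private double credit = 0.0;            // accumulated fractional admissions

    public CpuProportionalLimiter(double cpuThreshold, DoubleSupplier cpuUsage) {
        this.cpuThreshold = cpuThreshold;
        this.cpuUsage = cpuUsage;
    }

    /** Returns true if the request may proceed, false if it should be rejected fast. */
    public synchronized boolean tryAcquire() {
        double cpu = cpuUsage.getAsDouble();
        if (cpu <= cpuThreshold) {
            credit = 0.0;
            return true;
        }
        // Admit threshold/cpu of the traffic, e.g. 0.8 / 1.0 = 80 % when CPU is saturated.
        credit += cpuThreshold / cpu;
        if (credit >= 1.0) {
            credit -= 1.0;
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        // Hypothetical saturated pod: CPU pinned at 100 %, threshold 80 %.
        CpuProportionalLimiter limiter = new CpuProportionalLimiter(0.8, () -> 1.0);
        int admitted = 0;
        for (int i = 0; i < 1000; i++) {
            if (limiter.tryAcquire()) {
                admitted++;
            }
        }
        System.out.println("admitted " + admitted + " of 1000");
    }
}
```

The credit accumulator admits a deterministic share of requests instead of rolling a random number per request. In a real service the supplier could wrap a JVM metric such as com.sun.management.OperatingSystemMXBean#getProcessCpuLoad(); rejected requests are answered immediately instead of queuing, which is what keeps latency low for the requests that are admitted.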

Test results show the overall success rate improving to 50.99% (about 80% when requests deliberately rejected by the rate limiter are excluded), with response times quickly returning to normal once protection kicks in.
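The two percentages use different denominators. Writing S for successful requests, T for all requests, and R for requests deliberately rejected by the rate limiter (symbols introduced here for illustration), the reported figures imply that roughly a third of the traffic was shed on purpose:

```latex
\frac{S}{T} = 0.5099, \qquad \frac{S}{T - R} \approx 0.80
\;\;\Longrightarrow\;\;
\frac{R}{T} = 1 - \frac{S/T}{S/(T-R)} \approx 1 - \frac{0.5099}{0.80} \approx 0.36
```

Because the 80% figure is itself approximate, the 36% share is only an estimate derived from the two reported rates.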

Integration with Alibaba Cloud Kubernetes (ACK)

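The JavaAgent can be injected automatically in Pilot mode, without modifying container images. Add the following labels to the pod template of the workload (spec.template.metadata.labels in a Deployment), replacing the placeholder values with your own: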

armsPilotAutoEnable: "on"
armsPilotCreateAppName: "${ARMS_APP_NAME}"
msePilotAutoEnable: "on"
msePilotCreateAppName: "${MSE_APP_NAME}"
mseNamespace: "${MSE_NAMESPACE}"

After deployment, the ARMS or MSE console can be accessed directly from the ACK UI.
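Pilot mode attaches the agent automatically, but the underlying mechanism is the standard java.lang.instrument API: an agent JAR names a Premain-Class in its manifest, and when the JVM is started with -javaagent it calls that class's premain method before main, which lets the agent register a ClassFileTransformer that can rewrite bytecode as classes load. The minimal skeleton below only counts classes in a hypothetical com/example/ package and returns null, meaning the bytecode is left unchanged; it is a generic illustration, not the ARMS agent.

```java
import java.lang.instrument.ClassFileTransformer;
import java.lang.instrument.Instrumentation;
import java.security.ProtectionDomain;
import java.util.concurrent.atomic.AtomicInteger;

public class MinimalAgent {

    /** Counts loaded application classes without modifying them. */
    public static class CountingTransformer implements ClassFileTransformer {
        private final AtomicInteger seen = new AtomicInteger();

        @Override
        public byte[] transform(ClassLoader loader, String className, Class<?> classBeingRedefined,
                                ProtectionDomain protectionDomain, byte[] classfileBuffer) {
            // className uses the internal form, e.g. "com/example/HotSpotAction".
            if (className != null && className.startsWith("com/example/")) {
                seen.incrementAndGet();
                // A real agent would return rewritten bytes here (e.g. with timing probes).
            }
            return null; // null tells the JVM to keep the original bytecode
        }

        public int seenCount() {
            return seen.get();
        }
    }

    /** Invoked by the JVM before main() when the agent JAR is passed via -javaagent. */
    public static void premain(String agentArgs, Instrumentation inst) {
        inst.addTransformer(new CountingTransformer());
    }
}
```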

Additional Features in JavaAgent 3.x

Bytecode enhancement for Java 8‑21, supporting a wide range of runtimes.

Data reporting architecture based on short‑lived connections and compression, raising the reporting success rate to 99.99%.

Support for Vert.x, Reactor‑Netty, OceanBase, xxl‑job, PostgreSQL, Kafka, and other popular components.

CPU overhead reduced by about 50%, and startup time of 5 s or less.

Compatibility with OpenTracing in the upcoming 4.x release.
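The benefit of compressed reporting can be seen with the JDK's built‑in GZIP streams: telemetry batches such as serialized spans are highly repetitive, so they shrink substantially before being sent. The sketch below shows only the compression step on a hypothetical payload; the article does not describe the agent's actual wire format, codec, or transport, and none is implied here.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class ReportCompression {

    /** GZIP-compresses a UTF-8 payload before it is sent to a collector. */
    public static byte[] compress(String payload) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPOutputStream gzip = new GZIPOutputStream(out)) {
            gzip.write(payload.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray(); // read only after the GZIP stream is closed and flushed
    }

    /** Reverses compress(); a collector would do this on receipt. */
    public static String decompress(byte[] data) {
        try (GZIPInputStream gzip = new GZIPInputStream(new ByteArrayInputStream(data))) {
            return new String(gzip.readAllBytes(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical batch of 500 near-identical span records.
        String batch = "{\"span\":\"GET /api\",\"durationMs\":12}\n".repeat(500);
        byte[] packed = compress(batch);
        System.out.println(batch.length() + " bytes -> " + packed.length + " bytes");
        System.out.println("round trip ok: " + batch.equals(decompress(packed)));
    }
}
```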

Conclusion

The ARMS 3.x JavaAgent delivers a non‑intrusive, cloud‑native solution for diagnosing performance bottlenecks and protecting services against traffic surges. By leveraging Code Hotspots and Adaptive Overload Protection, teams can achieve faster root‑cause analysis and maintain high availability without modifying application code.

References:

OpenTelemetry: https://opentelemetry.io/

Async Profiler: https://github.com/async-profiler/async-profiler

Sentinel: http://sentinelguard.io/zh-cn/docs/introduction.html

Java Agent overview: https://www.developer.com/design/what-is-java-agent/

Code Hotspots guide: https://help.aliyun.com/zh/arms/application-monitoring/user-guide/use-code-hotspots-to-diagnose-code-level-problems

Performance test report: https://help.aliyun.com/zh/arms/application-monitoring/developer-reference/performance-test-report-of-arms-agent-for-java

Supported Java components and frameworks: https://help.aliyun.com/zh/arms/application-monitoring/developer-reference/java-components-and-frameworks-supported-by-arms

MSE traffic protection configuration: https://help.aliyun.com/zh/mse/user-guide/configure-web-behavior-1

ARMS monitoring overview: https://help.aliyun.com/zh/arms/application-monitoring/getting-started/overview

MSE service governance access: https://help.aliyun.com/zh/mse/user-guide/application-access-3

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.
