Detect Java Microservice Bottlenecks with ARMS Code Hotspots
During high-traffic load tests, e-commerce services often hit performance ceilings that show up as low success rates and high latency. By combining tracing data and CPU flame graphs with two features of Alibaba Cloud's ARMS 3.x JavaAgent, Code Hotspots and Adaptive Overload Protection, teams can locate bottlenecks automatically, absorb traffic spikes, and improve stability without code changes.
Background
Load‑testing of a critical e‑commerce service showed the overall success rate dropping to 9.89% with high response times when traffic reached a threshold. CPU usage of the application pods approached saturation, indicating the service could not handle the high TPS.
Key Challenges
How to locate performance bottlenecks in a complex business system?
How to protect services against unpredictable traffic spikes?
Typical Open‑Source Approach
Collect tracing data with OpenTelemetry SDK/Agent.
Generate CPU flame‑graphs using Async Profiler.
Apply traffic governance with Sentinel.
These solutions require code instrumentation and additional infrastructure, and they incur extra cost.
ARMS 3.x JavaAgent Features
The Alibaba Cloud JavaAgent 3.x provides two core capabilities without code changes:
Code Hotspots: automatically correlates tracing spans with CPU flame graphs, revealing missing instrumentation and high-cost methods.
Adaptive Overload Protection: dynamically throttles traffic based on CPU usage thresholds, keeping the system stable during sudden load spikes.
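The source does not describe how Code Hotspots performs the correlation internally. As a rough sketch of the general idea only, the following assumes that profiler samples carry a timestamp and a thread name, and that a span records the thread it ran on plus its start and end times. The class, field, and thread names here are hypothetical and are not part of the ARMS agent.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class SpanSampleCorrelation {

    /** One profiler stack sample: when it was taken, on which thread, and its top frame. */
    static final class Sample {
        final long timestampMs;
        final String threadName;
        final String topFrame;

        Sample(long timestampMs, String threadName, String topFrame) {
            this.timestampMs = timestampMs;
            this.threadName = threadName;
            this.topFrame = topFrame;
        }
    }

    /** A tracing span reduced to the fields needed for correlation. */
    static final class Span {
        final String threadName;
        final long startMs;
        final long endMs;

        Span(String threadName, long startMs, long endMs) {
            this.threadName = threadName;
            this.startMs = startMs;
            this.endMs = endMs;
        }
    }

    /** Counts samples per top frame, keeping only samples on the span's thread and inside its time window. */
    static Map<String, Integer> hotFrames(Span span, List<Sample> samples) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (Sample s : samples) {
            boolean sameThread = s.threadName.equals(span.threadName);
            boolean inWindow = s.timestampMs >= span.startMs && s.timestampMs <= span.endMs;
            if (sameThread && inWindow) {
                counts.merge(s.topFrame, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Hypothetical data: a span on one worker thread and a handful of samples.
        Span span = new Span("http-worker-1", 0, 2649);
        List<Sample> samples = Arrays.asList(
                new Sample(100, "http-worker-1", "HotSpotAction.readFile"),
                new Sample(900, "http-worker-1", "HotSpotAction.readFile"),
                new Sample(2200, "http-worker-1", "HotSpotAction.invokeAPI"),
                new Sample(500, "gc-thread", "GC.collect"),      // other thread: ignored
                new Sample(3000, "http-worker-1", "Idle.park")); // outside the span: ignored
        System.out.println(hotFrames(span, samples));
        // prints {HotSpotAction.readFile=2, HotSpotAction.invokeAPI=1}
    }
}
```

Grouping samples by span in this way is what lets a flame graph be scoped to a single slow request rather than to the whole process.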
Code Hotspots Example
Sample Java service that parses a JSON file and calls a downstream HTTP API:
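import com.google.gson.reflect.TypeToken;
import java.io.InputStreamReader;
import java.util.LinkedList;
import org.springframework.web.client.RestTemplate;

// AbsAction, Movie, and GSON are defined elsewhere in the sample project and are not shown here.
public class HotSpotAction extends AbsAction {
    private RestTemplate restTemplate = new RestTemplate();

    @Override
    public void runBusiness() {
        readFile();
        invokeAPI();
    }

    // Calls a downstream HTTP API.
    private void invokeAPI() {
        String url = "https://httpbin.org/get";
        String response = restTemplate.getForObject(url, String.class);
    }

    // Parses a local JSON file into a list of movies and sums their ratings.
    private double readFile() {
        InputStreamReader reader = new InputStreamReader(
                ClassLoader.getSystemResourceAsStream("data/xxx.json"));
        LinkedList<Movie> movieList = GSON.fromJson(reader,
                new TypeToken<LinkedList<Movie>>() {}.getType());
        double totalCount = 0;
        for (int i = 0; i < movieList.size(); i++) {
            totalCount += movieList.get(i).rating();
        }
        return totalCount;
    }
}

Tracing shows a slow call chain of 2649 ms with a gap of about 2 s between the first and second spans. No span covers that interval, so tracing alone cannot pinpoint where the time went.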
After enabling the Code Hotspots feature, the UI displays a flame graph that highlights HotSpotAction.readFile() consuming 1.91 s, directly revealing the root cause.
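The source does not say which part of readFile() accounts for the 1.91 s. One plausible contributor, assuming the JSON file holds many entries, is the indexed loop over a LinkedList: each get(i) call walks the list from one end, so the loop as a whole is quadratic in the list size. File I/O and JSON parsing may contribute as well. The sketch below, with hypothetical class and method names, compares indexed access with iterator-based traversal on the same list.

```java
import java.util.LinkedList;
import java.util.List;

class LinkedListTraversal {

    /** Indexed loop, as in readFile(): every get(i) walks the list, so the loop is O(n^2) on a LinkedList. */
    static double sumByIndex(List<Double> ratings) {
        double total = 0;
        for (int i = 0; i < ratings.size(); i++) {
            total += ratings.get(i);
        }
        return total;
    }

    /** Iterator-based loop: each node is visited once, so the loop is O(n). */
    static double sumByIterator(List<Double> ratings) {
        double total = 0;
        for (double rating : ratings) {
            total += rating;
        }
        return total;
    }

    public static void main(String[] args) {
        // Hypothetical data set; the size of the real JSON file is not given in the source.
        List<Double> ratings = new LinkedList<>();
        for (int i = 0; i < 20_000; i++) {
            ratings.add(1.0);
        }

        long t0 = System.nanoTime();
        double byIndex = sumByIndex(ratings);
        long t1 = System.nanoTime();
        double byIterator = sumByIterator(ratings);
        long t2 = System.nanoTime();

        System.out.printf("index:    sum=%.1f, %d ms%n", byIndex, (t1 - t0) / 1_000_000);
        System.out.printf("iterator: sum=%.1f, %d ms%n", byIterator, (t2 - t1) / 1_000_000);
    }
}
```

Both methods return the same sum; only the traversal cost differs. Timings depend on the machine and JVM, so none are quoted here.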
Adaptive Overload Protection
In load-test scenarios, CPU-based auto-scaling alone may not prevent success rates from degrading: new instances take seconds to become ready, and the existing pods are overloaded in the meantime. The Microservice Engine (MSE) Adaptive Overload Protection monitors CPU usage and automatically applies a proportional rate limit when the threshold is reached, keeping latency low and success rates higher.
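The exact throttling algorithm used by MSE is not described in the source. To illustrate what a proportional, CPU-driven rate limit can look like, the sketch below admits every request while CPU usage is at or below a threshold and, above it, admits only the fraction threshold / cpuUsage. This rule assumes CPU load scales roughly linearly with admitted traffic. The class name and the rule itself are assumptions for illustration, not the MSE implementation.

```java
import java.util.concurrent.ThreadLocalRandom;

class CpuAdaptiveLimiter {
    private final double cpuThreshold; // fraction of total CPU, e.g. 0.8 for 80%

    CpuAdaptiveLimiter(double cpuThreshold) {
        if (cpuThreshold <= 0 || cpuThreshold > 1) {
            throw new IllegalArgumentException("cpuThreshold must be in (0, 1]");
        }
        this.cpuThreshold = cpuThreshold;
    }

    /** Fraction of incoming requests to admit for the given CPU usage (0.0 to 1.0). */
    double admitRatio(double cpuUsage) {
        if (cpuUsage <= cpuThreshold) {
            return 1.0; // below the threshold: no throttling
        }
        return cpuThreshold / cpuUsage; // above it: shed load in proportion to the overshoot
    }

    /** Randomly admits a request with probability admitRatio(cpuUsage). */
    boolean tryAcquire(double cpuUsage) {
        return ThreadLocalRandom.current().nextDouble() < admitRatio(cpuUsage);
    }

    public static void main(String[] args) {
        CpuAdaptiveLimiter limiter = new CpuAdaptiveLimiter(0.8);
        for (double cpu : new double[] {0.50, 0.80, 0.90, 1.00}) {
            System.out.printf("cpu=%.2f -> admit %.0f%% of requests%n",
                    cpu, limiter.admitRatio(cpu) * 100);
        }
    }
}
```

Rejecting a share of requests immediately is what keeps latency low for the admitted ones: the rejected calls fail fast instead of queueing on an already saturated CPU.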
Test results show the overall success rate improving to 50.99% (≈80 % after excluding intentional rate‑limit failures) and response times quickly returning to normal once protection kicks in.
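The two figures above differ only in their denominator, assuming "excluding" means that intentionally rate-limited requests are removed from the request count before the rate is computed. The sketch below shows that arithmetic. The request counts are hypothetical, chosen only so that the overall rate matches 50.99%; the source does not publish the raw counts.

```java
class SuccessRateMath {

    /** Success rate over all requests. */
    static double overallRate(long total, long succeeded) {
        return (double) succeeded / total;
    }

    /** Success rate after removing intentionally rate-limited requests from the denominator. */
    static double adjustedRate(long total, long succeeded, long rateLimited) {
        return (double) succeeded / (total - rateLimited);
    }

    public static void main(String[] args) {
        // Hypothetical counts, not taken from the source.
        long total = 10_000;
        long succeeded = 5_099;
        long rateLimited = 3_626;

        System.out.printf("overall:  %.2f%%%n", overallRate(total, succeeded) * 100);
        System.out.printf("adjusted: %.2f%%%n", adjustedRate(total, succeeded, rateLimited) * 100);
    }
}
```

Under this reading, an adjusted rate near 80% alongside an overall rate of 50.99% would imply that roughly a third of all requests were rejected by the rate limit rather than failing inside the service.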
Integration with Alibaba Cloud Kubernetes (ACK)
The JavaAgent can be injected automatically via Pilot mode without modifying container images. Add the following labels to the pod spec:
armsPilotAutoEnable: "on"
armsPilotCreateAppName: "${ARMS_APP_NAME}"
msePilotAutoEnable: "on"
msePilotCreateAppName: "${MSE_APP_NAME}"
mseNamespace: "${MSE_NAMESPACE}"
After deployment, the ARMS or MSE console can be accessed directly from the ACK UI.
Additional Features in JavaAgent 3.x
Bytecode enhancement for Java 8‑21, supporting a wide range of runtimes.
Short-connection, compressed data-reporting architecture that raises the data-reporting success rate to 99.99 %.
Support for Vert.x, Reactor‑Netty, OceanBase, xxl‑job, PostgreSQL, Kafka, and other popular components.
CPU overhead reduced by ~50 % and startup time ≤5 s.
Compatibility with OpenTracing in the upcoming 4.x release.
Conclusion
The ARMS 3.x JavaAgent delivers a non‑intrusive, cloud‑native solution for diagnosing performance bottlenecks and protecting services against traffic surges. By leveraging Code Hotspots and Adaptive Overload Protection, teams can achieve faster root‑cause analysis and maintain high availability without modifying application code.
References:
OpenTelemetry: https://opentelemetry.io/
Async Profiler: https://github.com/async-profiler/async-profiler
Sentinel: http://sentinelguard.io/zh-cn/docs/introduction.html
Java Agent overview: https://www.developer.com/design/what-is-java-agent/
Code Hotspots guide: https://help.aliyun.com/zh/arms/application-monitoring/user-guide/use-code-hotspots-to-diagnose-code-level-problems
Performance test report: https://help.aliyun.com/zh/arms/application-monitoring/developer-reference/performance-test-report-of-arms-agent-for-java
Supported Java components and frameworks: https://help.aliyun.com/zh/arms/application-monitoring/developer-reference/java-components-and-frameworks-supported-by-arms
MSE traffic protection configuration: https://help.aliyun.com/zh/mse/user-guide/configure-web-behavior-1
ARMS monitoring overview: https://help.aliyun.com/zh/arms/application-monitoring/getting-started/overview
MSE service governance access: https://help.aliyun.com/zh/mse/user-guide/application-access-3
Alibaba Cloud Native
We publish cloud-native tech news, curate in-depth content, host regular events and live streams, and share Alibaba product and user case studies. Join us to explore and share the cloud-native insights you need.
