
Why Alibaba Cloud’s New Java Agent Outperforms OpenTelemetry in Performance and Features

This article examines the evolution from ARMS Java Agent to the OTel‑based Alibaba Cloud Java Agent 4.x, comparing tracing, metrics, logging, and profiling capabilities, highlighting innovative designs such as muzzle‑check and VirtualField, and detailing the performance, stability, and community contributions that make the new agent a superior observability solution.

Alibaba Cloud Observability

Background

In February 2018, the first version of ARMS Java Agent was released, providing non‑intrusive observability data collection. Six years later, rapid software evolution, richer business scenarios, and a growing user base had exposed limitations in the original architecture, prompting a major redesign.

About OTel Java Agent

OpenTelemetry (OTel), a CNCF project, quickly became the de‑facto open‑source standard for telemetry collection, processing, and export. Its community creates and maintains APIs, SDKs, and tools for distributed tracing, metrics, and logs, aiming to make observability easier to adopt in cloud‑native software development.

Feature Comparison

A side‑by‑side comparison of ARMS Java Agent 3.1.4 and OTel Java Agent 1.28.0 shows that OTel offers more plugins (≈128 vs ≈60), broader async support, full OTel SDK compatibility, and richer tracing capabilities, while ARMS is stronger in sampling, multi‑protocol support, metric richness, and third‑party integrations.

Key Design Highlights

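muzzle‑check mechanism : At compile time, the build records which methods and fields of the target classes each instrumentation accesses; at runtime, the agent checks those references against the classes actually present in the application and skips the enhancement if any required member is missing, so a library version mismatch disables the plugin instead of causing errors.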
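The runtime half of this check can be illustrated with a minimal sketch. It assumes reflection as a stand-in for the bytecode-level reference matching a real agent would use, and the names `MuzzleSketch` and `MethodRef` are hypothetical, not part of any agent API.

```java
import java.util.List;

/** Hypothetical sketch of a muzzle-style check; reflection stands in for bytecode analysis. */
final class MuzzleSketch {

    /** One method that the instrumentation code expects to find on a library class. */
    static final class MethodRef {
        final String className;
        final String methodName;
        final Class<?>[] parameterTypes;

        MethodRef(String className, String methodName, Class<?>... parameterTypes) {
            this.className = className;
            this.methodName = methodName;
            this.parameterTypes = parameterTypes;
        }
    }

    /** Returns true only if every referenced method is visible through the given class loader. */
    static boolean matches(ClassLoader loader, List<MethodRef> refs) {
        for (MethodRef ref : refs) {
            try {
                // Load without initializing so the check itself has no side effects.
                Class<?> type = Class.forName(ref.className, false, loader);
                type.getDeclaredMethod(ref.methodName, ref.parameterTypes);
            } catch (ClassNotFoundException | NoSuchMethodException | LinkageError e) {
                return false; // mismatch: skip the enhancement rather than fail at call time
            }
        }
        return true;
    }
}
```

In this sketch the list of `MethodRef` entries plays the role of the references recorded at compile time; an instrumentation would be applied only when `matches` returns true for the application's class loader.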

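VirtualField mechanism : Allows attaching virtual fields to classes the agent does not own, without changing their source code. Where the class can still be transformed at load time, the field is injected into it; when the class is already loaded, the values are kept in a global ConcurrentWeakHashMap instead. Either way, the instrumentation code sees the same API, which enables seamless context propagation.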

Async context propagation : Implements Runnable wrappers that store trace context via VirtualField, and instruments Executor execute methods to transfer the context, supporting frameworks like Akka and Netty.
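The capture-and-restore idea behind async propagation can be sketched as follows. This is a simplified illustration, not the agent's code: it assumes a plain `ThreadLocal<String>` as the trace context and an explicit wrapper around the `Executor`, whereas the agent achieves the same effect through bytecode instrumentation. All names here are hypothetical.

```java
import java.util.concurrent.Executor;

/** Hypothetical sketch: carry the submitting thread's trace context into the worker thread. */
final class ContextPropagationSketch {

    /** Stand-in for the real trace context; holds only a trace ID string. */
    static final ThreadLocal<String> CURRENT_TRACE = new ThreadLocal<>();

    /** Captures the context at submit time and restores it around the task on the worker. */
    static Runnable wrap(Runnable task) {
        String captured = CURRENT_TRACE.get(); // read on the submitting thread
        return () -> {
            String previous = CURRENT_TRACE.get();
            CURRENT_TRACE.set(captured);
            try {
                task.run();
            } finally {
                CURRENT_TRACE.set(previous); // do not leak context into pooled threads
            }
        };
    }

    /** Decorates an executor so that every submitted task is wrapped. */
    static Executor propagating(Executor delegate) {
        return task -> delegate.execute(wrap(task));
    }
}
```

The important detail is that the context is read when the task is submitted, not when it runs, so the trace continues even though the work executes on a different thread.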

New instrumentation ideas : Leverages framework extension points (e.g., Dubbo filters, gRPC interceptors, Lettuce tracing interfaces) instead of direct method enhancement.

public static <U extends T, V extends F, T, F> VirtualField<U, V> find(Class<T> type, Class<F> fieldType) {
    return RuntimeVirtualFieldSupplier.get().find(type, fieldType);
}
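To make the map-backed side of VirtualField more concrete, here is a minimal sketch of a per-instance value store with weak keys. It is an assumption-laden illustration: `MapBackedField` is a hypothetical name, and the JDK's `WeakHashMap` compares keys with `equals`/`hashCode`, whereas an agent would normally want identity-based weak keys.

```java
import java.util.Collections;
import java.util.Map;
import java.util.WeakHashMap;

/** Hypothetical sketch of a map-backed "virtual field" attached to instances of T. */
final class MapBackedField<T, F> {

    // Weak keys let an owner object be garbage collected together with its attached value.
    // Note: WeakHashMap uses equals/hashCode; a real agent would prefer identity semantics.
    private final Map<T, F> values = Collections.synchronizedMap(new WeakHashMap<>());

    /** Returns the value attached to the owner, or null if none was set. */
    F get(T owner) {
        return values.get(owner);
    }

    /** Attaches a value to the owner; passing null clears it. */
    void set(T owner, F value) {
        if (value == null) {
            values.remove(owner);
        } else {
            values.put(owner, value);
        }
    }
}
```

A caller could, for example, keep a `MapBackedField<Runnable, String>` to attach a trace ID to a task object without changing the task's class.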

Enhancements Made

New plugin support : Added instrumentation for popular Chinese frameworks and middleware such as Druid, XXL‑Job, HSF, InfluxDB, MyBatis, Motan, ShenYu, etc., with several contributions upstream.

Tracing enhancements : Multi‑protocol support (automatic detection of EagleEye, W3C, Zipkin, SkyWalking, and Jaeger headers), call‑chain compression to reduce span explosion, and a suite of sampling strategies (fixed‑ratio, adaptive LFU‑based, low‑traffic, error and slow‑call sampling, and custom sampling).
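Of the strategies listed, fixed-ratio sampling is the simplest to illustrate. The sketch below assumes a hex-encoded trace ID and derives the decision from its low 64 bits, so every service that sees the same trace ID reaches the same decision. `RatioSamplerSketch` is a hypothetical name, and the article does not describe how the agent implements its samplers.

```java
/** Hypothetical sketch of a deterministic fixed-ratio sampler keyed on the trace ID. */
final class RatioSamplerSketch {
    private final double ratio;
    private final long threshold;

    RatioSamplerSketch(double ratio) {
        if (ratio < 0.0 || ratio > 1.0) {
            throw new IllegalArgumentException("ratio must be in [0, 1]");
        }
        this.ratio = ratio;
        this.threshold = (long) (ratio * Long.MAX_VALUE);
    }

    /** Expects a hex trace ID of at least 16 characters; uses its last 16 (low 64 bits). */
    boolean shouldSample(String traceIdHex) {
        if (ratio == 1.0) {
            return true; // sample everything
        }
        String low = traceIdHex.substring(traceIdHex.length() - 16);
        long value = Long.parseUnsignedLong(low, 16);
        // Clear the sign bit so the comparison works on a non-negative number.
        return (value & Long.MAX_VALUE) < threshold;
    }
}
```

Because the decision depends only on the trace ID and the ratio, no coordination between services is needed to keep a trace either fully sampled or fully dropped.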

Metrics enhancements : Expanded thread‑pool, thread, MQ latency, DB request/response size, and exception metrics, providing richer RED indicators and detailed dimensions.

Profiling support : Integrated continuous profiling (CPU, memory, wall‑clock) via async‑profiler, offering on‑CPU and off‑CPU flame graphs linked to trace IDs.

Performance optimizations : Reduced CPU overhead by ~2% and memory usage by ~10 MB in high‑TPS tests through attribute copy and sorting optimizations.

Stability features : CPU/memory usage caps with automatic degradation, pre‑flight checks, and dynamic feature toggles to avoid impacting user applications.
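A usage cap with automatic degradation might look roughly like the sketch below. It is only an illustration under stated assumptions: the usage source is injected as a function, the cap is a simple fraction, and `DegradationGuard` with its methods is a hypothetical name rather than the agent's actual mechanism.

```java
import java.util.function.DoubleSupplier;

/** Hypothetical sketch of a resource cap that switches optional features off when exceeded. */
final class DegradationGuard {
    private final DoubleSupplier usage; // current usage as a fraction in [0, 1]
    private final double cap;
    private volatile boolean degraded;

    DegradationGuard(DoubleSupplier usage, double cap) {
        this.usage = usage;
        this.cap = cap;
    }

    /** Re-evaluates usage; intended to be called periodically, e.g. from a scheduled task. */
    void refresh() {
        degraded = usage.getAsDouble() > cap;
    }

    /** Optional features consult this before doing expensive work. */
    boolean featureEnabled() {
        return !degraded;
    }

    /** One possible usage source: fraction of the maximum heap currently in use. */
    static double heapUsage() {
        Runtime rt = Runtime.getRuntime();
        return (double) (rt.totalMemory() - rt.freeMemory()) / rt.maxMemory();
    }
}
```

For example, `new DegradationGuard(DegradationGuard::heapUsage, 0.9)` would disable guarded features while heap usage stays above 90% and re-enable them once a later `refresh()` sees it drop. A production version would likely add hysteresis to avoid flapping around the cap.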

Cloud product integration : Embedded MSE microservice governance, cloud security RASP, and other Alibaba Cloud services.

Benefits of Alibaba Cloud Java Agent 4.0

Database instrumentation at the JDBC layer, covering any database accessed through a JDBC‑compliant driver.

Zero‑configuration async instrumentation that prevents trace breaks.

Improved plugin accuracy and broader version support for Vert.x, WebFlux, Lettuce, RabbitMQ, Kafka, RocketMQ, ONS, etc.

Container‑aware system metric collection.

Custom thread‑pool monitoring.

Reduced memory footprint (‑20%), thread count (‑60%), and agent size (‑30%).

Conclusion

The migration to an OTel‑based architecture allowed Alibaba Cloud to adopt best‑in‑class designs, enhance tracing, metrics, and profiling, improve performance and stability, and contribute back to the open‑source community with numerous PRs, conference talks, and regional community initiatives.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.
