How ICBC Built a Web‑Based Arthas Diagnostic Platform for Faster Java Issue Resolution

This article details ICBC's development of an online diagnostic platform using Arthas, covering the challenges of traditional Java debugging, the architectural design of the platform, gateway, and diagnostic process, as well as real‑world usage results and future outlook.

Alibaba Cloud Native

ICBC continuously explores advanced information system construction to improve financial services and user experience. With widespread adoption of distributed architecture and cloud platforms, efficiently locating program errors or performance bottlenecks became a critical challenge for backend engineers.

Traditional Debugging Pain Points

Engineers often resort to log analysis, which is labor‑intensive and ineffective when logs are missing. Reproducing issues locally requires repeated logging, deployment, and testing, while performance problems demand additional instrumentation, leading to low efficiency and disrupted production environments.

Arthas as a Solution

Arthas leverages the JVM Attach mechanism to connect to target processes without affecting service continuity, and its bytecode enhancement framework enables dynamic class redefinition. Commands such as watch, jad, and redefine allow real‑time observation, decompilation, and hot‑updates, eliminating the repetitive log‑add‑deploy cycle.
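As a rough illustration of how a platform might assemble these commands instead of having engineers type them, here is a minimal sketch. The helper names and the class `com.example.OrderService` are hypothetical and not taken from ICBC's platform. The command shapes (`watch <class> <method> '<expression>' -x <depth>`, `jad <class>`, `redefine <path>`) follow Arthas's documented command-line syntax.

```python
import re

# Accept only plain fully qualified Java names; wildcards are rejected in this sketch.
_CLASS_RE = re.compile(r"^[A-Za-z_$][\w$]*(\.[A-Za-z_$][\w$]*)*$")
_METHOD_RE = re.compile(r"^[A-Za-z_$][\w$]*$")


def _check_names(class_name, method=None):
    if not _CLASS_RE.match(class_name):
        raise ValueError(f"invalid class name: {class_name!r}")
    if method is not None and not _METHOD_RE.match(method):
        raise ValueError(f"invalid method name: {method!r}")


def watch_command(class_name, method, expression="{params, returnObj}", depth=2):
    """Build a watch command that prints arguments and return value."""
    _check_names(class_name, method)
    if "'" in expression:
        raise ValueError("expression must not contain single quotes")
    return f"watch {class_name} {method} '{expression}' -x {int(depth)}"


def jad_command(class_name):
    """Build a jad command that decompiles a loaded class."""
    _check_names(class_name)
    return f"jad {class_name}"


def redefine_command(class_file_path):
    """Build a redefine command that hot-swaps a compiled .class file."""
    if not class_file_path.endswith(".class") or " " in class_file_path:
        raise ValueError("expected a path to a .class file without spaces")
    return f"redefine {class_file_path}"


print(watch_command("com.example.OrderService", "submit"))
```

Validating names before building the string keeps free-form user input out of the command line, which matters once the commands are issued from a shared web UI.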

Practical Difficulties

Information security: Direct command‑line access could expose sensitive customer data.

Learning curve: New developers need time to master Arthas commands.

Collaboration conflicts: Simultaneous operations on the same process can cause interference.

Resource consumption: Unclosed Arthas processes may consume memory and CPU.

Environment inconsistency: Different JVM implementations and network restrictions complicate deployment.

Technical Architecture

The solution consists of three components: an online diagnostic platform (Web UI), an online diagnostic gateway (RESTful API proxy), and the Arthas diagnostic process.

1. Online Diagnostic Platform

Provides one‑stop Web UI for complex interactions.

Handles installation/uninstallation, multi‑user collaboration with distributed locks, RBAC‑based user authentication, connection management via WebSocket, and operation auditing.
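The article does not describe how the distributed lock is implemented. The sketch below only illustrates the idea with an in-memory, lease-based lock table keyed by target process. The class name `TargetLockTable`, the lease length, and the `(ip, pid)` key are assumptions; a real deployment would keep this state in a shared store rather than in one process.

```python
import time


class TargetLockTable:
    """In-memory stand-in for a distributed lock with a lease per target."""

    def __init__(self, lease_seconds=300.0, clock=time.monotonic):
        self._lease = lease_seconds
        self._clock = clock
        self._holders = {}  # target -> (user, expires_at)

    def acquire(self, target, user):
        """Grant the lock if it is free, expired, or already held by this user."""
        now = self._clock()
        holder = self._holders.get(target)
        if holder is not None and holder[1] > now and holder[0] != user:
            return False
        # Acquiring again as the current holder renews the lease.
        self._holders[target] = (user, now + self._lease)
        return True

    def release(self, target, user):
        """Release the lock only if this user currently holds it."""
        holder = self._holders.get(target)
        if holder is not None and holder[0] == user:
            del self._holders[target]
            return True
        return False
```

The lease means a lock held by a user who closes the browser without releasing it eventually expires, so a second engineer is not blocked indefinitely.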

2. Online Diagnostic Gateway

Unifies access to on‑cloud and off‑cloud nodes, parses plain‑text Arthas output into JSON, and performs parameter validation, timeout management, data desensitization, and text processing.
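To make the text-to-JSON and desensitization steps concrete, here is a minimal sketch. The sample table is a simplified, hypothetical layout and not real Arthas output, and the masking rule (hide all but the last four digits of any 12 to 19 digit run) is an assumed policy chosen for illustration only.

```python
import json
import re


def parse_table(text):
    """Turn a whitespace-separated table with a header row into a list of dicts."""
    lines = [line for line in text.splitlines() if line.strip()]
    if not lines:
        return []
    header = [column.lower() for column in lines[0].split()]
    return [dict(zip(header, line.split())) for line in lines[1:]]


_LONG_DIGITS = re.compile(r"\b\d{12,19}\b")


def mask_long_digits(text):
    """Mask long digit runs (for example card numbers), keeping the last four digits."""
    def _mask(match):
        digits = match.group()
        return "*" * (len(digits) - 4) + digits[-4:]
    return _LONG_DIGITS.sub(_mask, text)


# Hypothetical, simplified sample; real command output has more columns.
sample = (
    "ID NAME STATE CPU%\n"
    "12 http-nio-8080-exec-1 RUNNABLE 37.5\n"
    "3 GC-task RUNNABLE 1.2\n"
)

print(json.dumps(parse_table(mask_long_digits(sample)), indent=2))
```

Masking the raw text before parsing means sensitive values never reach the structured response, whichever field they happen to land in.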

3. Diagnostic Process

Manages installation, startup, version control, and network isolation for the Arthas agent on target servers.

Diagnostic Workflow

Preparation: Users input target server details (IP, process identifier, container ID if applicable).

Installation: The platform probes the target; if Arthas is absent or outdated, the gateway fetches the latest package and deploys it.

Startup: The installed package is unpacked and launched with the same JDK version as the target process; the target‑IP is set to the gateway address for isolation.

Usage & Uninstallation: Users issue diagnostic commands via the UI; after completion they can manually uninstall the Arthas process.
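The probe-then-install decision in the workflow above can be sketched as follows. The function names, step labels, and version strings are illustrative assumptions; the article does not say how ICBC compares versions.

```python
def parse_version(version):
    """Convert a dotted version such as '3.6.7' into a comparable tuple of ints."""
    return tuple(int(part) for part in version.split("."))


def plan_install(installed, latest):
    """Decide what the probe result implies: install, upgrade, or skip."""
    if installed is None:
        return "install"
    if parse_version(installed) < parse_version(latest):
        return "upgrade"
    return "skip"


def workflow_steps(installed, latest):
    """List the ordered steps of one diagnostic session for a probed target."""
    steps = ["probe"]
    action = plan_install(installed, latest)
    if action != "skip":
        steps.append(action)
    steps += ["start", "diagnose", "uninstall"]
    return steps


print(workflow_steps(None, "3.6.7"))
print(workflow_steps("3.6.7", "3.6.7"))
```

Comparing versions as integer tuples rather than strings avoids treating "3.10.0" as older than "3.9.9".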

Real‑World Usage Effects

Control Panel: Uses the dashboard command to display JVM, thread, and OS metrics refreshed every 10 seconds.

Thread List: Shows threads sorted by CPU usage with filtering options; clicking a thread opens detailed stack information.

Method Monitoring: Supports observation (watch), tracing (trace), call‑stack back‑tracing (stack), invocation monitoring (monitor), and decompilation (jad) directly from the UI, with configurable thresholds and output.

Decompilation: Displays the decompiled source of the classes actually loaded in the JVM, together with the class loader hierarchy and code source location.
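The thread list view described above (sorted by CPU usage, with filters) amounts to a filter-and-sort over structured rows. A minimal sketch, assuming the gateway has already returned thread data as records; the `ThreadRow` fields and the sample values are hypothetical:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ThreadRow:
    thread_id: int
    name: str
    state: str
    cpu_percent: float


def top_threads(rows, state=None, name_contains=None, limit=10):
    """Filter by state and name substring, then sort by CPU usage, highest first."""
    selected = [
        row for row in rows
        if (state is None or row.state == state)
        and (name_contains is None or name_contains in row.name)
    ]
    return sorted(selected, key=lambda row: row.cpu_percent, reverse=True)[:limit]


rows = [
    ThreadRow(3, "GC-task", "RUNNABLE", 1.2),
    ThreadRow(12, "http-nio-8080-exec-1", "RUNNABLE", 37.5),
    ThreadRow(15, "http-nio-8080-exec-2", "WAITING", 0.0),
]

for row in top_threads(rows, state="RUNNABLE"):
    print(row.thread_id, row.name, row.cpu_percent)
```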

Review and Outlook

Learning cost reduced through Web UI and structured JSON APIs.

Multi‑user conflicts resolved via session management, mutual exclusion, and automatic cleanup.

Information security ensured by RBAC, data desensitization, and network isolation.

Resource usage controlled by automatic shutdown and custom class loader recycling.

Environment heterogeneity addressed by a gateway cluster that standardizes access and media distribution.
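The automatic shutdown mentioned in the list above can be pictured as an idle-session reaper. This is a sketch under assumed names and an assumed 30-minute timeout; the `stop` callback stands in for whatever the platform actually does to stop the Arthas process on a target.

```python
import time


class SessionRegistry:
    """Tracks the last activity per target and stops sessions that have gone idle."""

    def __init__(self, idle_timeout=1800.0, clock=time.monotonic):
        self._idle_timeout = idle_timeout
        self._clock = clock
        self._last_seen = {}  # target -> timestamp of the last command

    def touch(self, target):
        """Record activity; call this whenever a user issues a command."""
        self._last_seen[target] = self._clock()

    def reap_idle(self, stop):
        """Call stop(target) for every idle session and forget it afterwards."""
        now = self._clock()
        expired = [
            target for target, seen in self._last_seen.items()
            if now - seen >= self._idle_timeout
        ]
        for target in expired:
            stop(target)
            del self._last_seen[target]
        return expired
```

Running `reap_idle` on a timer means an attached Arthas process is cleaned up even when the user forgets the manual uninstall step.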

The platform continues to play a key role in live issue analysis, and newer Arthas versions now provide native RESTful interfaces that align with ICBC's design. Future work will focus on deeper community collaboration and further enhancements.
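For readers who want to try the native interface, the sketch below builds (but does not send) a request for Arthas's HTTP API. The `/api` path, the default port 8563, and the `action`, `command`, and `execTimeout` fields reflect the Arthas HTTP API documentation as best understood here and should be checked against the version in use; the helper name is hypothetical.

```python
import json
import urllib.request


def build_exec_request(base_url, command, exec_timeout_ms=10000):
    """Build a POST request that asks Arthas to run one command synchronously."""
    payload = {
        "action": "exec",                     # one-shot execution
        "command": command,                   # for example "version" or "thread -n 3"
        "execTimeout": str(exec_timeout_ms),  # assumed to be milliseconds
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/api",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_exec_request("http://127.0.0.1:8563", "version")
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
```

Sending it is a matter of passing `request` to `urllib.request.urlopen` from a host that is allowed to reach the Arthas port.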
