How ICBC Built a Web‑Based Arthas Diagnostic Platform for Faster Java Issue Resolution
This article details ICBC's development of an online diagnostic platform using Arthas, covering the challenges of traditional Java debugging, the architectural design of the platform, gateway, and diagnostic process, as well as real‑world usage results and future outlook.
ICBC continuously explores advanced information system construction to improve financial services and user experience. With widespread adoption of distributed architecture and cloud platforms, efficiently locating program errors or performance bottlenecks became a critical challenge for backend engineers.
Traditional Debugging Pain Points
Engineers often fall back on log analysis, which is labor‑intensive and of no help when the relevant logs were never written. Reproducing an issue locally means repeated rounds of adding log statements, redeploying, and retesting, and performance problems require extra instrumentation on top of that. The result is slow diagnosis and, when such changes reach production, disruption to running services.
Arthas as a Solution
Arthas leverages the JVM Attach mechanism to connect to target processes without affecting service continuity, and its bytecode enhancement framework enables dynamic class redefinition. Commands such as watch, jad, and redefine allow real‑time observation, decompilation, and hot‑updates, eliminating the repetitive log‑add‑deploy cycle.
Practical Difficulties
Information security: Direct command‑line access could expose sensitive customer data.
Learning curve: New developers need time to master Arthas commands.
Collaboration conflicts: Simultaneous operations on the same process can cause interference.
Resource consumption: Unclosed Arthas processes may consume memory and CPU.
Environment inconsistency: Different JVM implementations and network restrictions complicate deployment.
Technical Architecture
The solution consists of three components: an online diagnostic platform (Web UI), an online diagnostic gateway (RESTful API proxy), and the Arthas diagnostic process.
1. Online Diagnostic Platform
Provides a one‑stop Web UI for complex interactions.
Handles installation/uninstallation, multi‑user collaboration with distributed locks, RBAC‑based user authentication, connection management via WebSocket, and operation auditing.
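The multi‑user mutual exclusion mentioned above can be illustrated with a minimal Python sketch. The article does not describe ICBC's actual lock implementation; the class below is a hypothetical in‑memory stand‑in for a distributed lock, keyed by target host and process ID, with a lease that expires so an abandoned session cannot block a process forever. The names `TargetLocks`, `Lease`, and the 300‑second default are assumptions made for illustration.

```python
import time
from dataclasses import dataclass


@dataclass
class Lease:
    owner: str
    expires_at: float


class TargetLocks:
    """In-memory stand-in for a distributed lock keyed by (host, pid)."""

    def __init__(self, ttl_seconds=300.0, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock          # injectable so the lease logic can be tested
        self._leases = {}            # (host, pid) -> Lease

    def acquire(self, host, pid, user):
        """Grant the target to `user` unless another user holds a live lease."""
        key = (host, pid)
        now = self._clock()
        lease = self._leases.get(key)
        if lease and lease.expires_at > now and lease.owner != user:
            return False
        # New lease, renewal by the same owner, or takeover of an expired lease.
        self._leases[key] = Lease(user, now + self._ttl)
        return True

    def release(self, host, pid, user):
        """Release the target only if `user` is the current owner."""
        key = (host, pid)
        lease = self._leases.get(key)
        if lease and lease.owner == user:
            del self._leases[key]
            return True
        return False
```

In a real deployment the lease table would live in a shared store so that every platform instance sees the same owner; the acquire, renew, and expire rules would stay the same.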
2. Online Diagnostic Gateway
Unifies access to both on‑cloud and off‑cloud nodes, parses plain‑text Arthas output into JSON, and performs parameter validation, timeout management, data desensitization, and text processing.
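As a rough illustration of the gateway's text‑to‑JSON and desensitization steps, the Python sketch below converts a whitespace‑aligned table into a JSON array and masks card‑like digit runs. The table layout, the function names, and the masking rule (keep the first and last four digits of any 12 to 19 digit run) are assumptions for the example; the article does not specify the gateway's real parsers or masking policy, and actual Arthas output is more varied than this.

```python
import json
import re

# Assumption: any run of 12 to 19 digits is treated as a card-like number.
_CARD_LIKE = re.compile(r"\b(\d{4})\d{4,11}(\d{4})\b")


def mask_sensitive(text):
    """Keep the first and last four digits and replace the middle with '*'."""
    def _mask(match):
        hidden = len(match.group(0)) - 8
        return match.group(1) + "*" * hidden + match.group(2)
    return _CARD_LIKE.sub(_mask, text)


def table_to_json(plain_text):
    """Turn a whitespace-aligned table (header row plus data rows) into JSON."""
    lines = [ln for ln in plain_text.splitlines() if ln.strip()]
    if not lines:
        return "[]"
    header = lines[0].split()
    rows = []
    for line in lines[1:]:
        # The last column may contain spaces, so split at most len(header)-1 times.
        cells = line.split(None, len(header) - 1)
        rows.append({key: mask_sensitive(value) for key, value in zip(header, cells)})
    return json.dumps(rows)
```

Masking inside the gateway, before the data reaches the browser, means the Web UI never has to be trusted with raw values.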
3. Diagnostic Process
Manages installation, startup, version control, and network isolation for the Arthas agent on target servers.
Diagnostic Workflow
Preparation: Users input target server details (IP, process identifier, container ID if applicable).
Installation: The platform probes the target; if Arthas is absent or outdated, the gateway fetches the latest package and deploys it.
Startup: The installed package is unpacked and launched with the same JDK version as the target process; the target‑IP is set to the gateway address for isolation.
Usage & Uninstallation: Users issue diagnostic commands via the UI; after completion they can manually uninstall the Arthas process.
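The install‑if‑absent‑or‑outdated decision in the workflow above can be sketched as follows. This is only an illustration in Python: the step names, the function names, and the assumption that versions are plain dotted numbers are hypothetical, not taken from ICBC's implementation.

```python
from typing import Optional


def parse_version(version):
    """'3.6.7' -> (3, 6, 7); numeric comparison avoids ranking '3.10.0' below '3.9.0'."""
    return tuple(int(part) for part in version.strip().split("."))


def needs_install(installed: Optional[str], latest: str) -> bool:
    """True when no agent is present or the installed one is older than `latest`."""
    if installed is None:
        return True
    return parse_version(installed) < parse_version(latest)


def plan_session(installed: Optional[str], latest: str) -> list:
    """Return the ordered workflow steps for one diagnostic session."""
    steps = ["probe"]
    if needs_install(installed, latest):
        steps.append("install")
    steps += ["start", "diagnose", "uninstall"]
    return steps
```

Comparing versions as integer tuples rather than strings matters once a minor version reaches two digits.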
Real‑World Usage Effects
Control Panel: Uses the dashboard command to display JVM, thread, and OS metrics refreshed every 10 seconds.
Thread List: Shows threads sorted by CPU usage with filtering options; clicking a thread opens detailed stack information.
Method Monitoring: Supports observation (watch), tracing (trace), back‑trace, monitoring, and decompilation (jad) directly from the UI, with configurable thresholds and output.
Decompilation: Displays the decompiled source of the classes actually loaded in the JVM, including class loader hierarchy and source paths.
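Behind a form‑driven UI like this, each action eventually becomes an Arthas command string. The Python sketch below shows one way a backend might compose `watch` and `trace` commands from validated form fields. The function names, the allowed ranges for depth and limit, and the name‑validation rule are assumptions for illustration; only the general command shapes (`watch class method express -x depth -n limit` and `trace class method '#cost > N' -n limit`) follow standard Arthas usage.

```python
import re

# Assumption: only plain class/method names (plus '*' wildcards) are accepted,
# which keeps quotes and shell metacharacters out of the generated command.
_NAME = re.compile(r"[A-Za-z_$*][\w$.*]*")


def _check(name):
    if not _NAME.fullmatch(name):
        raise ValueError(f"invalid pattern: {name!r}")
    return name


def build_watch(class_pattern, method_pattern, depth=2, limit=5):
    """Compose a `watch` command that prints parameters and the return value."""
    if not (1 <= depth <= 4 and 1 <= limit <= 100):
        raise ValueError("depth or limit out of range")
    return (f"watch {_check(class_pattern)} {_check(method_pattern)} "
            f"'{{params,returnObj}}' -x {depth} -n {limit}")


def build_trace(class_pattern, method_pattern, min_cost_ms=0, limit=5):
    """Compose a `trace` command, optionally keeping only calls slower than a threshold."""
    if not (min_cost_ms >= 0 and 1 <= limit <= 100):
        raise ValueError("threshold or limit out of range")
    cmd = f"trace {_check(class_pattern)} {_check(method_pattern)}"
    if min_cost_ms > 0:
        cmd += f" '#cost > {min_cost_ms}'"
    return f"{cmd} -n {limit}"
```

For example, `build_trace("com.example.OrderService", "pay", min_cost_ms=100)` yields `trace com.example.OrderService pay '#cost > 100' -n 5`, where `com.example.OrderService` is a made‑up class name. Capping `-n` keeps a forgotten watch from printing indefinitely on a busy method.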
Review and Outlook
Learning cost reduced through Web UI and structured JSON APIs.
Multi‑user conflicts resolved via session management, mutual exclusion, and automatic cleanup.
Information security ensured by RBAC, data desensitization, and network isolation.
Resource usage controlled by automatic shutdown and custom class loader recycling.
Environment heterogeneity addressed by a gateway cluster that standardizes access and media distribution.
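The automatic cleanup and shutdown mentioned in this list could work along the lines of the Python sketch below: record the time of each session's last command and periodically shut down sessions that have been idle too long. The class name, the 30‑minute default, and the `shutdown` callback are hypothetical; the article does not say how ICBC detects or terminates idle Arthas processes.

```python
import time


class SessionRegistry:
    """Tracks last activity per diagnostic session and reaps idle ones."""

    def __init__(self, idle_timeout=1800.0, clock=time.monotonic):
        self._idle_timeout = idle_timeout
        self._clock = clock          # injectable so the timeout logic can be tested
        self._last_seen = {}         # session id -> time of last command

    def touch(self, session_id):
        """Record activity; call this whenever the session issues a command."""
        self._last_seen[session_id] = self._clock()

    def reap(self, shutdown):
        """Call `shutdown(session_id)` for each idle session, then forget it."""
        now = self._clock()
        idle = [sid for sid, seen in self._last_seen.items()
                if now - seen >= self._idle_timeout]
        for sid in idle:
            shutdown(sid)
            del self._last_seen[sid]
        return idle
```

The `shutdown` callback is where a platform would, for instance, send the Arthas `stop` command to the target so the agent exits and releases its resources.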
The platform continues to play a key role in live issue analysis, and newer Arthas versions now provide native RESTful interfaces that align with ICBC's design. Future work will focus on deeper community collaboration and further enhancements.
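For readers who want to try the native interface, the Python sketch below builds a one‑shot command request for the Arthas HTTP API. The field names (`action`, `command`, `execTimeout`) and the `/api` path on port 8563 follow the Arthas HTTP API documentation as best I recall it; treat them as assumptions and check them against the Arthas version you run. The function names are illustrative, and nothing here reflects how ICBC's gateway is actually implemented.

```python
import json
import urllib.request


def build_exec_request(command, timeout_ms=10000):
    """Build the JSON body for a synchronous, one-shot command execution."""
    body = {
        "action": "exec",
        "command": command,
        "execTimeout": str(timeout_ms),  # assumption: timeout passed as a string, in ms
    }
    return json.dumps(body).encode("utf-8")


def send(url, payload, timeout_s=15.0):
    """POST the payload and return the decoded JSON response."""
    request = urllib.request.Request(
        url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout_s) as response:
        return json.loads(response.read().decode("utf-8"))
```

A call such as `send("http://127.0.0.1:8563/api", build_exec_request("version"))` would return structured JSON rather than terminal text, which removes the need for the plain‑text parsing a gateway had to do with older versions.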
Alibaba Cloud Native
