
AI-Integrated Arthas: Turning Online Issue Diagnosis into One-Click Debugging

The article explains how Arthas, now integrated with the Model Context Protocol (MCP), lets AI automatically execute and interpret diagnostic commands for Java applications. It walks through a step-by-step case study and the underlying workflow, and weighs the approach's strengths and limitations.

SpringMeng

Java production debugging with Arthas

Arthas is an open-source Java diagnostic tool that attaches to a running JVM and injects temporary bytecode via the Java Instrumentation API and ASM. This lets you inspect threads, trace method calls, monitor parameters, and examine class-loading information without code changes or restarts. For example, running trace com.example.Service followed by a method name inserts bytecode at that method's entry and exit to measure its execution time; the instrumentation is removed when Arthas exits.
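Conceptually, the bytecode that trace adds around a method behaves like the hand-written wrapper below: take a timestamp on entry, run the original body, and report the elapsed time on exit, including when the body throws. This is a source-level sketch for illustration only. Arthas achieves the effect by rewriting bytecode at runtime, and the names TraceSketch, traced, and getOrder are hypothetical.

```java
import java.util.function.Supplier;

final class TraceSketch {

    // Last measured duration, kept so callers (and tests) can inspect it.
    static volatile long lastElapsedNanos = -1;

    // Runs body the way an instrumented method would behave:
    // timestamp at entry, original logic, elapsed-time report at exit.
    // The finally block means exceptions are timed too.
    static <T> T traced(String methodName, Supplier<T> body) {
        long start = System.nanoTime();                      // "method entry" hook
        try {
            return body.get();                               // original method body
        } finally {
            lastElapsedNanos = System.nanoTime() - start;    // "method exit" hook
            System.out.printf("%s took %.3f ms%n", methodName, lastElapsedNanos / 1_000_000.0);
        }
    }

    // Hypothetical business method standing in for com.example.Service.
    static String getOrder(long id) {
        return traced("getOrder", () -> "order-" + id);
    }

    public static void main(String[] args) {
        System.out.println(getOrder(42));
    }
}
```

The difference in practice is that Arthas applies this wrapping to already-loaded classes without recompiling them, and undoes it afterwards.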

In practice, the hard part is not running Arthas but deciding what to run: choosing the right command, writing OGNL expressions, and filtering noisy output. Decision-making, rather than the tool itself, becomes the bottleneck.

Model Context Protocol (MCP)

MCP is an open-source protocol built on JSON-RPC 2.0, introduced by Anthropic in November 2024 to standardize how AI assistants connect to external tools. It defines three roles: the MCP Host (the AI application, e.g., Claude Desktop or Cursor), the MCP Client (the component inside the host that maintains the connection to a server), and the MCP Server (which exposes capabilities such as tools; here, Arthas's diagnostic commands). Communication can use stdio or HTTP. By early 2026, more than 10,000 active MCP Servers were in production, with monthly SDK downloads reaching 97 million.
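On the wire, an MCP client invokes a server tool with a JSON-RPC 2.0 request whose method is tools/call, with the tool name and its arguments inside params. The sketch below builds such a request by hand to make the message shape visible. A real client would use an MCP SDK and a JSON library. The class name McpRequests is hypothetical, and the empty arguments object in the example is a placeholder assumption, not an argument schema from the Arthas documentation.

```java
final class McpRequests {

    // Minimal JSON string escaping for the tool name (backslashes and quotes only).
    private static String quote(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    // Builds a JSON-RPC 2.0 "tools/call" request.
    // argumentsJson must already be a valid JSON object, e.g. "{}".
    static String toolsCall(int id, String toolName, String argumentsJson) {
        return "{\"jsonrpc\":\"2.0\","
                + "\"id\":" + id + ","
                + "\"method\":\"tools/call\","
                + "\"params\":{\"name\":" + quote(toolName)
                + ",\"arguments\":" + argumentsJson + "}}";
    }

    public static void main(String[] args) {
        // Prints {"jsonrpc":"2.0","id":1,"method":"tools/call","params":{"name":"dashboard","arguments":{}}}
        System.out.println(toolsCall(1, "dashboard", "{}"));
    }
}
```

Per the MCP specification, a session normally begins with an initialize exchange, and a client can discover the available tools with tools/list before calling any of them.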

Arthas MCP Server

The Arthas MCP Server is an experimental module that wraps 26 core Arthas commands behind a JSON‑RPC 2.0 HTTP/Netty interface, making them callable by any MCP‑compatible AI client. Documentation: https://arthas.aliyun.com/doc/mcp-server.html

The tools are grouped into three categories:

JVM (12 tools): dashboard, heapdump, jvm, memory, thread, sysprop, sysenv, vmoption, perfcounter, vmtool, getstatic, ognl

Class loading (8 tools): sc, sm, jad, classloader, mc, redefine, retransform, dump

Monitoring & diagnosis (6 tools): monitor, stack, trace, watch, tt, profiler
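An internal AI-ops assistant might keep this grouping as a simple lookup table, for instance to tell the user which family a suggested tool belongs to. The sketch below encodes the three categories exactly as listed above; the class and method names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

final class ArthasToolCatalog {

    // Category -> tool names, as grouped in the article (12 + 8 + 6 = 26).
    static final Map<String, List<String>> CATEGORIES = new LinkedHashMap<>();
    static {
        CATEGORIES.put("JVM", List.of("dashboard", "heapdump", "jvm", "memory", "thread",
                "sysprop", "sysenv", "vmoption", "perfcounter", "vmtool", "getstatic", "ognl"));
        CATEGORIES.put("Class loading", List.of("sc", "sm", "jad", "classloader", "mc",
                "redefine", "retransform", "dump"));
        CATEGORIES.put("Monitoring & diagnosis", List.of("monitor", "stack", "trace",
                "watch", "tt", "profiler"));
    }

    // Returns the category of a tool, or empty if it is not one of the listed tools.
    static Optional<String> categoryOf(String tool) {
        return CATEGORIES.entrySet().stream()
                .filter(e -> e.getValue().contains(tool))
                .map(Map.Entry::getKey)
                .findFirst();
    }

    // Total number of tools across all categories.
    static int toolCount() {
        return CATEGORIES.values().stream().mapToInt(List::size).sum();
    }
}
```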

Two protocol modes are supported:

STREAMABLE: SSE-based streaming responses for real-time monitoring.

STATELESS: traditional request-response for simple integration.
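In request-response use, a client sends one JSON-RPC message per HTTP POST. The sketch below only builds such a request with java.net.http (Java 11+) and attaches a Bearer token for authentication. The endpoint http://localhost:8563/mcp and the environment variable ARTHAS_MCP_TOKEN are assumptions for illustration; the Arthas MCP documentation is the authority on the actual address and token settings. The Accept header lists both application/json and text/event-stream, which the MCP Streamable HTTP transport expects clients to send.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.time.Duration;

final class ArthasMcpHttp {

    // Builds (but does not send) a POST carrying one JSON-RPC message.
    static HttpRequest buildRequest(String endpoint, String bearerToken, String jsonRpcBody) {
        return HttpRequest.newBuilder(URI.create(endpoint))
                .timeout(Duration.ofSeconds(30))
                .header("Content-Type", "application/json")
                // The server may answer with plain JSON or with an SSE stream.
                .header("Accept", "application/json, text/event-stream")
                .header("Authorization", "Bearer " + bearerToken)
                .POST(HttpRequest.BodyPublishers.ofString(jsonRpcBody))
                .build();
    }

    public static void main(String[] args) {
        // Assumed variable name and endpoint; adjust to your Arthas configuration.
        String token = System.getenv().getOrDefault("ARTHAS_MCP_TOKEN", "change-me");
        String body = "{\"jsonrpc\":\"2.0\",\"id\":1,\"method\":\"tools/list\",\"params\":{}}";
        HttpRequest req = buildRequest("http://localhost:8563/mcp", token, body);
        System.out.println(req.method() + " " + req.uri());
        // To send it: HttpClient.newHttpClient().send(req, HttpResponse.BodyHandlers.ofString());
    }
}
```

Keeping request construction separate from sending makes the headers easy to check in a unit test without a running Arthas instance.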

Practical case study – CPU spike

Scenario: An order service shows 98 % CPU usage during peak hours, causing slow responses.

Using an MCP‑enabled AI client, the operator asks: “My order service CPU jumped to 98 %, help me find which thread is causing the high CPU and where it is stuck.”

The AI follows a built‑in troubleshooting script:

Step 1: Call dashboard to obtain an overview.

ID   NAME                     CPU%   STATE
29   http-nio-8080-exec-8      89.2   RUNNABLE
12   DubboServerHandler-...    3.1    RUNNABLE
...

The AI identifies thread 29 (http-nio-8080-exec-8) as the culprit, consuming 89.2 % of CPU.
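The decision in this step amounts to picking the row with the highest CPU%. The sketch below does that over the simplified four-column excerpt shown above (ID, NAME, CPU%, STATE). Real dashboard output has more columns and may differ between Arthas versions, so this parsing is an assumption tied to the excerpt; DashboardParser and its methods are hypothetical names.

```java
import java.util.Comparator;
import java.util.Optional;

final class DashboardParser {

    // One thread row from the simplified table: ID, NAME, CPU%, STATE.
    record ThreadRow(long id, String name, double cpuPercent, String state) {}

    // Parses a data row; returns empty for the header, "..." and other non-data lines.
    static Optional<ThreadRow> parseRow(String line) {
        String[] cols = line.trim().split("\\s+");
        if (cols.length < 4 || !cols[0].matches("\\d+")) {
            return Optional.empty();
        }
        return Optional.of(new ThreadRow(Long.parseLong(cols[0]), cols[1],
                Double.parseDouble(cols[2]), cols[3]));
    }

    // Returns the thread with the highest CPU% in the given excerpt.
    static Optional<ThreadRow> busiestThread(String dashboardText) {
        return dashboardText.lines()
                .map(DashboardParser::parseRow)
                .flatMap(Optional::stream)
                .max(Comparator.comparingDouble(ThreadRow::cpuPercent));
    }
}
```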

Step 2: Call thread for ID 29 (thread 29) to get that thread's stack.

"http-nio-8080-exec-8" #29 prio=5 os_prio=0 tid=0x00007f8e2c001800 nid=0x4f runnable
    java.lang.Thread.State: RUNNABLE
    at java.util.regex.Pattern$GroupHead.match(Pattern.java:4660)
    ...
    at com.example.logging.LogAspect.logAround(LogAspect.java:47)
    at com.example.order.service.OrderService.getOrder(OrderService.java:123)
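Outside Arthas, the standard JDK management API can produce a similar per-thread stack for a given thread ID, which is roughly the information thread 29 displays. The sketch below uses ThreadMXBean.getThreadInfo; it is a plain-JDK analog for illustration, not necessarily how Arthas implements the command, and ThreadStack and stackOf are hypothetical names.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

final class ThreadStack {

    // Returns a jstack-like dump of one thread, or a message if the thread is gone.
    static String stackOf(long threadId) {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        ThreadInfo info = mx.getThreadInfo(threadId, Integer.MAX_VALUE); // full stack depth
        if (info == null) {
            return "thread " + threadId + " not found";
        }
        StringBuilder sb = new StringBuilder();
        sb.append('"').append(info.getThreadName()).append("\" #").append(threadId)
          .append(' ').append(info.getThreadState()).append('\n');
        for (StackTraceElement frame : info.getStackTrace()) {
            sb.append("    at ").append(frame).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(stackOf(Thread.currentThread().getId()));
    }
}
```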

Step 3: Analyze the stack. The thread is busy in java.util.regex.Pattern matching, called from a logging aspect (LogAspect.logAround) that wraps OrderService.getOrder.

The problem originates at LogAspect.java:47, where a greedy .* pattern triggers catastrophic backtracking on large JSON payloads. Root cause: the logging aspect's desensitization (data-masking) regex is too broad, exhausting the CPU.

Step 4: Suggested remediation (temporary and permanent):

Disable the logging aspect via configuration.

Replace the greedy .* with a negated character class such as [^"]*, which cannot run past the closing quote, or avoid regex entirely with String.indexOf.

Add a length limit (e.g., skip payloads > 2000 characters) to the aspect.
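The two permanent fixes can be combined in the aspect's masking helper, sketched below: a negated character class [^"]* that stops at the closing quote of a JSON string value, plus a guard that skips payloads over 2000 characters. The real LogAspect code is not shown in the source, so the field name phone, the mask text, and the LogMasker class are illustrative assumptions.

```java
import java.util.regex.Pattern;

final class LogMasker {

    // Payloads longer than this are not masked or logged in full.
    static final int MAX_PAYLOAD_CHARS = 2000;

    // [^"]* stops at the value's closing quote, so it cannot overrun into later
    // fields or backtrack across the whole payload the way a greedy .* can.
    private static final Pattern PHONE_VALUE =
            Pattern.compile("(\"phone\"\\s*:\\s*\")[^\"]*(\")");

    static String mask(String json) {
        if (json == null) {
            return null;
        }
        if (json.length() > MAX_PAYLOAD_CHARS) {
            return "<payload omitted: " + json.length() + " chars>";
        }
        return PHONE_VALUE.matcher(json).replaceAll("$1***$2");
    }
}
```

With a greedy .* in place of [^"]*, the same input {"phone":"13800000000","name":"Tom"} would also lose the name field, because .* extends to the last quote in the payload before backtracking.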

Without AI, the operator works through roughly nine manual steps: log in, start Arthas, run dashboard, note the thread ID, run thread 29, read the stack, locate the code, modify it, and redeploy. Each step is a chance for error.
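The automated part of that flow (Steps 1 to 3) can be pictured as a small pipeline over two tool calls. The sketch below models it with a hypothetical DiagnosticTools interface standing in for the dashboard and thread calls. In the real setup the model chooses each call itself rather than following fixed code, and the rule "report the first frame under the application's package" is a deliberately simple assumption.

```java
import java.util.List;
import java.util.Map;

final class CpuSpikeTriage {

    // Hypothetical stand-ins for the dashboard and thread tool calls.
    interface DiagnosticTools {
        Map<Long, Double> threadCpu();           // thread ID -> CPU%
        List<String> stackFrames(long threadId); // top frame first
    }

    // Step 1: pick the busiest thread. Step 2: fetch its stack.
    // Step 3: report the first frame from the application's own package.
    static String triage(DiagnosticTools tools, String appPackagePrefix) {
        long hottest = tools.threadCpu().entrySet().stream()
                .max(Map.Entry.comparingByValue())
                .map(Map.Entry::getKey)
                .orElseThrow();
        for (String frame : tools.stackFrames(hottest)) {
            if (frame.startsWith(appPackagePrefix)) {
                return "thread " + hottest + " hot in " + frame;
            }
        }
        return "thread " + hottest + ": no application frame found";
    }
}
```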

Other commands usable through MCP include trace for slow interfaces, heapdump for memory leaks, thread -b for deadlocks, and watch for parameter anomalies.
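For the deadlock case, the JDK's ThreadMXBean can report threads that are deadlocked on monitors or ownable synchronizers. This is related to, but not the same as, thread -b, which reports the thread currently blocking others. The sketch below is a plain-JDK illustration with hypothetical names, not Arthas code.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.List;

final class DeadlockCheck {

    // Returns one line per deadlocked thread; an empty list means no deadlock was found.
    static List<String> findDeadlocks() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        long[] ids = mx.findDeadlockedThreads(); // null when no deadlock exists
        List<String> report = new ArrayList<>();
        if (ids == null) {
            return report;
        }
        for (ThreadInfo info : mx.getThreadInfo(ids)) {
            if (info != null) {
                report.add(info.getThreadName() + " waiting on " + info.getLockName()
                        + " held by " + info.getLockOwnerName());
            }
        }
        return report;
    }
}
```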

Advantages and disadvantages

Advantages:

No need to memorize Arthas command syntax; natural language suffices.

Unified integration with major AI clients via MCP.

Secure access through Bearer token authentication.

Includes 26 core diagnostic tools covering JVM, class-loading, and monitoring diagnostics.

HTTP-based, so it is easy to embed in IDEs or monitoring systems.

Disadvantages:

Experimental module; features evolve rapidly.

Requires manual configuration of MCP client connections.

Complex scenarios may still need human verification of AI inferences.

Applicable scenarios

Rapid daily fault triage.

Newcomer onboarding.

IDE‑assisted code‑level debugging.

Building internal AI‑ops assistants.

Conclusion

Arthas integration with MCP enables AI‑assisted online troubleshooting: a natural‑language description triggers the AI to invoke Arthas commands, analyze results, and produce a diagnostic report, effectively providing on‑demand expert-level analysis for CPU spikes, slow interfaces, memory leaks, and deadlocks.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.
