How AI‑Powered /log‑diagnosis Skill Automates Bug Tracing in Backend Logs

This article explains how the /log‑diagnosis Skill, built on Claude Code and the Model Context Protocol (MCP) log platform, creates a closed‑loop workflow that automatically fetches logs, extracts key information, maps it to source code, diagnoses root causes, and generates remediation suggestions, dramatically speeding up backend debugging.

DeWu Technology

Overview

Backend developers often waste time switching between a log platform and an IDE to locate bugs. The article proposes an AI‑driven solution that combines Claude Code’s Skill mechanism with the Model Context Protocol (MCP) log service to automate the entire "search logs → extract key info → scan code → pinpoint issue" loop.

What is the MCP Log Platform?

MCP (Model Context Protocol) standardizes how the model calls the log platform's tools. Claude Code talks to the log platform's MCP server over a long‑lived Server‑Sent Events (SSE) connection and receives log data through it. The server exposes acquireTokenTool, which exchanges a secretKey for an accessToken, and log query tools such as logsQuery, logSqlQuery, and countLogTool.

secretKey (requested in the log platform's admin console)
    ↓ acquireTokenTool
accessToken (valid for 1 hour; at most 5 can exist at once)
    ↓ pass the accessToken
logsQuery / logSqlQuery / countLogTool ...

/log‑diagnosis Skill

The Skill is a custom command defined in .claude/skills/. When a user runs /log-diagnosis {env} {branch} {request}, Claude loads the Skill definition, reads .diagnosis/config.json, refreshes the access token if it has expired, calculates the log time window from the trace ID, pulls all relevant logs via MCP (up to 20 pages), switches to the specified code branch, searches the codebase, produces a diagnostic report, and finally restores the original branch.

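User enters /log-diagnosis {environment} {code branch} {request}
    ↓
Claude loads .claude/skills/log-diagnosis/SKILL.md
    ↓
Read .diagnosis/config.json to get the configuration for the current environment
    ↓
Check whether the accessToken has expired; refresh it automatically if so
    ↓
Compute the log time range from the traceId (characters 9-16 are a hexadecimal timestamp)
    ↓
Call the log platform MCP to page through the full log set (up to 20 pages, nothing skipped)
    ↓
Switch to the specified code branch and search the code using keywords from the logs
    ↓
Combined analysis: upstream logs + current service logs + code logic → root cause
    ↓
Generate the diagnostic report (Feishu document or local Markdown)
    ↓
Restore the original code branch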
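
The time-window step in the flow above can be sketched roughly as follows. This is a minimal Python sketch, not the Skill's actual script: it assumes that characters 9-16 of the traceId are a Unix timestamp in seconds encoded as hexadecimal, and that the query window is the calendar day containing that instant (UTC by default). The function name trace_time_window is hypothetical, and the real Skill may use a different unit, time zone, or padding.

```python
from datetime import datetime, timedelta, timezone


def trace_time_window(trace_id: str, tz: timezone = timezone.utc):
    """Return (start, end) of the calendar day that contains the trace timestamp.

    Assumption: characters 9-16 (1-based) hold a hex-encoded Unix timestamp in seconds.
    """
    if len(trace_id) < 16:
        raise ValueError("traceId is too short to contain a timestamp")
    seconds = int(trace_id[8:16], 16)  # 1-based positions 9-16
    moment = datetime.fromtimestamp(seconds, tz=tz)
    start = moment.replace(hour=0, minute=0, second=0, microsecond=0)
    return start, start + timedelta(days=1)
```

Querying a whole day is wider than strictly necessary, but it tolerates clock skew between services and requests that cross service boundaries slowly.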

Two Entry Points

Provide a traceId to locate a specific request.

Provide an alert description (e.g., business error code) for a broader investigation.

Core Capabilities

Token auto‑management: accessToken is refreshed automatically.

Full pagination: All log pages are retrieved (up to 20) to avoid missing data.

Cross‑service analysis: The Skill discovers upstream/downstream services and pulls their logs.

Code linkage: Class and method names extracted from logs are used to locate exact code locations.
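
The full-pagination capability can be illustrated with a short loop. This is a sketch under stated assumptions: fetch_page is a hypothetical stand-in for whichever MCP log query call is used, and it is assumed to take a 1-based page number and return a list of entries, with an empty list signalling the end. Only the 20-page cap comes from the article.

```python
from typing import Callable, Dict, List

MAX_PAGES = 20  # page cap stated in the article


def fetch_all_logs(
    fetch_page: Callable[[int], List[Dict]],
    max_pages: int = MAX_PAGES,
) -> List[Dict]:
    """Collect log entries page by page until an empty page or the cap is reached.

    fetch_page is a hypothetical wrapper around the MCP log query tool.
    """
    entries: List[Dict] = []
    for page in range(1, max_pages + 1):
        batch = fetch_page(page)
        if not batch:  # no more data
            break
        entries.extend(batch)
    return entries
```

The cap keeps a runaway query from flooding the model's context, while the loop ensures the analysis is not based on the first page alone.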

queryString Syntax

# Format
{field} {operator} "{value}" {connector} {field} {operator} "{value}"
# Operators
=  : exact match
≈  : fuzzy match (like)
# Connectors
AND / OR / NOT
# Example
trace_id = "a1b2c3d4e5f6789012345678abcdef01"
trace_id = "xxx" AND log_level = "ERROR"
endpoint ≈ "/api/your-endpoint" AND log_level = "ERROR"
message ≈ "timeout"
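
To show how such strings could be assembled programmatically, here is a small Python sketch. The helper names condition and join_conditions are hypothetical, and the sketch covers only the = and ≈ operators with AND/OR connectors. How the platform expects quotes inside values to be escaped, and how NOT is placed, is not described in the article, so neither is handled here.

```python
from typing import Iterable


def condition(field: str, value: str, fuzzy: bool = False) -> str:
    """Render one condition in the form: {field} {operator} "{value}"."""
    operator = "≈" if fuzzy else "="
    return f'{field} {operator} "{value}"'


def join_conditions(conditions: Iterable[str], connector: str = "AND") -> str:
    """Join rendered conditions with a single connector (AND or OR)."""
    if connector not in ("AND", "OR"):
        raise ValueError("connector must be AND or OR")
    return f" {connector} ".join(conditions)


query = join_conditions([
    condition("endpoint", "/api/your-endpoint", fuzzy=True),
    condition("log_level", "ERROR"),
])
```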

Installation & Configuration

Install MCP for each environment (test, pre‑prod, prod) using the Claude CLI:

# Test environment
claude mcp add --transport sse dw-log-mcp-t1 https://{your-t1-aigw-domain}/api/v1/mcp/log-mcp/sse
# Pre‑prod environment
claude mcp add --transport sse dw-log-mcp-pre https://{your-pre-aigw-domain}/api/v1/mcp/log-mcp/sse
# Production environment
claude mcp add --transport sse dw-log-mcp-prd https://{your-prd-aigw-domain}/api/v1/mcp/log-mcp/sse

After adding the servers, restart Claude Code and run /mcp to verify the connection.

Install the /log‑diagnosis Skill by placing the following directory structure into the project:

your-project/
└── .claude/
    └── skills/
        └── log-diagnosis/
            ├── SKILL.md   # behavior definition
            ├── README.md  # usage guide
            └── reference.md  # time scripts, queryString examples

Configure .diagnosis/config.json. The only manual field is secretKey, obtained from the log platform’s backend. The other fields (accessToken, accessTokenExpireAt, fields) are populated automatically.
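
The expiry check behind the automatic token refresh might look like the following Python sketch. It rests on assumptions the article does not confirm: that the environment's configuration is available as a flat JSON object, that accessTokenExpireAt is a Unix timestamp in milliseconds, and that a 60-second safety margin is reasonable. The function names are hypothetical; the actual refresh would go through acquireTokenTool.

```python
import json
import time
from pathlib import Path
from typing import Optional

REFRESH_MARGIN_SECONDS = 60  # hypothetical safety margin before expiry


def load_config(path: str = ".diagnosis/config.json") -> dict:
    """Read the diagnosis config file (layout assumed to be a flat JSON object)."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


def token_needs_refresh(config: dict, now: Optional[float] = None) -> bool:
    """True when there is no usable accessToken or it expires within the margin.

    Assumption: accessTokenExpireAt is a Unix timestamp in milliseconds.
    """
    now = time.time() if now is None else now
    token = config.get("accessToken")
    expire_at_ms = config.get("accessTokenExpireAt")
    if not token or not expire_at_ms:
        return True
    return expire_at_ms / 1000 - now <= REFRESH_MARGIN_SECONDS
```

Because at most five tokens can exist at once, reusing a still-valid token rather than requesting a new one on every run is the safer default.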

Usage

Command format:

/log-diagnosis {env} {branch (optional)} {request description}

Examples:

# Locate an interface issue by traceId
/log-diagnosis T1 feature/your-branch trace_id: "your-trace" why was no data returned in the end
# Analyze an alert message
/log-diagnosis PRD master Alert details: [API: YourService/yourMethod] [business code: 10002000] help me analyze the possible causes

The AI completes the whole process in minutes and returns a root‑cause analysis.

Real‑World Case: A Hidden SQL Bug

Background: An API in the test environment returned no data. The engineer invoked the Skill with the trace ID.

/log-diagnosis T1 feature/your-branch trace_id: "your-trace" why was no data returned in the end

AI actions:

Derived the log time window from the trace ID (full day).

Refreshed the access token.

Fetched 73 log entries across 2 pages.

Log‑to‑code reconstruction: The AI built a toSearchDTO object from the logs and identified that the SQL query returned an empty result set.

{
  "channelType": "MANUAL",
  "customerTag": 1,
  "deliveryMode": "<some delivery mode>",
  "orderStatus": "8010",
  "orderType": "0",
  "productCategoryIds": [29],
  "status": 1,
  "ticketSource": 67,
  "ticketTypeId": 5802
}

SQL analysis: The generated SQL omitted the empty‑string check for customer_tag, causing all rows to be filtered out.

AND (a.customer_tag IS NULL OR a.customer_tag = 1)   <-- BUG (missing = '')

Fix suggested by AI:

<!-- After the fix -->
<if test="customerTag != null">
    and (a.customer_tag IS NULL OR a.customer_tag = '' OR a.customer_tag = #{customerTag})
</if>

The bug was subtle because the field stored empty strings, which the original condition failed to match.
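
The effect of the missing empty-string check can be reproduced in isolation. The Python sketch below uses an in-memory SQLite database as a stand-in, since the article does not name the production database. The table name ticket_rule and the sample rows are hypothetical; only the two WHERE conditions mirror the article.

```python
import sqlite3

# Original condition: rows whose customer_tag is '' never match.
BUGGY = (
    "SELECT id FROM ticket_rule a "
    "WHERE (a.customer_tag IS NULL OR a.customer_tag = ?) ORDER BY id"
)
# Fixed condition: '' is treated like NULL, i.e. "applies to any customer tag".
FIXED = (
    "SELECT id FROM ticket_rule a "
    "WHERE (a.customer_tag IS NULL OR a.customer_tag = '' OR a.customer_tag = ?) "
    "ORDER BY id"
)


def matching_ids(sql: str, customer_tag: str):
    """Run one of the queries against a small hypothetical data set."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE ticket_rule (id INTEGER PRIMARY KEY, customer_tag TEXT)")
        conn.executemany(
            "INSERT INTO ticket_rule VALUES (?, ?)",
            [(1, None), (2, ""), (3, "1"), (4, "2")],
        )
        return [row[0] for row in conn.execute(sql, (customer_tag,))]
    finally:
        conn.close()


print(matching_ids(BUGGY, "1"))  # row 2 (empty string) is missing
print(matching_ids(FIXED, "1"))  # row 2 is included
```

If every candidate row stores an empty string in customer_tag, the original condition filters all of them out, which matches the empty result set seen in the logs.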

Efficiency Comparison

The article shows a performance chart (omitted here) demonstrating that the AI‑driven approach reduces diagnosis time from several minutes of manual work to under a minute.

Key Points for Diagnosis Efficiency

Prefer trace‑ID based log retrieval for precise request tracing.

Focus on critical log nodes such as toSearchDTO finished, search begins, resultList is empty, and search finished to quickly locate data loss.

SQL logs from the ORM are golden clues; AI excels at spotting inconsistencies across similar fields.

Always paginate through all log pages; partial results can hide the root cause.
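
As an illustration of the checkpoint idea in the list above, the following Python sketch scans log messages for the marker strings named there and reports which ones appear. The message formats in the usage example are hypothetical, and a real service will have its own markers.

```python
from typing import Iterable, List

# Marker strings taken from the key points above.
CHECKPOINTS = [
    "toSearchDTO finished",
    "search begins",
    "resultList is empty",
    "search finished",
]


def checkpoints_seen(messages: Iterable[str]) -> List[str]:
    """Return the checkpoint markers found in the messages, in order of first appearance."""
    seen: List[str] = []
    for message in messages:
        for marker in CHECKPOINTS:
            if marker in message and marker not in seen:
                seen.append(marker)
    return seen


# Hypothetical log messages for illustration only.
sample = [
    "toSearchDTO finished, dto={...}",
    "search begins",
    "resultList is empty",
]
print(checkpoints_seen(sample))
```

A "resultList is empty" marker appearing right after "search begins" narrows the investigation to the query itself rather than to later filtering or serialization.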

Conclusion

The combination of MCP (providing dynamic data access) and a well‑defined Skill (encapsulating the SOP) enables AI to take over repetitive debugging tasks. This pattern can be extended to code review, performance analysis, alert triage, and other fixed‑process engineering activities, turning expert knowledge into reusable AI capabilities.
