How HiClaw Automates Crash Alert Analysis with AI Agents in a Cloud‑Native Environment
This article details the design and workflow of HiClaw, an AI‑driven, cloud‑native system that intercepts DingTalk crash alerts, isolates analysis in secure containers, and automatically generates actionable reports, dramatically reducing manual investigation time while complying with strict internal security policies.
Background
Hourly crash alerts from the internal FBI (big‑data analysis and visualization platform) required manual investigation: opening reports, querying ODPS tables, inspecting backtraces, and writing conclusions. Security policies prohibited direct deployment of third‑party tools on production servers, so an isolated, automated solution was needed.
Key Components
FBI : Alibaba internal platform that stores crash data in ODPS tables.
fbi‑claw : Custom AI‑driven bot that orchestrates the analysis workflow.
Manager ("二营长") : AI orchestrator that receives DingTalk mentions and dispatches tasks.
Worker : Service that performs root‑cause mining via the bugman API.
OpenClaw / CoPow : Container images that package the analysis logic and its dependencies.
Workflow Overview
Single‑person group setup : Create a private DingTalk group that mirrors the main alert channel and add the manager bot.
Manual trigger : When an alert arrives, type @二营长 查一下 ("take a look") in the private group.
Automated analysis : The manager invokes fbi‑claw, which runs two agents: crash‑alert‑analysis, which extracts parameters from the alert, queries ODPS for the alert data, and selects the top‑3 crash versions and top‑3 stack frames; and bugman‑analysis, which performs source‑level root‑cause analysis. bugman‑analysis supports a fast backtrace‑only mode and a deep semantic mode (enabled with deep_code=1) that uses tree‑sitter and clangd for repository‑wide code analysis.
Result delivery : The generated markdown report is posted back to the private group and forwarded via a DingTalk webhook to the main alert group.
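The top‑3 selection performed by crash‑alert‑analysis amounts to a frequency count over the queried crash rows. The following is a minimal sketch of that idea, not HiClaw's actual code: the function name and the row keys `version` and `frames` are hypothetical, since the article does not show the ODPS table schema.

```python
from collections import Counter


def top_versions_and_frames(rows, n=3):
    """Return the n most frequent versions and stack frames.

    `rows` is an iterable of dicts with hypothetical keys:
    'version' (str) and 'frames' (list of str, one crash's backtrace).
    """
    rows = list(rows)  # allow generators; we iterate twice
    version_counts = Counter(row["version"] for row in rows)
    frame_counts = Counter(frame for row in rows for frame in row["frames"])
    return version_counts.most_common(n), frame_counts.most_common(n)


if __name__ == "__main__":
    sample_rows = [
        {"version": "1.0.0", "frames": ["libc.so.6(+0x42520)", "libGAdasSDK.so(+0x1f5cc0)"]},
        {"version": "1.0.0", "frames": ["libc.so.6(+0x42520)"]},
        {"version": "1.1.0", "frames": ["libGAdasSDK.so(+0x1a3585)"]},
    ]
    versions, frames = top_versions_and_frames(sample_rows)
    print(versions)
    print(frames)
```

Each returned entry is a `(value, count)` pair, so percentages like those in the report can be derived by dividing by the total number of rows.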
Implementation Details
The system is built from reusable scripts:
odps_tools : Python wrapper around the ODPS SDK for fast SQL queries.
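```python
from odps import ODPS

# Example query via PyODPS; the credentials, project, and endpoint are placeholders
o = ODPS("<access_id>", "<secret_access_key>", project="<project>", endpoint="<endpoint>")
with o.execute_sql("SELECT * FROM crash_table WHERE project='AXX'").open_reader() as reader:
    result = list(reader)
```
crash‑alert‑analysis : Four‑step process: parameter extraction, alert data query, top‑3 version extraction, and top‑3 stack‑frame extraction.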
bugman‑analysis : Two execution modes.
```python
# Fast analysis
bugman.analyze(backtrace)
# Deep analysis
bugman.analyze(backtrace, deep_code=1)
```
ding‑sync‑tool : Sends the markdown report to DingTalk via a webhook.
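A webhook delivery step of this kind can be sketched with the standard library alone. This is an illustrative sketch rather than the actual ding‑sync‑tool: the function names are hypothetical, and it assumes a DingTalk custom‑robot webhook that accepts the `markdown` message type. Robots configured with signing or keyword security would need extra handling that is not shown here.

```python
import json
import urllib.request


def build_markdown_payload(title, text):
    """Build the JSON body for a DingTalk custom-robot markdown message."""
    return {"msgtype": "markdown", "markdown": {"title": title, "text": text}}


def send_report(webhook_url, title, text, timeout=10):
    """POST the report to the webhook and return the decoded JSON response.

    `webhook_url` is the full robot URL, including its access token.
    """
    body = json.dumps(build_markdown_payload(title, text)).encode("utf-8")
    request = urllib.request.Request(
        webhook_url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Keeping payload construction separate from the network call makes the message format easy to check without contacting DingTalk.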
Sample Report
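[FBI Alert Handling Result] Progress Update
Project: AXX
Time: 2026-03-26 16:00
Crashes: xx | Affected devices: 0
📊 **Top version**: x.xx.xx.x (xx crashes, 100%)
🔍 **Top stack (xx crashes, 83%)**:
```
libc.so.6(+0x42520)
libGAdasSDK.so(+0x1f5cc0)
libGAdasSDK.so(+0x1a3585)
...
```
💡 **Preliminary conclusions**:
- The affected device count is 0, and crashes are concentrated in a single version, X.XX.XX.X
- Main stack signature: xxxxxx
Benefits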
Compared with the manual process (opening a computer, querying tables, inspecting code, writing conclusions), the automated pipeline produces a complete analysis report in under two minutes. The workflow is fully traceable, reduces human fatigue, and ensures consistent output.
Evolution
The first prototype used a linear LangGraph workflow. It was later refactored to a multi‑round plan‑and‑execute + ReAct architecture, which enables iterative reasoning and deeper semantic analysis via tree‑sitter and clangd. This shift moves the system from a pure human‑in‑the‑loop model toward a human‑in‑the‑team model where the manager coordinates specialized workers.
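The multi‑round idea can be illustrated with a small control loop in which each round's plan is informed by the observations gathered so far. This is only a sketch under stated assumptions: the function name, the `(tool_name, argument)` step format, and the stub planner are hypothetical, and in a real system the planner would be an LLM call rather than a plain function.

```python
def run_plan_and_execute(plan, tools, max_rounds=3):
    """Run up to max_rounds of planning and execution.

    `plan(observations)` returns a list of (tool_name, argument) steps,
    or an empty list when no further work is needed.
    `tools` maps a tool name to a callable taking one argument.
    """
    observations = []
    for _ in range(max_rounds):
        steps = plan(observations)  # the planner sees earlier observations
        if not steps:
            break
        for tool_name, argument in steps:
            observations.append((tool_name, tools[tool_name](argument)))
    return observations


if __name__ == "__main__":
    # Stub tools standing in for the fast and deep analysis modes.
    tools = {
        "fast": lambda backtrace: "inconclusive",
        "deep": lambda backtrace: "suspected null dereference",
    }

    def plan(observations):
        if not observations:
            return [("fast", "backtrace text")]
        if observations[-1][1] == "inconclusive":
            return [("deep", "backtrace text")]
        return []

    for tool_name, result in run_plan_and_execute(plan, tools):
        print(tool_name, "->", result)
```

Unlike a fixed linear pipeline, this loop can escalate from a cheap analysis to an expensive one only when the earlier result calls for it.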
References
HiClaw repository: https://github.com/agentscope-ai/HiClaw/blob/main/README.md
Team‑worker design proposal: https://github.com/agentscope-ai/HiClaw/blob/56a894315c8f87513d4bf8adb0a0c77a6a36b11c/docs/design/team-worker-proposal.md
Alibaba Cloud Native
We publish cloud-native tech news, curate in-depth content, host regular events and live streams, and share Alibaba product and user case studies. Join us to explore and share the cloud-native insights you need.
