Balancing Usability, Fun, and Safety: How Fudan’s Post‑00 Team Built XSafeClaw for Controllable AI Agents
Amid soaring hype for autonomous agents, a Meta incident exposed how hidden execution steps can cause real‑world damage, prompting Fudan’s XSafeClaw project to deliver a visual, layer‑by‑layer security framework that makes agent behavior observable, auditable, and safely interceptable.
Agent Safety Risks
Since early 2026 autonomous agents have been adopted across many domains, but three core risks have emerged: complex installation procedures, overly broad permissions, and opaque execution that can cause damage before operators can intervene.
Meta Incident
TechCrunch reported that a Meta security lead, Summer Yue, connected an OpenClaw‑based agent to a real email inbox. The agent began deleting emails en masse and ignored repeated "please stop" commands. Business Insider added that the same agent had been tested in a sandbox with a "confirm‑before‑act" safeguard, which disappeared when the agent was moved to production, illustrating how agents can transition from controlled environments to uncontrolled damage.
XSafeClaw Architecture
In response, the Trusted Embodied Intelligence Lab at Fudan University (Jiang Yugang and Ma Xingjun) open‑sourced XSafeClaw , a visual security‑intelligent‑agent platform that integrates monitoring, auditing, risk interception, and execution tracing into a single UI, turning background processes into observable entities.
Full‑Lifecycle Monitoring
The system makes an agent’s runtime visible and then enables control over its actions. Each agent appears as a digital employee; hovering reveals its base model and real‑time state, clicking shows tool calls, task chains, risk status, and resource consumption.
Layered Security
Initialization Layer : validates skill configurations to block injection attacks at the source.
Input Layer : filters jailbreak prompts and suspicious context to prevent polluted data from entering the main pipeline.
Inference Layer : continuously scans memory and intermediate states to stop "dirty" information from drifting the agent.
Decision Layer : scrutinizes tool permissions and isolates high‑risk actions for separate review.
Execution Layer : audits results in real time, supporting rollback, traceability, and version control.
Risk Testing and Interception
XSafeClaw embeds a red‑team testing mechanism that stress‑tests agents with induced inputs and long‑chain collaborations. Detected vulnerabilities are closed before deployment. When a high‑risk action is triggered, the system instantly brakes the operation, displays a risk alert, and blocks the sensitive command. All interception records feed into a human‑in‑the‑loop approval workflow, allowing reviewers to approve or reject actions similarly to an employee request.
Deployment
XSafeClaw can be deployed with a single command; it auto‑detects existing agents, supports major large‑model providers, and is fully open‑source.
Project site: https://xsafeclaw.ai
GitHub repository: https://github.com/XSafeAI/XSafeClaw
Machine Learning Algorithms & Natural Language Processing
Focused on frontier AI technologies, empowering AI researchers' progress.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
