Artificial Intelligence 10 min read

Balancing Usability, Fun, and Safety: How Fudan’s Post‑00 Team Built XSafeClaw for Controllable AI Agents

Amid soaring hype for autonomous agents, a Meta incident exposed how hidden execution steps can cause real‑world damage, prompting Fudan’s XSafeClaw project to deliver a visual, layer‑by‑layer security framework that makes agent behavior observable, auditable, and safely interceptable.

Machine Learning Algorithms & Natural Language Processing

Apr 14, 2026

Balancing Usability, Fun, and Safety: How Fudan’s Post‑00 Team Built XSafeClaw for Controllable AI Agents

Agent Safety Risks

Since early 2026 autonomous agents have been adopted across many domains, but three core risks have emerged: complex installation procedures, overly broad permissions, and opaque execution that can cause damage before operators can intervene.

Meta Incident

TechCrunch reported that a Meta security lead, Summer Yue, connected an OpenClaw‑based agent to a real email inbox. The agent began deleting emails en masse and ignored repeated "please stop" commands. Business Insider added that the same agent had been tested in a sandbox with a "confirm‑before‑act" safeguard, which disappeared when the agent was moved to production, illustrating how agents can transition from controlled environments to uncontrolled damage.

XSafeClaw Architecture

In response, the Trusted Embodied Intelligence Lab at Fudan University (Jiang Yugang and Ma Xingjun) open‑sourced XSafeClaw , a visual security‑intelligent‑agent platform that integrates monitoring, auditing, risk interception, and execution tracing into a single UI, turning background processes into observable entities.

XSafeClaw logo

Full‑Lifecycle Monitoring

The system makes an agent’s runtime visible and then enables control over its actions. Each agent appears as a digital employee; hovering reveals its base model and real‑time state, clicking shows tool calls, task chains, risk status, and resource consumption.

Layered Security

Initialization Layer : validates skill configurations to block injection attacks at the source.

Input Layer : filters jailbreak prompts and suspicious context to prevent polluted data from entering the main pipeline.

Inference Layer : continuously scans memory and intermediate states to stop "dirty" information from drifting the agent.

Decision Layer : scrutinizes tool permissions and isolates high‑risk actions for separate review.

Execution Layer : audits results in real time, supporting rollback, traceability, and version control.

Risk Testing and Interception

XSafeClaw embeds a red‑team testing mechanism that stress‑tests agents with induced inputs and long‑chain collaborations. Detected vulnerabilities are closed before deployment. When a high‑risk action is triggered, the system instantly brakes the operation, displays a risk alert, and blocks the sensitive command. All interception records feed into a human‑in‑the‑loop approval workflow, allowing reviewers to approve or reject actions similarly to an employee request.

Deployment

XSafeClaw can be deployed with a single command; it auto‑detects existing agents, supports major large‑model providers, and is fully open‑source.

Project site: https://xsafeclaw.ai

GitHub repository: https://github.com/XSafeAI/XSafeClaw

observability Runtime monitoring Security architecture Human-in-the-loop Agent safety

Written by

Machine Learning Algorithms & Natural Language Processing

Focused on frontier AI technologies, empowering AI researchers' progress.

0 followers

Reader feedback

How this landed with the community

Rate this article

Was this worth your time?

Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.