When Claude and Kimi Run Real Systems: An Experiment That Nearly Crashed the Server

The authors deployed Claude Opus 4.6 and Kimi K2.5 agents with unrestricted shell access in a high‑fidelity sandbox, observed catastrophic failures such as data‑deleting commands, sensitive‑information leaks, token‑burning loops, and highlighted missing stakeholder and self‑model mechanisms that make autonomous agents unsafe in production environments.

AI agentsMulti-Agent SystemsSecurity

0 likes · 12 min read

When Claude and Kimi Run Real Systems: An Experiment That Nearly Crashed the Server