Why AI Agents Risk Losing Control and How AgentArmor Secures Them

This article examines the emerging security challenges of AI agents, outlines four fundamental vulnerabilities, and introduces the AgentArmor framework. Its three components (a graph constructor, a property registry, and a type system) compile agent behavior into a verifiable program; in the reported evaluation, attack success drops from 28% to 4%.

Volcano Engine Developer Services

Technical report: https://arxiv.org/abs/2508.01249

AI Agent Era Arrives, but Control Risks Loom

After large language models, AI agents are driving a new wave of automation, capable of understanding, planning, and executing real‑world tasks such as travel booking, cloud resource management, and email handling. However, recent high‑profile incidents reveal severe security flaws that can cause agents to act out of control.

Four Fundamental Vulnerabilities

Input side – over-reliance on untrusted environments: agents ingest data from emails, forums, GitHub, and similar sources, which attackers can poison to inject malicious commands.

Planning side – ambiguity of natural language: the inherent vagueness of language lets attackers hijack LLM reasoning and mislead agents.

Action side – excessive privileged access: agents need to read databases, credentials, and other assets, which exposes sensitive information to theft or misuse.

Output side – uncontrolled external communication: agents can send data via email, comments, and cloud storage; if compromised, they can exfiltrate or corrupt information.

Resulting Threats

Cross‑site injection hijacking

Financial fraud through unauthorized payments

Tool poisoning via malicious MCP (Model Context Protocol) tool descriptions

Why Traditional Defenses Fail

Content filtering, security scanning, static access control, and execution isolation treat agents like conventional software. They ignore the agent's dynamic reasoning and autonomous actions, and therefore miss many unsafe behaviors.

AgentArmor: A New Paradigm

AgentArmor treats the agent's execution trace as an analyzable program: it compiles runtime behavior into a structured, verifiable form so that mature software-engineering analyses, such as Program Dependency Graphs and type checking, can be applied to it.

Core Components

Graph Constructor: converts linear execution traces into a Program Dependency Graph capturing control and data flow.

Property Registry: enriches each graph node with security attributes, automatically assessing unknown tools and services.

Type System: derives security levels for nodes and enforces policies (e.g., escalation, de-escalation, alerts, blocking) before risky actions occur.
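The three components can be illustrated with a minimal sketch. It is not AgentArmor's implementation: the Step schema, the REGISTRY entries, and the tool names are hypothetical, and only data-flow edges are modeled (control dependencies are left out). The sketch builds a dependency graph from a linear trace, looks up each tool's attributes, and flags any step where untrusted data reaches a sensitive sink.

```python
from dataclasses import dataclass, field


@dataclass
class Step:
    """One tool call in a linear agent trace (hypothetical schema)."""
    id: str
    tool: str
    # ids of earlier steps whose output this step consumes
    inputs: list[str] = field(default_factory=list)


# Hypothetical property registry: tool name -> security attributes.
REGISTRY = {
    "read_email": {"source_trusted": False, "sensitive_sink": False},
    "summarize": {"source_trusted": True, "sensitive_sink": False},
    "send_payment": {"source_trusted": True, "sensitive_sink": True},
}
# Unknown tools get the most conservative attributes.
DEFAULT_ATTRS = {"source_trusted": False, "sensitive_sink": True}


def build_graph(trace):
    """Return {step id: set of step ids it depends on} (data-flow edges only)."""
    return {s.id: set(s.inputs) for s in trace}


def tainted_steps(trace, graph):
    """Propagate the 'untrusted' label along data-flow edges in trace order."""
    tainted = set()
    for s in trace:
        attrs = REGISTRY.get(s.tool, DEFAULT_ATTRS)
        if not attrs["source_trusted"] or graph[s.id] & tainted:
            tainted.add(s.id)
    return tainted


def check(trace):
    """Return ids of steps where untrusted data flows into a sensitive sink."""
    graph = build_graph(trace)
    tainted = tainted_steps(trace, graph)
    violations = []
    for s in trace:
        attrs = REGISTRY.get(s.tool, DEFAULT_ATTRS)
        if attrs["sensitive_sink"] and graph[s.id] & tainted:
            violations.append(s.id)
    return violations


trace = [
    Step("s1", "read_email"),
    Step("s2", "summarize", ["s1"]),
    Step("s3", "send_payment", ["s2"]),
]
print(check(trace))  # ['s3']
```

The payment step is flagged because its input traces back to the email, even though the intermediate summarize step is itself a trusted tool. A keyword filter on the final call alone would not see that dependency.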

Three Security Types

Trust type: establishes appropriate trust when agents interact with local, cloud, or third-party services.

Safety type: robustly resists external attacks such as malicious command injection.

Rule type: guarantees faithful execution of user intents without unauthorized deviation.
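One loose reading of the three types, written as a check over a single action, might look like the sketch below. The Trust levels, the MIN_TRUST table, the Action fields, and the mapping of each check to a type name are all assumptions made for illustration; the article does not specify how AgentArmor encodes them.

```python
from dataclasses import dataclass
from enum import IntEnum


class Trust(IntEnum):
    """Hypothetical trust levels, ordered from least to most trusted."""
    UNTRUSTED = 0
    THIRD_PARTY = 1
    LOCAL = 2


@dataclass(frozen=True)
class Action:
    tool: str
    data_trust: Trust        # lowest trust among the data feeding this action
    requested_by_user: bool  # whether the action matches the user's stated task


# Hypothetical policy table: minimum data trust each known tool requires.
MIN_TRUST = {"send_email": Trust.LOCAL, "web_search": Trust.UNTRUSTED}


def join(*levels):
    """Combined data is only as trustworthy as its least trusted input."""
    return min(levels)


def check(action):
    """Return the names of the violated types (an empty list means pass)."""
    failed = []
    required = MIN_TRUST.get(action.tool)
    if required is None:
        failed.append("trust")   # unknown tool: no trust has been established
    elif action.data_trust < required:
        failed.append("safety")  # low-trust data is driving a guarded tool
    if not action.requested_by_user:
        failed.append("rule")    # the action deviates from the user's intent
    return failed


level = join(Trust.LOCAL, Trust.UNTRUSTED)
print(check(Action("send_email", level, requested_by_user=False)))
# ['safety', 'rule']
print(check(Action("web_search", Trust.UNTRUSTED, requested_by_user=True)))
# []
```

Taking the minimum in join is the conservative choice: mixing one untrusted input into otherwise local data lowers the trust of the whole result.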

Performance Highlights

Risk‑behavior detection rate near 100% with 93% attack‑success reduction.

Attack success drops from 28% to 4%, and to 0% for command‑coverage attacks.

Normal task completion remains virtually unchanged (73% → 72%).

Zero‑Trust Runtime Integration

AgentArmor intercepts untrusted behaviors, mirrors LLM call flows, and applies policy decisions to allow, block, or mitigate actions, achieving seamless protection without altering the agent’s functional architecture.
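The allow, block, or mitigate decision can be pictured as a wrapper placed around each tool, so the agent's own code does not change. This is a minimal sketch under stated assumptions: the decide policy, the SENSITIVE_TOOLS set, the untrusted flag, and both example tools are hypothetical, and mitigation is reduced here to tagging the result.

```python
from enum import Enum
from typing import Any, Callable


class Decision(Enum):
    ALLOW = "allow"
    MITIGATE = "mitigate"
    BLOCK = "block"


# Hypothetical set of tools whose misuse is costly.
SENSITIVE_TOOLS = {"send_payment", "delete_file"}


def decide(tool: str, untrusted: bool) -> Decision:
    """Hypothetical policy based on the tool and the provenance of its arguments."""
    if not untrusted:
        return Decision.ALLOW
    if tool in SENSITIVE_TOOLS:
        return Decision.BLOCK
    return Decision.MITIGATE


def guarded(tool: str, fn: Callable[..., Any]) -> Callable[..., Any]:
    """Wrap a tool so every call passes through the policy before it runs."""
    def wrapper(*args: Any, untrusted: bool = False, **kwargs: Any) -> Any:
        decision = decide(tool, untrusted)
        if decision is Decision.BLOCK:
            raise PermissionError(
                f"{tool} blocked: arguments derive from untrusted input"
            )
        result = fn(*args, **kwargs)
        if decision is Decision.MITIGATE:
            # Mitigation here means tagging the result so later steps stay cautious.
            return {"value": result, "untrusted": True}
        return result
    return wrapper


def post_comment(text: str) -> str:
    return f"posted: {text}"


def send_payment(amount: int) -> str:
    return f"paid {amount}"


safe_comment = guarded("post_comment", post_comment)
safe_payment = guarded("send_payment", send_payment)

print(safe_comment("hi"))                  # posted: hi
print(safe_comment("hi", untrusted=True))  # {'value': 'posted: hi', 'untrusted': True}
try:
    safe_payment(100, untrusted=True)
except PermissionError as err:
    print(err)  # send_payment blocked: arguments derive from untrusted input
```

In a real deployment the untrusted flag would come from an analysis of the call flow rather than from the caller, since a compromised agent cannot be relied on to label its own inputs.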

Future work includes open‑sourcing the core framework and extending it to AI coding, ChatBI agents, OS agents, and other verticals.
