
How VIGIL’s Verify‑Before‑Execute Paradigm Defeats LLM Agent Tool Hijacking

VIGIL is a verify‑before‑commit framework that defends LLM agents against tool‑stream injection attacks. It combines intent anchoring, perception sanitization, speculative reasoning, grounding verification, and validated trajectory memory, and it reduces attack success rates to roughly 8‑12% while preserving task utility.

Data Party THU

May 18, 2026

Background

LLM agents that call external tools in open, dynamic environments are vulnerable to tool‑stream injection, in which attackers corrupt tool specifications, dependency graphs, return values, or error messages to steer the agent away from the user’s intent. Two dilemmas arise: (1) stronger models are more likely to obey maliciously crafted tool rules; (2) static defenses that enforce a "plan‑then‑execute" workflow collapse when forged feedback appears during execution.

Method: VIGIL

VIGIL introduces a Verify‑Before‑Commit paradigm that separates speculative reasoning from actual execution. The framework consists of five components:

Intent Anchor – Extract a high‑level task sketch and immutable constraints (e.g., operation scope, allowed objects) directly from the user query, establishing a root of trust for all subsequent candidate actions.

Perception Sanitizer – Rewrite external tool descriptions and runtime feedback to remove imperative cues such as “must”, “immediately”, or “call this tool first”, preserving factual functionality while eliminating manipulative language.

Speculative Reasoner – Allow the agent to explore multiple possible execution branches on the sanitized information, treating them as hypotheses rather than committed actions.

Grounding Verifier – Perform a two‑layer check before any action is committed: (a) ensure the action does not violate hard constraints (e.g., over‑privilege); (b) confirm the action is semantically necessary for completing the user’s task. Only actions passing both checks are executed.

Validated Trajectory Memory – Cache successfully verified execution trajectories so that similar future queries can retrieve safe paths quickly, keeping verification overhead sub‑linear as the tool library grows.
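
The five components above can be pictured as a small pipeline. The following Python sketch is only an illustration under stated assumptions, not the authors' implementation: every name in it (`sanitize`, `IntentAnchor`, `Action`, `verify`, `TrajectoryMemory`, `run`) is hypothetical, the regex is a crude stand-in for the rewriting a real Perception Sanitizer would perform, and the Speculative Reasoner is reduced to a list of candidate actions passed in by the caller.

```python
import re
from dataclasses import dataclass, field

# Hypothetical cue list. The article names only "must", "immediately",
# and "call this tool first" as examples of imperative cues.
IMPERATIVE_CUES = re.compile(
    r"\b(you must|must|immediately|call this tool first)\b[.,:!]?\s*",
    re.IGNORECASE,
)

def sanitize(text: str) -> str:
    """Strip imperative cues and keep the rest of the description."""
    return IMPERATIVE_CUES.sub("", text).strip()

@dataclass(frozen=True)
class IntentAnchor:
    """Immutable constraints taken from the user query (assumed shape)."""
    required_steps: frozenset   # tool names the task sketch needs
    allowed_objects: frozenset  # objects the user placed in scope

@dataclass(frozen=True)
class Action:
    tool: str
    target: str

def verify(action: Action, anchor: IntentAnchor) -> bool:
    """Two-layer check: hard constraints, then task necessity."""
    within_scope = action.target in anchor.allowed_objects
    necessary = action.tool in anchor.required_steps
    return within_scope and necessary

@dataclass
class TrajectoryMemory:
    """Cache of verified trajectories keyed by a normalized query."""
    _store: dict = field(default_factory=dict)

    def get(self, query: str):
        return self._store.get(query.strip().lower())

    def put(self, query: str, trajectory: list) -> None:
        self._store[query.strip().lower()] = list(trajectory)

def run(query: str, anchor: IntentAnchor, candidates: list,
        memory: TrajectoryMemory) -> list:
    """Commit only verified candidates; reuse a cached trajectory if any."""
    cached = memory.get(query)
    if cached is not None:
        return cached
    committed = [a for a in candidates if verify(a, anchor)]
    memory.put(query, committed)
    return committed

anchor = IntentAnchor(frozenset({"get_balance"}), frozenset({"acct_1"}))
ok = Action("get_balance", "acct_1")
injected = Action("transfer_funds", "acct_9")
memory = TrajectoryMemory()
result = run("Show my balance", anchor, [ok, injected], memory)
```

In this toy run the injected `transfer_funds` candidate fails both layers and is never committed, while the verified trajectory is cached for the next identical query. A real system would need a semantic necessity check rather than a set lookup.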

Experiments

Benchmark Construction – SIREN

The SIREN (Systemic Injection & Reasoning Evaluation Benchmark) evaluates tool‑stream injection in an environment with 496 tools and dynamic dependencies. It contains 959 tool‑stream attack cases across five categories (Explicit Directive, Dependency Trap, Feature Inducement, Runtime Hijacking, Error Hijacking) and an additional 949 data‑flow attack cases as a baseline.
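
To make the benchmark composition concrete, here is one way a SIREN-style case could be typed in Python. This is a sketch under assumptions: the five category names and the case counts come from the description above, but the `AttackCase` fields are hypothetical and do not reflect the benchmark's actual file format.

```python
from dataclasses import dataclass
from enum import Enum

class ToolStreamAttack(Enum):
    """The five tool-stream attack categories named in the text."""
    EXPLICIT_DIRECTIVE = "Explicit Directive"
    DEPENDENCY_TRAP = "Dependency Trap"
    FEATURE_INDUCEMENT = "Feature Inducement"
    RUNTIME_HIJACKING = "Runtime Hijacking"
    ERROR_HIJACKING = "Error Hijacking"

@dataclass(frozen=True)
class AttackCase:
    """Hypothetical record layout for a single benchmark case."""
    case_id: str
    category: ToolStreamAttack
    payload: str  # the injected tool text or forged feedback

# Case counts quoted in the text.
TOOL_STREAM_CASES = 959
DATA_FLOW_CASES = 949
TOTAL_CASES = TOOL_STREAM_CASES + DATA_FLOW_CASES  # 1908
```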

Main Results

Evaluations on two backbone models, Qwen3‑max and Gemini‑2.5‑pro, show that VIGIL markedly outperforms existing defenses:

Qwen3‑max: Utility under attack (UA) = 27.53 %, Attack success rate (ASR) = 8.13 %.

Gemini‑2.5‑pro: UA = 18.46 %, ASR = 11.99 %.

Compared with recent dynamic defenses, VIGIL reduces tool‑stream ASR by an additional 22 %–24 % and improves task utility under attack by more than two‑fold relative to static defenses such as Tool‑Filter and CaMeL. On benign inputs, VIGIL retains 74.49 % utility on Qwen3‑max, close to the 79.59 % of an unprotected Vanilla ReAct system.
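
The benign-input figures can be restated as an absolute gap and a retention ratio. The short calculation below uses only the two Qwen3‑max percentages quoted above; the variable names are illustrative.

```python
# Benign-input utility on Qwen3-max, in percent, as quoted in the text.
benign_vigil = 74.49    # with VIGIL
benign_vanilla = 79.59  # unprotected Vanilla ReAct

absolute_gap = benign_vanilla - benign_vigil  # percentage points lost
retained = benign_vigil / benign_vanilla      # share of utility kept

print(f"gap: {absolute_gap:.2f} points, retained: {retained:.1%}")
# prints "gap: 5.10 points, retained: 93.6%"
```

In other words, the defense costs about 5 percentage points of benign utility on this backbone.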

Ablation and Robustness Analysis

Ablation studies reveal the contribution of each component:

Removing the Grounding Verifier raises ASR to 45.05 %.

Removing the Speculative Reasoner drops UA from 27.53 % to 9.07 %.
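
The size of these ablation effects is easier to judge in relative terms. The calculation below assumes that the ablations were run on Qwen3‑max, so that the full-system baseline is ASR = 8.13 % and UA = 27.53 %; the UA figure matches the Qwen3‑max result, but the summary does not state the backbone explicitly.

```python
# Full-system baseline (assumed to be the Qwen3-max numbers), in percent.
full_asr, full_ua = 8.13, 27.53

# Ablation results quoted in the text, in percent.
asr_without_verifier = 45.05
ua_without_reasoner = 9.07

asr_ratio = asr_without_verifier / full_asr               # how many times higher
ua_relative_drop = (full_ua - ua_without_reasoner) / full_ua

print(f"ASR without verifier: {asr_ratio:.1f}x the baseline")
print(f"UA without reasoner: {ua_relative_drop:.1%} relative drop")
# prints "ASR without verifier: 5.5x the baseline"
# prints "UA without reasoner: 67.1% relative drop"
```

Under that assumption, the Grounding Verifier accounts for most of the security gain, and the Speculative Reasoner for most of the utility retained under attack.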

Scaling experiments show that expanding the tool set from 496 to 3,074 tools only modestly increases verification cost, which converges to a stable level. Even as malicious tool density rises, ASR remains low while utility degrades only gradually.

Conclusion

VIGIL demonstrates that LLM agents can maintain high task‑completion ability without blindly trusting external tools. By shifting from static isolation to a verify‑before‑commit design and validating the approach with the SIREN benchmark, the work shows that security and usability need not be a zero‑sum trade‑off for agents operating in open environments.

Written by

Data Party THU

Official platform of Tsinghua Big Data Research Center, sharing the team's latest research, teaching updates, and big data news.
