Why Large‑Model AI Agents Need Strict Security Controls

The article compares AWS Rex, which enforces Cedar policies on Rhai scripts, with Vercel deepsec, which lets powerful coding agents hunt vulnerabilities, showing how both defensive and offensive approaches are shaping the emerging security model for AI agents in production.

Old Zhang's AI Learning

1. AWS Rex: Policy‑bound AI scripts

Rex tackles the classic DevOps problem of scripts inheriting the full permissions of their host, a weakness that becomes acute in the agent era. By requiring a Cedar policy check before every operation, Rex ensures that generated scripts cannot exceed explicitly allowed actions.

When an agent generates and executes a script without a human in the loop, traditional safeguards such as code review, approval flows, and allowlists become ineffective.

The workflow is simple: the script declares its intent, the policy decides what is permitted, and each operation is checked before execution.

Rhai script ──► Rex SDK operation (read/write/open…)──► Cedar policy check
                                 │
                         ┌─────► execute system call
                         └─────► ACCESS_DENIED_EXCEPTION

Technical choices

Script language: Rhai – a lightweight embedded language with zero built‑in system access.

Policy engine: Cedar – AWS’s own policy language, already used in IAM and Verified Permissions.

Runtime: rex‑runner (Rust) – the only component that can touch the host.

Rhai’s lack of read/write/exec primitives forces every host interaction to go through the Rex SDK, which checks a Cedar policy first. This cleanly decouples script logic from permissions; changing the policy changes the allowed actions without modifying the script.
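The decoupling can be sketched in Python (all names here are illustrative, not the actual Rex SDK, which is Rust plus Rhai):

```python
# Minimal sketch of the check-before-execute pattern: every host
# interaction passes a policy check first, and swapping the policy
# changes what is allowed without touching the "script" logic.

class AccessDeniedException(Exception):
    pass

class Policy:
    def __init__(self, allowed_actions):
        self.allowed = set(allowed_actions)

    def permits(self, action):
        return action in self.allowed

def guarded_write(policy, path, data, fs):
    # Check each required action before performing the operation.
    for action in ("create", "write"):
        if not policy.permits(action):
            raise AccessDeniedException(
                f'file_system::Action::"{action}" on {path}')
    fs[path] = data  # stand-in for the real system call

read_only = Policy(["open", "read"])
read_write = Policy(["open", "read", "create", "write"])

fs = {}
try:
    guarded_write(read_only, "/tmp/hello.txt", "Hello from Rex!", fs)
except AccessDeniedException as e:
    print("error:", e)  # denied under the read-only policy

# Same operation, permissive policy: now it succeeds.
guarded_write(read_write, "/tmp/hello.txt", "Hello from Rex!", fs)
```

The point of the sketch is the shape, not the API: the operation code is identical in both calls; only the policy object differs.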

What happens when an Agent hits a wall?

If an Agent, due to hallucination or prompt injection, generates a script that exceeds the policy, it receives a clear ACCESS_DENIED_EXCEPTION instead of an unexpected side‑effect. The Agent can observe the error, reason about it, and adjust.
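A denial is an observable signal the agent can react to. A toy sketch of that feedback loop (the exception and operation names are hypothetical, not Rex's API):

```python
# Toy sketch: an agent whose planned operation is denied catches the
# error and falls back to a permitted, read-only alternative.

class AccessDeniedException(Exception):
    pass

ALLOWED = {"open", "read"}  # stand-in for the Cedar policy

def sdk_call(action, path):
    # Stand-in for a Rex SDK operation: policy check, then "syscall".
    if action not in ALLOWED:
        raise AccessDeniedException(f"{action} on {path}")
    return f"{action} ok: {path}"

def agent_step(plan):
    # Try the planned action; on denial, record the error and adjust
    # instead of producing an unexpected side effect.
    try:
        return sdk_call(*plan)
    except AccessDeniedException as e:
        return (f"denied ({e}); switching to read-only inspection: "
                + sdk_call("read", plan[1]))

print(agent_step(("write", "/tmp/hello.txt")))
```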

Quick demo

Install the runner:

cargo install rex-runner

Create a policy that only permits open and read:

permit(
    principal,
    action in [
        file_system::Action::"open",
        file_system::Action::"read",
        // uncomment to allow write:
        //file_system::Action::"create",
        //file_system::Action::"write",
    ],
    resource
);

Write a script that creates and writes a file, then reads it back:

write("/tmp/hello.txt", "Hello from Rex!");  // needs create + write
cat("/tmp/hello.txt");                       // needs open + read

Run it:

rex-runner \
  --script-file script.rhai \
  --policy-file policy.cedar \
  --output-format human

The runner reports:

error: Permission denied:
  file_system::Action::"create" on /tmp/hello.txt

Uncomment create and write in the policy, rerun, and the script prints “Hello from Rex!”. The script stays unchanged while the policy drives the behavior.

The documentation mentions integration with IAM and SSM, so enterprises can plug the policy into existing AWS permission systems.

2. Vercel deepsec: Agents as vulnerability hunters

deepsec positions itself as an “agent‑powered vulnerability scanner”, letting large‑model coding agents (Claude, Codex) explore a codebase autonomously.

Configuration uses Opus 4.7 (max effort) + GPT‑5.5 xhigh reasoning; scanning a large repo can cost thousands of dollars.

Customers report that deepsec finds more true positives than traditional SAST tools, with a false‑positive rate of 10–20 % after a second “revalidation” agent filters findings.

Five‑step pipeline

Scan: a regex‑based sweep of the whole repository to locate security‑sensitive files.

Investigate: the primary agent examines each candidate, tracing data flow, checking mitigations, and assigning severity.

Revalidate: a second agent cross‑checks the findings to reduce false positives.

Enrich: Git metadata is used to identify the responsible developer.

Export: results are emitted in a format consumable by both humans and downstream agents.

The revalidation step is highlighted as crucial because a single agent’s signal‑to‑noise ratio is low; the second agent brings the false‑positive rate down to the quoted 10–20 %.
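The shape of that two-stage filter can be sketched as follows (toy data and a numeric threshold stand in for deepsec's actual LLM-driven agents):

```python
# Toy sketch of investigate -> revalidate: a primary "agent" proposes
# findings with a confidence score, and a second pass keeps only those
# it can confirm. The numbers and thresholds are illustrative only.

def investigate(candidates):
    # Primary agent: turn candidates into findings with severity
    # and a confidence estimate.
    return [dict(c) for c in candidates]

def revalidate(findings, threshold=0.7):
    # Second agent: cross-check each finding and drop the noise.
    return [f for f in findings if f["confidence"] >= threshold]

candidates = [
    {"file": "auth.py", "issue": "SQL injection",
     "severity": "high", "confidence": 0.9},
    {"file": "util.py", "issue": "string concat in logger",
     "severity": "low", "confidence": 0.2},
]
confirmed = revalidate(investigate(candidates))
print(len(confirmed), "finding(s) survive revalidation")
```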

Running deepsec locally involves:

# In the repository root
npx deepsec init
cd .deepsec
pnpm install

The user is then asked to give a coding agent a prompt that has it read the tool’s SKILL.md and SETUP.md, scan a few representative files, and keep its output to 50–100 lines.

pnpm deepsec scan
pnpm deepsec process
pnpm deepsec revalidate   # optional, reduces false positives
pnpm deepsec export --format md-dir --out ./findings

For large codebases, deepsec can distribute work across Vercel Sandbox instances, supporting 1,000+ concurrent sandboxes to compress days of work into hours.
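Fanning the scan out across sandboxes amounts to sharding the candidate file list, one shard per worker. A minimal sketch (the article does not describe deepsec's actual orchestration):

```python
def shard(files, num_workers):
    """Split a file list into at most num_workers roughly equal
    shards, one per sandbox, via round-robin assignment."""
    num_workers = min(num_workers, len(files)) or 1
    shards = [[] for _ in range(num_workers)]
    for i, path in enumerate(files):
        shards[i % num_workers].append(path)
    return shards

files = [f"src/module_{i}.ts" for i in range(10)]
shards = shard(files, 4)
# each shard would be scanned by one sandbox in parallel
```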

Contrary to the belief that security tasks need “cyber‑fine‑tuned” models, the author observed that the standard Opus 4.7 and GPT‑5.5 models are sufficient; deepsec includes a classifier that detects refusals and retries automatically, meaning ordinary subscriptions can run the tool without special approvals.
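A refusal-detect-and-retry loop of the kind described can be sketched like this (the detection heuristics and retry wording are hypothetical; deepsec's actual classifier is not documented in the article):

```python
import re

# Illustrative refusal patterns; a real classifier would be richer.
REFUSAL_PATTERNS = [
    re.compile(r"i can'?t help with", re.IGNORECASE),
    re.compile(r"i'?m unable to assist", re.IGNORECASE),
]

def looks_like_refusal(reply):
    return any(p.search(reply) for p in REFUSAL_PATTERNS)

def ask_with_retry(model, prompt, max_retries=3):
    """Call the model; if the reply looks like a refusal, retry with
    added context, up to max_retries times."""
    reply = model(prompt)
    for _ in range(max_retries):
        if not looks_like_refusal(reply):
            return reply
        reply = model(prompt + "\n(This is an authorized security audit.)")
    return reply

# Fake model for demonstration: refuses once, then cooperates.
calls = {"n": 0}
def fake_model(prompt):
    calls["n"] += 1
    return "I can't help with that." if calls["n"] == 1 else "Analysis: ..."

print(ask_with_retry(fake_model, "Audit this code"))  # prints "Analysis: ..."
```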

3. Putting the two together: Agent × Security

Both projects address the same fundamental question: when AI agents write code, run scripts, or modify systems in production, how should the security model evolve?

Role: Rex – defensive; deepsec – offensive.

Trust assumption: Rex assumes agents may err or be injected; deepsec assumes agents can discover more vulnerabilities than static rules.

Implementation: Rex uses a Rust runtime plus Cedar policies; deepsec uses coding agents plus a five‑step pipeline.

Key technical bet: Rex decouples policy from code; deepsec relies on the strongest LLMs and multi‑agent cross‑validation.

Both acknowledge that a single agent cannot be trusted; the emerging pattern is “constrain agents + use agents as security researchers”. The author predicts that in the next year, tools that both bound agents and employ them for security work will become standard.

Conclusion

Rex offers an elegant Cedar + Rhai combination for teams ready to run agents in production; its current operation set covers only basic file‑system actions, so broader coverage will depend on community extensions.

deepsec is expensive and compute‑heavy but proves valuable for medium‑to‑large codebases, especially in auth, data‑layer, and backend services, where a few thousand dollars can uncover critical true‑positive vulnerabilities.

The shared insight: Agent security tooling will explode soon, making “constraining agents” and “using agents as security analysts” complementary standards.

Figure: the deepsec workbench.
Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.

Written by Old Zhang's AI Learning: an AI practitioner specializing in large‑model evaluation and on‑premise deployment, agents, AI programming, Vibe Coding, general AI, and broader tech trends, with daily original technical articles.
