How a Three‑Tier Authorization Model Secures Agent Execution

This article walks through Hermes' multi‑layered permission system: a hardline blocklist, pattern‑based dangerous‑command detection, persistent approval granularities, three approval modes (Manual, Smart, YOLO), file‑write protection rules, customizable shell hooks, tool guardrails against loops, and an ACP bridge. It shows how each layer keeps an AI agent from taking destructive actions.

James' Growth Diary

Hardline Blocklist

Hermes defines a non‑bypassable blocklist of catastrophic commands that have no recovery path. The patterns are compiled once at import time and include:

HARDLINE_PATTERNS = [
    (r'\brm\s+...\s*(/|/\*|/ \*)', "recursive delete of root"),
    (r'\bmkfs\b', "format filesystem"),
    (r'\bdd\b.*\bof=/dev/(sd|nvme...)', "dd to raw block device"),
    (r':\(\)\s*\{\s*:\s*\|\s*:\s*&\s*\}\s*;\s*:', "fork bomb"),
    (r'\bkill\s+...\s*-1\b', "kill all processes"),
    (_CMDPOS + r'(shutdown|reboot)', "system shutdown/reboot"),
]

Implementation details:

Command start positions are anchored with _CMDPOS to avoid false matches such as echo reboot or grep 'shutdown'.

Common prefixes (sudo, env VAR=1, nohup) are normalised before matching.

Input normalisation strips ANSI escape sequences, folds full‑width characters to their ASCII equivalents, and removes backslash escaping before the patterns are applied.

The regex list is compiled into HARDLINE_PATTERNS_COMPILED at module load to avoid per‑call recompilation.
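The anchoring, prefix stripping, and one-time compilation described above can be sketched as follows. This is a minimal illustration under stated assumptions, not Hermes' actual code: the `_CMDPOS` definition, the `normalize` and `hardline_reason` helpers, and the two-entry sample list are all hypothetical, and backslash-escape removal is left out for brevity.

```python
import re
import unicodedata
from typing import Optional

# Assumed definition: a "command position" is the start of the string or the
# point right after a shell separator (;, &, |, newline).
_CMDPOS = r'(?:^|[;&|\n])\s*'

# Illustrative subset only; not the real HARDLINE_PATTERNS list.
_SAMPLE_PATTERNS = [
    (r'\bmkfs\b', "format filesystem"),
    (_CMDPOS + r'(shutdown|reboot)\b', "system shutdown/reboot"),
]
# Compile once at import time so each check only runs the matchers.
_SAMPLE_COMPILED = [(re.compile(p), why) for p, why in _SAMPLE_PATTERNS]

_ANSI = re.compile(r'\x1b\[[0-9;]*[A-Za-z]')
_PREFIX = re.compile(r'^\s*(?:sudo\s+|nohup\s+|env\s+(?:\w+=\S*\s+)+)')


def normalize(command: str) -> str:
    """Strip ANSI escapes, fold full-width characters, drop wrapper prefixes."""
    command = _ANSI.sub('', command)
    command = unicodedata.normalize('NFKC', command)  # full-width -> ASCII
    while True:
        stripped = _PREFIX.sub('', command, count=1)
        if stripped == command:
            return command.strip()
        command = stripped


def hardline_reason(command: str) -> Optional[str]:
    """Return the reason a command is hard-blocked, or None if it is not."""
    normalized = normalize(command)
    for pattern, why in _SAMPLE_COMPILED:
        if pattern.search(normalized):
            return why
    return None
```

With this sketch, `sudo reboot` and `ls; shutdown -h now` are blocked, while `echo reboot` passes because `reboot` is not at a command position.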

Dangerous Patterns and Approval Granularities

Beyond the hardline list, Hermes matches ~47 dangerous patterns (DANGEROUS_PATTERNS) covering filesystem, database, system, configuration‑tampering and code‑injection commands. When a command matches, an approval workflow is triggered with four possible actions:

once – allow this execution only once (e.g., rm -rf ./node_modules).

session – allow for the whole session (e.g., repeated rm -rf during a cleanup job).

always – permanently add to command_allowlist in config.yaml (e.g., git clean -fd).

deny – reject the command.

Approvals marked always are persisted under command_allowlist in config.yaml, eliminating repetitive prompts.
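The four actions can be modelled as a small state object. The sketch below is an assumption-laden illustration: the `ApprovalState` class is hypothetical, it keys approvals on the exact command string (Hermes may key on the matched pattern instead), and it keeps the `always` set in memory rather than writing to config.yaml.

```python
from dataclasses import dataclass, field


@dataclass
class ApprovalState:
    """Tracks which flagged commands may run without a new prompt."""
    always: set = field(default_factory=set)   # stand-in for command_allowlist
    session: set = field(default_factory=set)  # dropped when the session ends

    def is_preapproved(self, command: str) -> bool:
        return command in self.always or command in self.session

    def record(self, command: str, choice: str) -> bool:
        """Apply the user's choice; return True if the command may run now."""
        if choice == "deny":
            return False
        if choice == "session":
            self.session.add(command)
        elif choice == "always":
            # A real implementation would also persist this to config.yaml.
            self.always.add(command)
        elif choice != "once":
            raise ValueError(f"unknown approval choice: {choice!r}")
        return True
```

A `once` approval lets the command run but leaves no trace, so the next identical command prompts again; `session` and `always` differ only in how long the entry survives.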

Approval Modes: Manual → Smart → YOLO

Manual (default)

choice = prompt_dangerous_approval(command, description, approval_callback=approval_callback)

In CLI mode the user sees an interactive prompt with the four actions. In gateway mode the prompt is sent to Telegram/Discord/Slack; the user replies with /approve, /deny or /approve all. The request blocks the Agent thread using threading.Event until a reply is received; concurrent sub‑agents queue FIFO.
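The blocking behaviour in gateway mode can be sketched with `threading.Event`. The `PendingApproval` class, its timeout parameter, and the fail-closed default of `deny` are assumptions made for this example; the article only states that the Agent thread blocks on an event until a reply arrives.

```python
import threading
from typing import Optional


class PendingApproval:
    """One approval request that blocks the agent thread until answered."""

    def __init__(self, command: str):
        self.command = command
        self._event = threading.Event()
        self._choice = "deny"  # assumed default: fail closed if nobody answers

    def resolve(self, choice: str) -> None:
        """Called from the gateway thread when the user replies."""
        self._choice = choice
        self._event.set()

    def wait(self, timeout: Optional[float] = None) -> str:
        """Called from the agent thread; blocks until resolve() or timeout."""
        self._event.wait(timeout)
        return self._choice
```

The agent thread calls `wait()` right after sending the prompt to the chat platform, and the message handler calls `resolve("once")` or `resolve("deny")` when `/approve` or `/deny` arrives.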

Smart

def _smart_approve(command: str, description: str) -> str:
    prompt = f"""You are a security reviewer for an AI coding agent.

    Command: {command}
    Flagged reason: {description}

    Assess the ACTUAL risk of this command. Many flagged commands are false positives...

    APPROVE / DENY / ESCALATE"""
    response = call_llm(task="approval", ..., temperature=0, max_tokens=16)
    # answer: the verdict parsed from response (parsing elided in the source)
    return answer  # "approve" / "deny" / "escalate"

Smart mode runs a secondary LLM (temperature 0) to re‑evaluate each flagged command and clear false positives such as python -c "print('hello')". Internal testing shows it automatically clears 60‑70% of flagged commands; when the reviewer is uncertain, it escalates and the command falls back to Manual approval.
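One way to turn the reviewer's reply into a verdict is sketched below. The `parse_verdict` helper is hypothetical and assumes the LLM call yields plain text; it treats anything other than a clear APPROVE or DENY as an escalation, which matches the fall-back-to-Manual behaviour described above.

```python
def parse_verdict(response_text: str) -> str:
    """Map the reviewer's raw reply to "approve", "deny", or "escalate"."""
    words = response_text.strip().lower().split()
    first = words[0].strip(".,:;!\"'") if words else ""
    if first in ("approve", "deny"):
        return first
    # Anything else, including an explicit ESCALATE or an empty reply,
    # is handed to a human.
    return "escalate"
```

Defaulting to `escalate` means a malformed or truncated reply can never silently approve a command.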

YOLO

Setting HERMES_YOLO_MODE=1 or issuing /yolo skips all second‑level approval prompts, but hardline blocklist entries remain enforced (e.g., disk formatting cannot be bypassed). The flag is frozen at module load in _YOLO_MODE_FROZEN to prevent runtime toggling via prompt injection.

File‑Write Protection (Three Rules)

Rule 1: Write‑Deny List

write_denied = {
    "~/.ssh/authorized_keys",   # SSH auth
    "~/.ssh/id_rsa",            # private key
    "~/.hermes/config.yaml",    # security policy (critical)
    "~/.hermes/.env",           # API keys
    "/etc/sudoers",            # sudo rights
    "/etc/passwd",             # system accounts
    "/etc/shadow",             # password hashes
    "~/.bashrc", "~/.zshrc",   # shell rc files
    "~/.netrc", "~/.git-credentials", # credential files
}

The config file also stores approvals.mode, yolo, and command_allowlist, so an Agent able to edit it could switch itself into YOLO mode or allowlist its own commands. Putting the file on the deny list closes that attack loop.
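A deny-list check is only as good as its path comparison. The sketch below is a hypothetical helper, not Hermes' code: it uses a three-entry subset of the list and canonicalises both sides with `os.path.expanduser` and `os.path.realpath`, so that `..` segments and symlinks cannot be used to sidestep the rule.

```python
import os

# Illustrative subset of the write-deny list shown above.
WRITE_DENIED = (
    "~/.ssh/authorized_keys",
    "~/.hermes/config.yaml",
    "/etc/passwd",
)


def _canonical(path: str) -> str:
    """Expand ~ and resolve symlinks and '..' segments."""
    return os.path.realpath(os.path.expanduser(path))


def is_write_denied(path: str) -> bool:
    """True when the requested write target is on the deny list."""
    target = _canonical(path)
    return any(target == _canonical(denied) for denied in WRITE_DENIED)
```

Comparing canonical paths means `~/.hermes/../.hermes/config.yaml` is rejected just like the plain form.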

Rule 2: Cross‑Profile Write Protection

from typing import Optional

def classify_cross_profile_target(path: str) -> Optional[dict]:
    # Detect writes that cross from the active profile to another (e.g., security profile)
    return {"active_profile": ..., "target_profile": ..., "area": "skills"}

If a write targets another profile, Hermes issues a warning and requires explicit user confirmation. This is a soft guardrail, not a hard block.

Rule 3: Sandbox‑Mirror Protection

When Hermes runs inside Docker/Daytona, configuration paths are bind‑mounted into sandbox locations such as:

.../sandboxes/<backend>/<task>/home/.hermes/config.yaml

Writes to such a path affect only the container image, not the host. Hermes flags these as “sandbox‑mirror writes” and warns that the change does not modify the authoritative host configuration.

Shell Hooks: User‑Defined Security Scripts

Users can plug custom scripts via agent/shell_hooks.py. The workflow is defined in cli-config.yaml:

# cli-config.yaml
  hooks:
    pre_tool_call:
      terminal:
        - ~/scripts/security-check.py
        - ~/scripts/audit-logger.sh

Before each tool call, Hermes passes a JSON payload to each configured script on stdin:

{
  "hook_event_name": "pre_tool_call",
  "tool_name": "terminal",
  "tool_input": {"command": "kubectl delete pod my-pod"},
  "session_id": "sess_abc123",
  "cwd": "/home/user/project"
}

The script returns JSON indicating the action:

{"action": "block", "message": "kubectl delete is not allowed in this project"}

The implementation invokes hooks with subprocess.run(shell=False) to avoid shell injection, supports one‑time allowlists, and runs asynchronously. A hook can be written in any language that reads JSON from stdin and writes JSON to stdout.
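A hook script matching the payload and response shown above might look like the following sketch. The `BLOCKED_PREFIXES` policy and the `decide` and `main` helpers are hypothetical, and the `{"action": "allow"}` response is an assumption, since the article only shows the `block` form.

```python
import json
import sys

# Hypothetical project policy: command prefixes this hook refuses.
BLOCKED_PREFIXES = ("kubectl delete",)


def decide(payload: dict) -> dict:
    """Inspect one pre_tool_call payload and return the hook's verdict."""
    if payload.get("tool_name") != "terminal":
        return {"action": "allow"}  # assumed name for the pass-through action
    command = payload.get("tool_input", {}).get("command", "").strip()
    for prefix in BLOCKED_PREFIXES:
        if command.startswith(prefix):
            return {
                "action": "block",
                "message": f"{prefix} is not allowed in this project",
            }
    return {"action": "allow"}


def main(stdin=sys.stdin, stdout=sys.stdout) -> None:
    """Read the JSON payload from stdin and write the JSON verdict to stdout."""
    payload = json.load(stdin)
    json.dump(decide(payload), stdout)
```

A real hook file would end by calling `main()`. Keeping the decision logic in `decide` makes the policy testable without spawning a process.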

Tool Guardrails: Runtime Last Line of Defense

Hermes classifies tools as Idempotent (read‑only) or Mutating (side‑effects). It tracks three failure patterns:

Exact Failure: same tool, same args, same error → warn at ≥2 occurrences, hard‑stop at ≥5.

Same‑Tool Failure: same tool, different args, repeated errors → warn at ≥3 occurrences, hard‑stop at ≥8.

No Progress: idempotent tool called repeatedly without state change → warn at ≥2 occurrences, hard‑stop at ≥5.

The ToolCallGuardrailController hashes each call’s parameters, compares with the previous result, and invokes exact_failure(), same_tool_failure(), or idempotent_no_progress(). On hard‑stop it injects a synthetic result via synthetic_tool_failure to break the loop.
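The exact-failure rule can be sketched with a hash-keyed counter. The `ExactFailureTracker` class below is hypothetical and covers only the first of the three patterns; it uses the 2 and 5 thresholds quoted above and assumes tool arguments are JSON-serialisable.

```python
import hashlib
import json
from collections import Counter

WARN_AFTER, STOP_AFTER = 2, 5  # exact-failure thresholds from the article


class ExactFailureTracker:
    """Counts identical (tool, args, error) failures within one run."""

    def __init__(self):
        self._counts = Counter()

    @staticmethod
    def _key(tool: str, args: dict, error: str) -> str:
        # sort_keys makes the hash independent of argument order.
        blob = json.dumps([tool, args, error], sort_keys=True, default=str)
        return hashlib.sha256(blob.encode("utf-8")).hexdigest()

    def record_failure(self, tool: str, args: dict, error: str) -> str:
        """Return "ok", "warn", or "stop" for this failure."""
        key = self._key(tool, args, error)
        self._counts[key] += 1
        count = self._counts[key]
        if count >= STOP_AFTER:
            return "stop"
        if count >= WARN_AFTER:
            return "warn"
        return "ok"
```

On `stop`, the caller would inject a synthetic failure result instead of running the tool again, which is the loop-breaking step described above.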

ACP Bridge: Mapping Standard Agent Protocols to Hermes Approvals

Hermes implements the Agent Communication Protocol (ACP) and translates its permission model to Hermes granularity:

allow_once → once
allow_session → session
allow_always → always
deny / deny_always → deny

The adapter acp_adapter/edit_approval.py adds a pre‑write approval step for write_file and patch. It sends a diff (old vs. new) to the ACP client so the user can review the exact file change before it is applied.
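The option mapping and the pre-write diff can be sketched as below. The option names follow the mapping given in this section; the `to_hermes_choice` and `build_diff` helpers, the fail-closed handling of unknown options, and the use of the standard library's `difflib` are assumptions for illustration rather than the adapter's actual code.

```python
import difflib

# ACP permission option -> Hermes approval granularity.
ACP_TO_HERMES = {
    "allow_once": "once",
    "allow_session": "session",
    "allow_always": "always",
    "deny": "deny",
    "deny_always": "deny",
}


def to_hermes_choice(acp_option: str) -> str:
    """Translate an ACP option; unknown options are denied (assumption)."""
    return ACP_TO_HERMES.get(acp_option, "deny")


def build_diff(path: str, old: str, new: str) -> str:
    """Unified diff of a pending write, suitable for showing to the user."""
    return "".join(difflib.unified_diff(
        old.splitlines(keepends=True),
        new.splitlines(keepends=True),
        fromfile=f"a/{path}",
        tofile=f"b/{path}",
    ))
```

An empty diff string means the write would not change the file, so a caller could skip the approval prompt in that case.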

Architecture Overview

The layered defense (from bottom to top) is:

┌──────────────────────────────────────────────────────────┐
│  Third Layer: Tool Guardrails (tool_guardrails.py)       │
│  → Runtime detection of loops, repeated failures         │
├──────────────────────────────────────────────────────────┤
│  Third Layer: File Write Protection (file_safety.py)     │
│  → Write‑deny list / cross‑profile / sandbox‑mirror      │
├──────────────────────────────────────────────────────────┤
│  Third Layer: Shell Hooks (shell_hooks.py)               │
│  → User‑defined security scripts                         │
├──────────────────────────────────────────────────────────┤
│  Second Layer: Dangerous Command Approval (approval.py)  │
│  → ~47 patterns + Manual/Smart/YOLO modes                │
│  → Granularities: once / session / always                │
├──────────────────────────────────────────────────────────┤
│  First Layer: Hardline Blocklist (approval.py)           │
│  → 12 catastrophic commands that cannot be bypassed      │
└──────────────────────────────────────────────────────────┘

The design embodies "security‑by‑design": approvals are baked into the tool engine rather than added as a UI overlay.

Conclusion

Hermes provides a multi‑tiered permission system that balances safety and usability:

Immutable hardline blocklist for commands with no recovery path.

Pattern‑based dangerous‑command detection with persistent approval granularity.

Smart LLM filtering that automatically clears the majority of false positives.

Optional YOLO mode that trusts the Agent for non‑catastrophic actions.

File‑write safeguards (deny list, cross‑profile checks, sandbox‑mirror warnings).

Pluggable shell hooks for custom security policies.

Runtime tool guardrails that break infinite loops and repeated failures.

ACP integration that extends the same approval model to standard agent protocols.

Together these mechanisms form a defense‑in‑depth framework for AI agents: catastrophic commands are always blocked, and other flagged operations reach the host environment only under the approval policy the user has chosen.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.

Written by

James' Growth Diary

I am James, focusing on AI Agent learning and growth. I continuously update two series: “AI Agent Mastery Path,” which systematically outlines core theories and practices of agents, and “Claude Code Design Philosophy,” which deeply analyzes the design thinking behind top AI tools. Helping you build a solid foundation in the AI era.
