
Claude Code Now Detects Security Flaws While You Write: Anthropic’s Three‑Layer Security‑Guidance Plugin

Anthropic’s security‑guidance plugin adds three progressive layers of automated security checks—instant string‑pattern matching, end‑of‑turn diff review, and deep commit‑time analysis—to Claude Code, letting the AI catch and fix common vulnerabilities as you code without blocking your workflow.

Code Mala Tang

May 26, 2026

Why a New Security Layer Is Needed

Repeated low‑level bugs such as directly concatenating user input into SQL, missing admin route checks, or unsafe token comparisons have traditionally been caught only during manual PR reviews. Anthropic’s official security-guidance plugin moves this defense line forward, letting Claude Code spot and fix issues while the code is being written.

How the Plugin Works

The plugin is not a static scanner. It launches a second Claude instance that watches the primary Claude’s edits, evaluates the changes, and feeds any findings back into the same conversation so the original Claude can repair the code.

Three‑Layer Checks

Layer 1 – Per‑File Edit: After each file edit, the plugin checks the change against a set of predefined dangerous patterns using string or regex matching. This layer incurs no model cost. It flags calls such as eval(, new Function, os.system, and child_process.exec, unsafe deserialization with pickle, DOM injection via dangerouslySetInnerHTML, .innerHTML, and document.write, and any change under .github/workflows/. Each pattern fires only once per file per session to limit noise.
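A minimal Python sketch of the Layer 1 idea follows: deterministic regex matching plus a "once per file per session" filter. The rule names, the specific regexes, and the PatternMatcher class are assumptions made for illustration; they are not taken from the plugin's source.

```python
import re
from typing import List, Set, Tuple

# Illustrative subset of the patterns named above; rule names are hypothetical.
PATTERNS = {
    "eval_call": re.compile(r"\beval\("),
    "new_function": re.compile(r"\bnew Function\b"),
    "os_system": re.compile(r"\bos\.system\("),
    "inner_html": re.compile(r"\.innerHTML\b"),
}


class PatternMatcher:
    """Flags dangerous patterns, reporting each rule once per file per session."""

    def __init__(self) -> None:
        # (file path, rule name) pairs that have already been reported.
        self._seen: Set[Tuple[str, str]] = set()

    def check(self, path: str, new_text: str) -> List[str]:
        hits = []
        for name, pattern in PATTERNS.items():
            key = (path, name)
            if key in self._seen:
                continue  # already reported for this file in this session
            if pattern.search(new_text):
                self._seen.add(key)
                hits.append(name)
        return hits
```

Because the check is a plain regex scan, it makes no model call, which matches the "no model cost" property described above.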

Layer 2 – End‑of‑Turn Diff Review: When a turn (a full user‑assistant exchange) ends, the plugin computes a git diff of all changes, whether they came from edits, Bash commands, or sub‑agents, and sends it to an independent Claude instance for a security‑focused review. This background review catches issues that simple string matching misses, such as authorization bypass, insecure direct object references (IDOR), various injections, SSRF, and weak encryption. Limits: a review covers up to 30 changed files per turn and is triggered at most three times per turn.
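The two limits on the end‑of‑turn review can be pictured as a small budget object. This is only a sketch: the TurnReviewBudget class is hypothetical, and the source does not say what the plugin does when a limit is exceeded, so trimming the file list and skipping extra reviews are assumptions.

```python
from typing import List, Optional

MAX_FILES_PER_REVIEW = 30  # limit quoted in the article
MAX_REVIEWS_PER_TURN = 3   # limit quoted in the article


class TurnReviewBudget:
    """Tracks how many diff reviews have been started in the current turn."""

    def __init__(self) -> None:
        self.reviews_used = 0

    def start_turn(self) -> None:
        # Reset the counter when a new user prompt begins a turn.
        self.reviews_used = 0

    def select_files(self, changed_files: List[str]) -> Optional[List[str]]:
        """Return the files to review, or None if no review should run."""
        if not changed_files or self.reviews_used >= MAX_REVIEWS_PER_TURN:
            return None
        self.reviews_used += 1
        # Assumption: files beyond the cap are dropped rather than queued.
        return sorted(set(changed_files))[:MAX_FILES_PER_REVIEW]
```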

Layer 3 – Commit/Push Review: When Claude’s Bash tool executes git commit or git push, the plugin launches a deeper agentic review. Unlike the previous layers, this reviewer reads related code (callers, sanitizers, config files) to decide whether a flagged pattern is a real problem or a false positive. Only commits and pushes initiated by Claude’s own Bash tool trigger this layer; commands typed manually in a shell do not.
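Deciding whether a Bash command is a commit or push could look roughly like the sketch below. The git_review_trigger function is hypothetical and the plugin's real filter may differ; the sketch uses shlex tokenization and deliberately ignores edge cases such as commands joined by ";" without spaces.

```python
import shlex
from typing import Optional


def git_review_trigger(command: str) -> Optional[str]:
    """Return "commit" or "push" if the shell command runs that git subcommand."""
    try:
        tokens = shlex.split(command)
    except ValueError:
        return None  # unbalanced quotes: treat as "no trigger"
    for i, tok in enumerate(tokens):
        if tok != "git":
            continue
        # Skip git's global options; -C and -c each take one argument.
        j = i + 1
        while j < len(tokens) and tokens[j].startswith("-"):
            j += 2 if tokens[j] in ("-C", "-c") else 1
        if j < len(tokens) and tokens[j] in ("commit", "push"):
            return tokens[j]
    return None
```

Tokenizing first means a quoted string such as echo "git commit" is seen as a single argument and does not trigger a review.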

Reviewer Origin and Isolation

Layer 1 uses deterministic string matching with no model involvement. Layers 2 and 3 each invoke a fresh Claude instance with a security‑specific prompt, keeping the reviewer context separate from the authoring Claude. This role isolation prevents the same model from self‑scoring.

Installation and Runtime Requirements

The plugin requires Claude Code CLI 2.1.144+ and Python 3.8+. Install with:

/plugin install security-guidance@claude-plugins-official

If the marketplace is missing, add it first:

/plugin marketplace add anthropics/claude-plugins-official

After installation, activate with /reload-plugins. The first run creates a virtual environment under ~/.claude/security/ and installs the Claude Agent SDK. On Windows the virtual‑env step is skipped; the commit review falls back to a single‑shot mode unless the SDK is already importable.

Customizing Rules for Your Project

Default patterns are generic, but projects can add their own “security guidance” in natural language:

- Do not log customer_id at INFO level or higher.
- All routes under /admin must call require_role("admin") before DB access.
- Use crypto.timingSafeEqual for token comparison, not ===.

These hints are loaded from .claude/claude-security-guidance.md (project‑level) or ~/.claude/claude-security-guidance.md (user‑level). The plugin also supports custom pattern files .claude/security-patterns.yaml (or .json/.yml) with up to 50 rules, each defined by rule_name, reminder, and either regex or substrings, plus optional paths globs. The combined guidance size is limited to 8 KB.
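To make the rule constraints concrete, here is a hedged Python sketch that validates a JSON pattern file. The field names (rule_name, reminder, regex, substrings, paths) and the 50‑rule cap come from the description above; the top‑level list layout, the "exactly one of regex or substrings" reading, and the load_rules function are assumptions.

```python
import json
import re
from typing import Any, Dict, List

MAX_RULES = 50  # cap quoted in the article


def load_rules(raw_json: str) -> List[Dict[str, Any]]:
    """Parse and validate custom pattern rules; raise ValueError if invalid."""
    rules = json.loads(raw_json)
    if not isinstance(rules, list):
        raise ValueError("expected a top-level list of rules")  # assumed layout
    if len(rules) > MAX_RULES:
        raise ValueError(f"at most {MAX_RULES} rules are allowed")
    for rule in rules:
        if not isinstance(rule, dict):
            raise ValueError("each rule must be an object")
        for field in ("rule_name", "reminder"):
            if not isinstance(rule.get(field), str) or not rule[field]:
                raise ValueError(f"missing or empty field: {field}")
        # Assumed reading of "either regex or substrings": exactly one of them.
        if ("regex" in rule) == ("substrings" in rule):
            raise ValueError("give exactly one of regex or substrings")
        if "regex" in rule:
            try:
                re.compile(rule["regex"])
            except re.error as exc:
                raise ValueError(f"invalid regex: {exc}") from exc
    return rules
```

A rule such as {"rule_name": "no-md5", "reminder": "Use SHA-256.", "substrings": ["hashlib.md5"], "paths": ["src/**/*.py"]} would pass this check.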

File‑search order (merged when all exist):

User‑level: ~/.claude/claude-security-guidance.md (shared across all projects on the machine)

Project‑level: .claude/claude-security-guidance.md (committed with the repo)

Project‑local: .claude/claude-security-guidance.local.md (git‑ignored, for personal overrides)

Enterprise admins can push the user‑level file via MDM to enforce organization‑wide rules.
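The merge order above can be sketched in a few lines of Python. The file names and the 8 KB cap are from the article; the merged_guidance function, the blank‑line separator, and truncating at the cap (rather than, say, rejecting oversized guidance) are assumptions.

```python
from pathlib import Path
from typing import List

MAX_GUIDANCE_BYTES = 8 * 1024  # 8 KB cap quoted in the article


def guidance_paths(project_root: Path, home: Path) -> List[Path]:
    """Candidate files in merge order: user, project, project-local."""
    return [
        home / ".claude" / "claude-security-guidance.md",
        project_root / ".claude" / "claude-security-guidance.md",
        project_root / ".claude" / "claude-security-guidance.local.md",
    ]


def merged_guidance(project_root: Path, home: Path) -> str:
    """Concatenate every guidance file that exists, capped at 8 KB."""
    parts = [
        p.read_text(encoding="utf-8").strip()
        for p in guidance_paths(project_root, home)
        if p.is_file()
    ]
    merged = "\n\n".join(part for part in parts if part)
    # Assumption: overflow is truncated; the source only states the limit.
    data = merged.encode("utf-8")[:MAX_GUIDANCE_BYTES]
    return data.decode("utf-8", errors="ignore")
```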

How It Fits With Other Security Tools

During coding: the security-guidance plugin catches obvious flaws instantly.

On‑demand: the /security-review command runs a one‑off scan of the current branch.

Pull‑request stage: Claude’s multi‑agent Code Review adds full‑repo context for correctness and security.

CI pipelines: existing SAST and dependency scanners handle language‑specific rules, supply‑chain checks, and organizational policies.

Each layer is designed to catch what the previous one missed, with the plugin focusing on the earliest possible detection.

Cost and Disabling Options

Layer 1 is free (no model calls). Layers 2 and 3 consume model usage billed to your Claude quota, defaulting to Claude Opus 4.7. Model selection can be overridden with environment variables SECURITY_REVIEW_MODEL (end‑of‑turn) and SG_AGENTIC_MODEL (commit review).

Individual layers can also be disabled via environment variables:

- ENABLE_PATTERN_RULES=0: turn off string‑pattern matching.
- ENABLE_STOP_REVIEW=0: disable the end‑of‑turn diff review.
- ENABLE_COMMIT_REVIEW=0: disable the commit/push review.
- ENABLE_CODE_SECURITY_REVIEW=0: disable all model‑based reviews.
- SECURITY_GUIDANCE_DISABLE=1: completely deactivate the plugin without uninstalling it.
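How these switches combine can be sketched as follows. The variable names are those listed above; the layer_status function, the rule that an unset variable means "enabled", and the rule that only the literal values "0" and "1" count are assumptions.

```python
import os
from typing import Dict, Mapping, Optional


def layer_status(env: Optional[Mapping[str, str]] = None) -> Dict[str, bool]:
    """Report which of the three layers would run for a given environment."""
    env = os.environ if env is None else env

    # The global kill switch overrides every per-layer setting.
    if env.get("SECURITY_GUIDANCE_DISABLE") == "1":
        return {"patterns": False, "stop_review": False, "commit_review": False}

    def on(name: str) -> bool:
        # Assumption: a layer stays enabled unless its variable is exactly "0".
        return env.get(name, "1") != "0"

    model_reviews = on("ENABLE_CODE_SECURITY_REVIEW")
    return {
        "patterns": on("ENABLE_PATTERN_RULES"),
        "stop_review": model_reviews and on("ENABLE_STOP_REVIEW"),
        "commit_review": model_reviews and on("ENABLE_COMMIT_REVIEW"),
    }
```

For example, setting only ENABLE_CODE_SECURITY_REVIEW=0 leaves the free pattern layer running while turning off both model‑based layers.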

To remove the plugin entirely, run /plugin uninstall security-guidance@claude-plugins-official. If the plugin is enabled through project‑level settings, disabling it writes an override to .claude/settings.local.json, which affects only the current user.

Implementation Details

The plugin leverages Claude Code’s hook system, registering callbacks for:

- SessionStart: launches the Python environment.
- UserPromptSubmit: captures the workspace baseline for diff reviews.
- PostToolUse (Edit/Write/NotebookEdit): runs the per‑file string matcher.
- Stop: triggers the background end‑of‑turn diff review.
- PostToolUse (Bash): filters git commit / git push for the deep review.
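The mapping from hook events to checks can be summarized as a small routing function. The event and tool names are those listed above; route_hook and the handler labels it returns are hypothetical and are not the plugin's actual entry points.

```python
EDIT_TOOLS = {"Edit", "Write", "NotebookEdit"}


def route_hook(event: str, tool_name: str = "") -> str:
    """Map a hook event (and tool, for PostToolUse) to a hypothetical handler label."""
    if event == "SessionStart":
        return "prepare_python_env"
    if event == "UserPromptSubmit":
        return "capture_baseline"
    if event == "PostToolUse" and tool_name in EDIT_TOOLS:
        return "run_pattern_matcher"   # Layer 1
    if event == "PostToolUse" and tool_name == "Bash":
        return "check_git_command"     # gate for Layer 3
    if event == "Stop":
        return "start_diff_review"     # Layer 2
    return "ignore"
```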

The source code is publicly available at https://github.com/anthropics/claude-plugins-official/tree/main/plugins/security-guidance, providing a reference implementation for building similar hook‑based checks.

Practical Guidance for Developers

Enabling the plugin adds a zero‑cost first layer and non‑blocking later layers, reducing the volume of security issues that reach downstream PR or CI stages. Adding project‑specific guidance to .claude/claude-security-guidance.md ensures new clones inherit the rules.

AI‑coding‑tool builders can study the hook architecture to embed independent model reviews in tools like Cursor, Cline, or Aider.

Security and compliance groups can translate existing SAST rules into security-patterns.yaml so that violations surface instantly during coding, reducing the number of findings that later reach PR or CI stages.

The plugin is positioned as an early‑stage filter, not a replacement for downstream SAST or PR reviews; it should be part of a defense‑in‑depth strategy.
