
Why Your LLM Skill Gets Ignored and 5 Proven Design Patterns to Make Agents Work

Many LLM agents ignore a Skill even after hours have gone into crafting it, and the automation fails. This article analyzes why and presents five validated design patterns (linear flow, decision tree with lazy loading, iterative loops, baton passing, and multi-stage checkpoints), plus concrete examples and a minimal Skill template for reliable, production-grade agent behavior.

ZhiKe AI

May 28, 2026

What a Skill Actually Is

A standard Skill is a folder whose core file is SKILL.md (YAML front-matter plus a Markdown body). When the LLM decides a Skill is needed, the entire file is loaded into the conversation context, and the LLM then decides how to carry it out using basic tools such as bash, read, and edit.
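To make the two-part file structure concrete, here is a minimal sketch that separates the front-matter from the Markdown body. It assumes flat `key: value` front-matter only; the function name `split_skill` is hypothetical, and a real loader would use a proper YAML parser.

```python
def split_skill(text: str) -> tuple[dict[str, str], str]:
    """Split SKILL.md text into (front_matter, markdown_body)."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, text  # no front-matter block at all
    # Find the closing '---' delimiter.
    end = next((i for i in range(1, len(lines)) if lines[i].strip() == "---"), None)
    if end is None:
        raise ValueError("front-matter is opened with '---' but never closed")
    meta: dict[str, str] = {}
    for line in lines[1:end]:
        key, sep, value = line.partition(":")
        if sep:  # keep only 'key: value' lines (flat pairs, no nesting)
            meta[key.strip()] = value.strip().strip('"')
    body = "\n".join(lines[end + 1:]).strip()
    return meta, body


sample = """---
description: Deploy my app
safety: "Always deploy as preview, not production"
---
## Prerequisites
- Node.js 18+
"""
meta, body = split_skill(sample)
print(meta["safety"])  # Always deploy as preview, not production
print(body)
```

The front-matter is what the LLM scans first, so keeping it small and separable from the body is the point of this split.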

Why Design Quality Matters

The quality of your Skill design directly determines the LLM’s execution accuracy. Poorly structured or vague Skills cause the agent to stumble, requiring manual intervention and defeating the purpose of automation.

Typical Skill Directory Layout

SKILL.md: required main file

scripts/: optional executable scripts

references/: optional detailed reference docs

resources/: optional templates, checklists, etc.

examples/: optional example implementations
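The layout above can be checked mechanically. The following is a small sketch, not part of any official tooling; `describe_skill_folder` is a hypothetical helper that only reports what it finds on disk.

```python
import tempfile
from pathlib import Path

# Optional directories named in the layout above.
OPTIONAL_DIRS = ("scripts", "references", "resources", "examples")


def describe_skill_folder(root: Path) -> dict:
    """Report whether SKILL.md exists and which optional directories are present."""
    return {
        "has_skill_md": (root / "SKILL.md").is_file(),
        "optional_dirs": [name for name in OPTIONAL_DIRS if (root / name).is_dir()],
    }


# Usage: build a throwaway Skill folder and inspect it.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / "my-deploy-skill"
    (root / "scripts").mkdir(parents=True)
    (root / "SKILL.md").write_text("---\ndescription: Deploy my app\n---\n", encoding="utf-8")
    report = describe_skill_folder(root)
    print(report)
```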

Five Verified Design Patterns

① Linear Flow – for clearly ordered operations such as deployment or migration (e.g., OpenAI vercel‑deploy, 77 lines).

② Decision Tree + Lazy Loading – for large option sets where the LLM must pick one (e.g., OpenAI cloudflare‑deploy, 224 lines).

③ Iterative Loop – for repeat‑do‑verify‑improve cycles like TDD or code review (e.g., obra‑tdd‑skill, 371 lines).

④ Baton‑Pass Loop – for multi‑session, long‑term projects spanning days or weeks (e.g., Google Labs stitch‑loop, 203 lines).

⑤ Multi‑Stage + Checkpoints – for multi‑day processes that require Go/No‑Go decisions (e.g., Dean Peters discovery process, 502 lines).

Pattern‑Selection Decision Tree

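What does your Skill need to do?
├─ Run an operation with clearly defined steps → ① Linear flow
├─ Help the user choose among many options → ② Decision tree + lazy loading
├─ Repeat "do → verify → improve" within a single session → ③ Iterative loop
├─ Keep making progress across multiple sessions → ④ Baton-pass loop
├─ Span days or weeks with stage decisions → ⑤ Multi-stage + checkpoints
└─ Needs deep analysis rather than fast execution → Special pattern: thinking framework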
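Read as data, the tree above is a simple lookup from need to pattern. The sketch below is only an illustration; the key names (`ordered_steps` and so on) are invented labels for the six branches.

```python
# Hypothetical keys; each maps one branch of the decision tree to its pattern.
PATTERN_FOR_NEED = {
    "ordered_steps": "① Linear flow",
    "choose_among_many_options": "② Decision tree + lazy loading",
    "do_verify_improve_in_one_session": "③ Iterative loop",
    "progress_across_sessions": "④ Baton-pass loop",
    "multi_day_with_stage_decisions": "⑤ Multi-stage + checkpoints",
    "deep_analysis": "Special pattern: thinking framework",
}


def pick_pattern(need: str) -> str:
    """Return the pattern for a need, or fail loudly on an unknown need."""
    try:
        return PATTERN_FOR_NEED[need]
    except KeyError:
        expected = ", ".join(sorted(PATTERN_FOR_NEED))
        raise ValueError(f"unknown need {need!r}; expected one of: {expected}") from None


print(pick_pattern("progress_across_sessions"))
```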

Minimal Linear‑Flow Example (OpenAI Vercel Deploy)

---
description: Deploy my app
trigger: ["deploy my app", "push this live"]
timing: "before writing implementation code"
products: ["Vercel"]
safety: "Always deploy as preview, not production"
---
## Prerequisites
- [ ] Node.js 18+
- [ ] Vercel CLI installed (`npm i -g vercel`)
- [ ] Project linked to Vercel (`vercel link`)

## Quick Start
### Step 1: Build the project
```bash
npm run build
```
**Timeout**: Use a 10‑minute timeout for large projects.

### Step 2: Deploy to preview
```bash
vercel deploy  # without --prod, the deployment targets preview
```
⚠️ **Do not** use `--prod` unless explicitly requested.

### Step 3: Verify deployment
✅ Check the preview URL in the Vercel dashboard.
❌ Do **not** `curl` the deployed URL for verification; use the dashboard instead.

Key Technique Breakdown

Safety defaults: default to the safest option (e.g., preview deployment).

Explicit commands: each step provides a copy-and-paste bash command, leaving no guesswork for the LLM.

Negative directives: explicitly forbid risky actions (e.g., "Do not curl", "Do not use --prod").

Timeout hints: include a concrete timeout (e.g., 10 minutes) to avoid mid-process interruption.
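Three of these techniques leave visible traces in the Markdown, so a rough lint can flag a Skill that lacks them. This is a heuristic sketch under stated assumptions: steps are headed `### Step N`, commands sit in `bash` fences, and `lint_linear_skill` is a hypothetical name. It cannot judge whether a default is actually safe.

```python
import re

FENCE = "`" * 3  # built at runtime so this example contains no nested fence


def lint_linear_skill(markdown: str) -> list[str]:
    """Return warnings; an empty list means none of the heuristics fired."""
    warnings = []
    # Everything before the first '### Step ' heading is ignored.
    sections = re.split(r"(?m)^### Step ", markdown)[1:]
    if not sections:
        warnings.append("no '### Step' headings found")
    for section in sections:
        title = section.splitlines()[0]
        if FENCE + "bash" not in section:
            warnings.append(f"step '{title}' has no copy-and-paste bash command")
    # Matches 'Do not' and 'Do **not**', case-insensitively.
    if not re.search(r"\bdo\s+\*{0,2}not\b", markdown, re.IGNORECASE):
        warnings.append("no negative directive ('Do not ...') found")
    if "timeout" not in markdown.lower():
        warnings.append("no timeout hint found")
    return warnings


vague = "### Step 1: Build\nBuild the project somehow.\n"
for warning in lint_linear_skill(vague):
    print(warning)
```

A step that is deliberately manual, such as checking a dashboard, will trigger the missing-command warning, so treat the output as prompts for review rather than failures.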

Four "Weapons" to Stop LLM Laziness

Weapon 1 – Hard‑Tone Commands

Replace vague suggestions like "You might want to consider starting over..." with direct imperatives such as "Delete it. Start over." The obra TDD Skill uses this tone throughout and sees immediate compliance.

Weapon 2 – Pre‑emptive Rebuttal Table

Anticipate common LLM excuses and counter them explicitly:

"This test is good enough" → "Delete it. Rewrite. Coverage < 80% is not sufficient."

"Refactoring can be done later" → "No. Do it now. Technical debt grows exponentially."

"Skip this iteration" → "Red-Green-Refactor is a law; you cannot skip any step."

(The original analysis identified 12 typical excuses; the table above shows three representative examples.)
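If the excuses are kept as data, the table can be regenerated whenever a new one is observed. The sketch below renders the three representative pairs as a Markdown table for pasting into a SKILL.md; the column titles and the function name are assumptions, not taken from the obra Skill.

```python
REBUTTALS = {
    "This test is good enough": "Delete it. Rewrite. Coverage < 80% is not sufficient.",
    "Refactoring can be done later": "No. Do it now. Technical debt grows exponentially.",
    "Skip this iteration": "Red-Green-Refactor is a law; you cannot skip any step.",
}


def render_rebuttal_table(rebuttals: dict[str, str]) -> str:
    """Render excuse/rebuttal pairs as a two-column Markdown table."""
    def cell(text: str) -> str:
        return text.replace("|", "\\|")  # a raw pipe would split the cell

    rows = ["| Excuse | Rebuttal |", "| --- | --- |"]
    for excuse, rebuttal in rebuttals.items():
        rows.append(f'| "{cell(excuse)}" | {cell(rebuttal)} |')
    return "\n".join(rows)


print(render_rebuttal_table(REBUTTALS))
```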

Weapon 3 – Quantitative Thresholds

Set hard minimum standards to avoid "good enough" shortcuts. Example from Trail of Bits: "Each function must have at least 3 invariants and 5 assumptions; otherwise the review fails."
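A threshold only helps if it is checked rather than merely stated. As a sketch of how the quoted rule could be enforced, assuming the review output is available as structured data (the `FunctionReview` type and its fields are hypothetical):

```python
from dataclasses import dataclass, field

MIN_INVARIANTS = 3   # thresholds from the quoted rule
MIN_ASSUMPTIONS = 5


@dataclass
class FunctionReview:
    name: str
    invariants: list[str] = field(default_factory=list)
    assumptions: list[str] = field(default_factory=list)


def failing_functions(reviews: list[FunctionReview]) -> list[str]:
    """Return the names of functions that miss either minimum."""
    return [
        r.name
        for r in reviews
        if len(r.invariants) < MIN_INVARIANTS or len(r.assumptions) < MIN_ASSUMPTIONS
    ]


reviews = [
    FunctionReview("transfer", ["i1", "i2", "i3"], ["a1", "a2", "a3", "a4", "a5"]),
    FunctionReview("withdraw", ["i1"], ["a1", "a2"]),
]
failed = failing_functions(reviews)
print("review fails for:", failed)
```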

Weapon 4 – Negative Directives + Three Safety Principles

Pair explicit prohibitions (e.g., "Do not use --prod") with three safety principles:

Safety defaults: choose the safest option by default (e.g., preview instead of production).

Least-privilege: elevate permissions only when absolutely necessary.

Human fallback: when uncertain, defer to a human decision.
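The three principles compose into a small decision rule. The sketch below applies them to the deployment example; the function name, the `prod_confirmed` flag, and the `ask_human` return value are illustrative assumptions.

```python
from typing import Optional


def choose_deploy_target(requested: Optional[str], prod_confirmed: bool = False) -> str:
    """Return 'preview', 'production', or 'ask_human'."""
    # Safety default: no request, or an explicit preview request, means preview.
    if requested in (None, "preview"):
        return "preview"
    # Least privilege: production only with explicit confirmation.
    if requested == "production":
        return "production" if prod_confirmed else "ask_human"
    # Human fallback: anything unrecognized goes to a person.
    return "ask_human"


print(choose_deploy_target(None))
print(choose_deploy_target("production"))
print(choose_deploy_target("production", prod_confirmed=True))
```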

New‑comer Pitfall Checklist

Front‑matter too vague – "Helps with deployment stuff" will not trigger. Use explicit trigger phrases, timing, and product keywords.

Putting everything in one SKILL.md – exceeds the 10K token limit. Adopt a three‑layer architecture:

Layer 1: Front‑matter (~100 tokens) for quick LLM scanning.

Layer 2: Core SKILL.md (2K‑5K tokens) with commands and steps.

Layer 3: references/ and resources/ loaded on demand.

Over‑design – using multi‑stage orchestrators for simple tasks. Start with the minimal linear version and only add complexity when the business scenario demands it.
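The layer budgets in the checklist above can be approximated without a tokenizer. This sketch assumes roughly four characters per token, which is a coarse rule of thumb and not a measurement; the default budgets mirror the figures quoted for Layer 1 and Layer 2, and the function names are hypothetical.

```python
def estimate_tokens(text: str) -> int:
    """Rough estimate: about 4 characters per token (an assumption, not a tokenizer)."""
    return len(text) // 4


def check_layers(front_matter: str, body: str,
                 front_matter_budget: int = 100, body_budget: int = 5000) -> list[str]:
    """Return warnings for layers that exceed their approximate budget."""
    warnings = []
    fm_tokens = estimate_tokens(front_matter)
    body_tokens = estimate_tokens(body)
    if fm_tokens > front_matter_budget:
        warnings.append(f"front-matter is ~{fm_tokens} tokens (budget ~{front_matter_budget})")
    if body_tokens > body_budget:
        warnings.append(
            f"body is ~{body_tokens} tokens (budget {body_budget}); "
            "move detail into references/ or resources/"
        )
    return warnings


print(check_layers("description: Deploy my app", "## Quick Start\n" * 2000))
```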

Action Plan: Build Your First Skill

Step 1 – Create the folder and SKILL.md (≈30 min):

Create my-deploy-skill/

Add a SKILL.md based on the linear-flow template (keep it under 77 lines).
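Step 1 can be scripted so the starting point is reproducible. The following is a sketch only: the stub written to SKILL.md is a placeholder to be replaced with your own steps, and `scaffold_skill` and `within_line_budget` are hypothetical helpers.

```python
import tempfile
from pathlib import Path

MAX_LINES = 77  # line ceiling suggested in Step 1

STUB = """---
description: Deploy my app
safety: "Always deploy as preview, not production"
---
## Prerequisites
- [ ] Node.js 18+

## Quick Start
### Step 1: Build the project
"""


def scaffold_skill(parent: Path, name: str = "my-deploy-skill") -> Path:
    """Create the Skill folder with a stub SKILL.md and return the file path."""
    skill_dir = parent / name
    skill_dir.mkdir(parents=True, exist_ok=True)
    skill_md = skill_dir / "SKILL.md"
    if not skill_md.exists():  # never overwrite existing work
        skill_md.write_text(STUB, encoding="utf-8")
    return skill_md


def within_line_budget(skill_md: Path, max_lines: int = MAX_LINES) -> bool:
    """True if the file has at most max_lines lines."""
    return len(skill_md.read_text(encoding="utf-8").splitlines()) <= max_lines


with tempfile.TemporaryDirectory() as tmp:
    path = scaffold_skill(Path(tmp))
    created_name = path.parent.name
    fits = within_line_budget(path)
    print(created_name, fits)
```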

Step 2 – Validate in a real scenario (≈1 h):

Run the agent through the full business flow.

Check whether the Skill is invoked, whether steps match expectations, and where additional negative directives or thresholds are needed.

Step 3 – Incrementally add complex patterns (continuous):

If you need multi‑platform support, add the decision‑tree pattern (②).

If you need a TDD loop, add the iterative pattern (③).

If the work spans days, add the baton‑pass pattern (④).

Remember: over-design is a risk. Match Skill complexity to business complexity, not to a desire to showcase cleverness.

Final Thought

The ultimate goal of Skill design is not to make the LLM "smarter" but to make it "more reliable". By following the five proven patterns and the four weapons above, you can turn a flaky prompt into a production‑grade agent that reliably gets the job done.

This article is rewritten from publicly available material in seven top‑tier Skill repositories (OpenAI, Google Labs, Trail of Bits, etc.). All cases and data are traceable to those sources.
