How “Skills” Turn LLM Prompts into Portable, Engineered Workflows
This article dissects the evolution of LLM prompts into structured, version‑controlled skill packages, explains the AgentSkills specification, details OpenClaw’s implementation, compares prompts, memory, MCP and skills, and provides end‑to‑end examples with code, flowcharts and best‑practice recommendations.
1. From Prompt to Scripts: A Layered Taxonomy
1.1 Prompt

The input context for a single model inference, typically comprising system instructions, user input, conversation history, tool definitions and optional external retrieval snippets. It is not a single string but a structured collection assembled at runtime.
Best practice: Treat the prompt as an "input contract" – clearly state role, goal, constraints and output format, and separate mutable evidence from immutable policy to reduce coupling.
Common pitfalls: Overloading the prompt with excessive background, or asking the model for authoritative statements without evidence, which leads to hallucination.
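To make the contract concrete, here is a minimal sketch of runtime prompt assembly. The message shape and the build_prompt helper are illustrative assumptions, not a specific vendor API:

```python
# Illustrative sketch: the prompt as a structured collection, not a single string.
SYSTEM_POLICY = (
    "Role: senior release engineer. Goal: answer from the evidence below. "
    "Constraints: cite a source for every claim. Output: Markdown with a Sources section."
)

def build_prompt(user_msg: str, history: list[dict], evidence: list[str]) -> list[dict]:
    # Immutable policy lives in the system block; mutable evidence is appended per call.
    messages = [{"role": "system", "content": SYSTEM_POLICY}]
    messages += history  # prior turns
    for i, snippet in enumerate(evidence):
        messages.append({"role": "user", "content": f"[Evidence {i+1}] {snippet}"})
    messages.append({"role": "user", "content": user_msg})
    return messages
```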
1.2 Normal Prompt
Natural‑language instructions that are human‑readable but loosely constrained. Suitable for quick exploration, creative generation or informal discussion.
Best practice: Use the four‑part template "task + audience + constraints + output format" to minimise ambiguity.
Common pitfalls: Writing a single long prompt that tries to encode the entire workflow, causing token bloat and loss of control.
1.3 Structured Prompt
Prompts that embed explicit schemas (JSON‑Schema or similar) to turn parts of the problem into an API call. The model selects a tool, supplies validated parameters, and receives a deterministic result.
Best practice: Apply schemas only where strict validation is needed, keep them backward‑compatible, and keep reference material short and searchable.
Common pitfalls: Assuming that providing a schema guarantees correct JSON output; streaming responses can produce partial or malformed JSON.
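One concrete defence against that pitfall is to validate arguments before executing the tool. A sketch using the jsonschema package (the schema mirrors the search_web tool defined in Example 1 below; handling partial streamed JSON is left to the caller):

```python
import json
import jsonschema  # pip install jsonschema

SEARCH_SCHEMA = {
    "type": "object",
    "properties": {"q": {"type": "string"}},
    "required": ["q"],
}

def parse_tool_args(raw: str, schema: dict) -> dict | None:
    """Return validated args, or None if the model emitted partial/invalid JSON."""
    try:
        args = json.loads(raw)
        jsonschema.validate(args, schema)
        return args
    except (json.JSONDecodeError, jsonschema.ValidationError):
        return None  # caller should keep buffering the stream or re-prompt
```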
1.4 Command
Slash‑style commands (e.g., /status, /exec) that bypass natural‑language reasoning and directly drive control flow. In OpenClaw they are parsed by the gateway before reaching the model.
Best practice: Treat commands as a control‑plane API – enforce allow‑list policies, audit every execution, and deny by default.
Common pitfalls: Exposing sensitive information or permissions through commands, or assuming successful execution without checking side‑effects.
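A deny‑by‑default dispatcher can be small. The sketch below is illustrative; the command names, handlers and logging policy are assumptions, not OpenClaw's actual gateway code:

```python
import logging
import shlex

log = logging.getLogger("gateway")

def handle_status(args: list[str]) -> str:
    return "ok"  # placeholder handler for illustration

# Deny-by-default: anything not registered here is rejected.
ALLOWED_COMMANDS = {"status": handle_status}

def dispatch(line: str, user: str) -> str:
    if not line.startswith("/"):
        raise ValueError("not a command")
    name, *args = shlex.split(line[1:])
    handler = ALLOWED_COMMANDS.get(name)
    if handler is None:
        log.warning("denied command %r from %s", name, user)  # audit trail
        return f"command /{name} is not allowed"
    log.info("exec command %r from %s args=%s", name, user, args)
    return handler(args)
```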
1.5 Metadata Description
Compact YAML front‑matter in SKILL.md that provides name and description. The description acts as a searchable trigger; metadata can declare OS, binary, config dependencies and gating rules.
Discovery: Load only name/description at startup.
Filtering & Governance: Skip skills whose required binaries or env vars are missing.
Security & Permissions: Use flags like disable-model-invocation or command-dispatch: tool to enforce deterministic execution paths.
1.6 Reference
Optional supplemental files (Markdown, forms, API docs) that are loaded on demand. They keep the core skill lightweight while providing detailed guidance when needed.
Best practice: Store references in a reference/ folder; keep each file short, version‑controlled and searchable.
1.7 Scripts
Executable code (Python, Bash, JS) that performs deterministic steps such as parsing, validation or batch processing. Scripts are executed, not loaded into the LLM context.
Best practice: Validate inputs, run with minimal privileges, sign and audit scripts, and treat them as supply‑chain dependencies.
Common pitfalls: Treating scripts as "stronger prompts", allowing unrestricted execution, or using opaque curl | bash installers.
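A sketch of a guarded script runner in this spirit; the allow‑list and environment policy are illustrative, and a real deployment would add OS‑level sandboxing around it:

```python
import subprocess

# Allow-list of vetted scripts, e.g. loaded from a signed manifest (illustrative).
APPROVED_SCRIPTS = {"scripts/dedupe_sources.py", "scripts/extract_claims.py"}

def run_script(path: str, args: list[str], timeout: int = 60) -> dict:
    """Run an approved script with no shell, a stripped environment and a timeout."""
    if path not in APPROVED_SCRIPTS:
        raise PermissionError(f"{path} is not an approved script")
    proc = subprocess.run(
        [path, *args],                   # argv form: no shell interpolation
        capture_output=True, text=True,
        timeout=timeout,                 # raises subprocess.TimeoutExpired on hang
        env={"PATH": "/usr/bin:/bin"},   # minimal environment, no inherited secrets
    )
    return {"stdout": proc.stdout, "stderr": proc.stderr, "exit_code": proc.returncode}
```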
2. Skill: Modern Definition, Components and Runtime
2.1 Definition (2026 context)
A Skill is a portable, engineered capability package consisting of discoverable metadata, executable instructions, optional resources (references, assets, scripts), progressive loading strategy, and optional security/environment gating.
2.2 Component Breakdown
Interface: Discovery interface (name/description) and execution interface (body, required tools, resources).
Schema: JSON‑Schema for tool calls or for the skill's own input parameters.
Capability Description: The description field that drives retrieval quality.
I/O: User intent + extracted slots as input; natural‑language answer + optional structured artifacts as output.
State/Context: Session‑level state (retrieved evidence, generated artifacts, command results) stored in the agent's state store.
Error Handling: Explicit recovery steps, logging of stderr, retry policies, and fallback strategies.
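To make the error‑handling component concrete, a minimal retry‑with‑fallback wrapper might look like this; the callables stand in for whatever tool or script invocation the runtime actually uses:

```python
import logging
import time
from typing import Callable

log = logging.getLogger("skill")

def call_with_retry(run: Callable[[], dict], fallback: Callable[[], dict],
                    retries: int = 2, backoff: float = 1.0) -> dict:
    """Retry transient failures with exponential backoff, logging stderr, then fall back."""
    for attempt in range(retries + 1):
        result = run()
        if result.get("exit_code") == 0:
            return result
        log.warning("attempt %d failed: %s", attempt + 1, result.get("stderr"))
        time.sleep(backoff * (2 ** attempt))  # exponential backoff between attempts
    return fallback()  # e.g. re-plan with the model or return a partial answer
```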
2.3 Execution Flow
```mermaid
flowchart TD
A[Discover: Scan skill directories] --> B[Index: Load name/description]
B --> C{Match user intent?}
C -- No --> D[Continue normal dialog]
C -- Yes --> E[Activate: Load full SKILL.md]
E --> F{Need refs/assets?}
F -- Yes --> G[Load refs/assets on demand]
F -- No --> H[Execute: Enter agent loop]
G --> H
H --> I{Need scripts/tool calls?}
I -- Tool call --> J[Tool call → Execute → Return]
I -- Script --> K[Run script → Capture output]
J --> H
K --> H
H --> L[Integrate results]
L --> M[Respond to user]
```

This mirrors the AgentSkills three‑layer progressive disclosure (metadata → body → resources) and OpenClaw's load‑filter‑inject‑execute‑feedback cycle.
2.4 Interaction Modes with LLMs
Synchronous vs Asynchronous: Immediate tool execution vs background jobs managed by OpenClaw's exec and process utilities.
Streaming vs Batch: Streaming returns partial tool calls as they are generated; batch plans all calls first and then executes them together.
Relation to MCP: Model Context Protocol standardises the JSON‑RPC bridge between LLMs and external tools, while Skills encapsulate the procedural SOP that uses those tools.
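In streaming mode, tool‑call arguments typically arrive as fragments, so the runtime must buffer until they parse as complete JSON. A minimal sketch, using an assumed delta format rather than any specific vendor's:

```python
import json

def collect_tool_call(deltas) -> dict | None:
    """Accumulate streamed argument fragments; return the call once JSON is complete."""
    name, buf = None, []
    for delta in deltas:  # assumed shape: {"tool": "search_web", "args_fragment": '{"q": "open'}
        name = delta.get("tool", name)
        buf.append(delta.get("args_fragment", ""))
        try:
            return {"tool": name, "args": json.loads("".join(buf))}
        except json.JSONDecodeError:
            continue  # still partial -- keep buffering
    return None  # stream ended without a complete call
```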
2.5 Minimal Skill Engine (Pseudocode)
```python
from dataclasses import dataclass, field

# Helpers such as list_subdirs, exists, parse_yaml_frontmatter, satisfies,
# best_semantic_match, read_file, build_prompt, normal_prompt, llm_generate,
# execute_tool and tool_result are assumed to be provided by the runtime;
# dirs and env are the configured skill directories and runtime environment.

@dataclass
class SkillIndexItem:
    name: str
    description: str
    path: str
    metadata: dict = field(default_factory=dict)

class SkillEngine:
    def discover(self, skills_dirs) -> list[SkillIndexItem]:
        items = []
        for d in skills_dirs:
            for skill_dir in list_subdirs(d):
                if exists(skill_dir + "/SKILL.md"):
                    fm = parse_yaml_frontmatter(skill_dir + "/SKILL.md")
                    items.append(SkillIndexItem(
                        name=fm.name,
                        description=fm.description,
                        path=skill_dir,
                        metadata=fm.metadata))
        return items

    def eligible(self, item, runtime_env) -> bool:
        # OpenClaw-style gating based on OS/binaries/env vars/config
        return satisfies(item.metadata, runtime_env)

    def match(self, user_msg, index_items) -> SkillIndexItem | None:
        # Simple embedding/keyword match; production should add poisoning protection
        return best_semantic_match(user_msg, index_items)

    def run(self, user_msg):
        index = [i for i in self.discover(dirs) if self.eligible(i, env)]
        skill = self.match(user_msg, index)
        if not skill:
            return llm_generate(normal_prompt(user_msg, index))
        full = read_file(skill.path + "/SKILL.md")
        prompt = build_prompt(user_msg, index, full)
        while True:
            out = llm_generate(prompt, stream=True)
            if out.is_tool_call:
                result = execute_tool(out.tool_name, out.args)
                prompt.append(tool_result(out.call_id, result))
            else:
                return out.final_text
```

3. Mechanism Comparison: Prompt / Memory / MCP / Skills
Prompts: transient, with low persistence and low programmability.
Memory: adds cross‑turn state and higher persistence.
MCP: provides a standardised JSON‑RPC bridge for tools.
Skills: combine versioned SOPs, scripts and assets, offering high persistence, high programmability and strong composability.
4. Framework Evolution: From LLM to OpenClaw
LLM‑only: single‑turn generation, low control, hard to reproduce.
Tool calling: adds deterministic external calls, still requires careful schema design.
Agent loops: multi‑step reasoning with more accumulated state, and a risk of goal drift.
Multi‑agent collaboration: parallelism and role division, higher coordination cost.
Enhanced memory agents: long‑term storage and retrieval, privacy & contamination concerns.
Workflow/Manus engines: explicit graph‑based orchestration, high observability, higher engineering effort.
OpenClaw: a self‑hosted, always‑on agent platform with a skill marketplace, command gateway and sandboxed execution, offering strong productivity but introducing substantial supply‑chain and permission‑management risks.
5. End‑to‑End Examples
5.1 Example 1 – Technical Research & Evidence Aggregation
Goal: When a user asks for recent OpenClaw skill security incidents, automatically trigger the web‑research skill, which searches the web, fetches pages, deduplicates sources, extracts claims and outputs a citation‑rich report.
Skill layout:
```
web-research/
├── SKILL.md
├── references/
│   ├── source_policy.md
│   └── output_template.md
└── scripts/
    ├── extract_claims.py
    └── dedupe_sources.py
```

Core SKILL.md excerpt:

```markdown
---
name: web-research
description: When the user asks for "latest/near‑term/comparison/citations", perform evidence‑driven research and output traceable references.
---
## Rules
1) Search first, then conclude.
2) Every claim must have a source.
3) Conflicting sources are listed side‑by‑side with uncertainty notes.
## Procedure
1) search_web → candidates
2) fetch_url → full text
3) scripts/dedupe_sources.py → unique set
4) scripts/extract_claims.py → claim‑source pairs
5) Render final report using the template.
```
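The dedupe step is a good example of deterministic externalisation: it is a plain script, not model reasoning. An illustrative scripts/dedupe_sources.py (the URL normalisation rules are assumptions):

```python
# scripts/dedupe_sources.py -- illustrative implementation
import json
import sys
from urllib.parse import urlsplit

def canonical(url: str) -> str:
    # Drop scheme, query and fragment; lowercase the host so trivial variants collapse.
    parts = urlsplit(url)
    return f"{parts.netloc.lower()}{parts.path.rstrip('/')}"

seen, unique = set(), []
for item in json.load(sys.stdin):  # expects [{title, url, snippet, date}, ...]
    key = canonical(item["url"])
    if key not in seen:
        seen.add(key)
        unique.append(item)
json.dump(unique, sys.stdout, indent=2)
```

Tool definitions (JSON‑Schema):

```json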
[
{
"name": "search_web",
"description": "Search the latest web pages and return [{title,url,snippet,date}]",
"parameters": {
"type": "object",
"properties": {"q": {"type": "string"}},
"required": ["q"]
}
},
{
"name": "fetch_url",
"description": "Fetch a URL and return {text,meta}",
"parameters": {
"type": "object",
"properties": {"url": {"type": "string"}},
"required": ["url"]
}
},
{
"name": "run_script",
"description": "Execute a script in the sandbox and return stdout/stderr/exit_code",
"parameters": {
"type": "object",
"properties": {"path": {"type": "string"}, "args": {"type": "array"}},
"required": ["path"]
}
}
]
```

Sequence diagram (simplified):

```mermaid
sequenceDiagram
participant U as User
participant A as Agent Runtime
participant M as LLM
participant W as Web Tools
participant S as Sandbox (scripts)
U->>A: "Summarise recent OpenClaw skill security events"
A->>A: match skill index → activate web‑research
A->>M: prompt + SKILL.md
M-->>A: tool_call search_web(q)
A->>W: search_web
W-->>A: results[]
A->>M: tool_result
M-->>A: tool_call fetch_url(url) xN
A->>W: fetch_url xN
W-->>A: pages{text} xN
A->>S: run_script dedupe_sources.py
S-->>A: deduped list
A->>S: run_script extract_claims.py
S-->>A: claims+citations
A->>M: aggregated tool results
M-->>A: final report (claims + sources)
A-->>U: response
```

5.2 Example 2 – Multi‑Step Release‑Notes Automation
Goal: A slash command /release-notes repo=. since="7 days" triggers a deterministic pipeline that extracts git logs, renders them with a template, writes the file and returns a summary.
Skill layout:
```
release-notes/
├── SKILL.md
├── references/
│   └── notes_template.md
└── scripts/
    ├── git_log.sh
    └── render_notes.py
```

SKILL.md front‑matter (OpenClaw style):

```yaml
---
name: release-notes
description: Generate release notes from a repository. Triggered by "release notes" intent or the /release-notes command.
user-invocable: true
command-dispatch: tool
command-tool: exec
command-arg-mode: raw
---
```

Command example:

```
/release-notes repo=. since="7 days"
```

End‑to‑end sequence diagram:

```mermaid
sequenceDiagram
participant U as User
participant G as OpenClaw Gateway
participant X as Exec Tool (sandbox)
participant P as Process Tool
participant F as Filesystem Tools
U->>G: /release-notes repo=. since="7 days"
G->>G: parse command, authorise
G->>X: exec("scripts/git_log.sh --since '7 days'")
X-->>G: stdout (commit list) + exit_code
alt long running
G->>P: track background session
P-->>G: status/result
end
G->>X: exec("python scripts/render_notes.py --template references/notes_template.md")
X-->>G: release_notes.md content
G->>F: write("release_notes.md", content)
F-->>G: ok
G-->>U: summary + file path
```

Script snippets (illustrative):
```bash
# scripts/git_log.sh
# Invoked by the skill as: git_log.sh --since '7 days'
set -euo pipefail
if [ "${1:-}" = "--since" ]; then shift; fi  # tolerate the flag form used above
git log --since="${1:-7 days}" --pretty=format:"- %s (%h) by %an"
```

```python
# scripts/render_notes.py (pseudo-code)
import sys

commits = sys.stdin.read().splitlines()
template = open("references/notes_template.md").read()
notes = template.replace("{{COMMITS}}", "\n".join(commits))
print(notes)
```

6. Conclusions and Future Work
6.1 Conclusions
In the 2026 landscape, a “Skill” is best described as a versioned, directory‑based SOP package that combines discoverable metadata, executable scripts and optional reference material, loaded progressively to keep token cost low. Skills sit alongside tool‑calling, MCP and memory to form a complete agent‑engineering stack: MCP handles connectivity, tool calls handle deterministic parameterised actions, Skills provide reusable procedural knowledge, and Memory preserves cross‑turn context.
OpenClaw exemplifies a concrete platform that materialises this stack, offering a self‑hosted agent runtime, a skill marketplace and a command gateway. Its strengths are high productivity and real‑world applicability; its challenges are supply‑chain security, permission management and the need for rigorous governance.
6.2 Research & Implementation Checklist
Introduce CI/CD for Skills: versioning, changelogs, lint/validation (e.g., skills‑ref validate) and regression testing.
Treat metadata (especially description) as a first‑class security artifact: enforce length limits, avoid keyword stuffing, and add poisoning‑defence filters (see the lint sketch after this list).
Apply least‑privilege and layered isolation: deny‑by‑default tool policies, sandbox high‑risk tools, and restrict credential injection to minimal, session‑scoped tokens.
Govern Scripts as supply‑chain dependencies: require signatures, lock versions, prohibit opaque installers like curl | bash, and generate SBOMs.
Prefer deterministic externalisation: use scripts for parsing, validation, diffing, and patching rather than relying on the model to simulate these steps.
Combine MCP with Skills: expose tools via a private MCP server, store organisational Skills in a private registry, and avoid direct public market ingestion.
Adopt a staged, verifiable adoption of OpenClaw: start in an isolated sandbox, enable only a curated set of internal Skills, block third‑party Skills by default, and continuously monitor context size and permission changes via /context detail‑style commands.
Consider alternative stacks for enterprise‑grade automation: Agents SDK + LangGraph workflows + private MCP + internal Skill registry, which offers a smaller attack surface while retaining OpenClaw‑like capabilities.
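As referenced in the metadata item above, a description lint can be a few lines; the length cap and repetition threshold below are illustrative defaults, not a standard:

```python
import re
from collections import Counter

MAX_LEN = 500           # illustrative cap on description length
MAX_REPEAT_RATIO = 0.2  # flag if any single token dominates the description

def lint_description(desc: str) -> list[str]:
    """Return a list of problems found in a skill description (empty if clean)."""
    problems = []
    if len(desc) > MAX_LEN:
        problems.append(f"description too long ({len(desc)} > {MAX_LEN} chars)")
    tokens = re.findall(r"[a-z0-9]+", desc.lower())
    if tokens:
        _, count = Counter(tokens).most_common(1)[0]
        if count / len(tokens) > MAX_REPEAT_RATIO and count > 3:
            problems.append("possible keyword stuffing (one token dominates)")
    return problems
```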
7. Final Thoughts
The progression from raw prompts to engineered Skills mirrors the broader shift from ad‑hoc LLM usage to disciplined, reproducible AI‑augmented workflows. By treating Skills as first‑class software artifacts—complete with version control, testing and security reviews—teams can harness the power of large models while maintaining the rigor required for production environments.
Baobao Algorithm Notes
Author of the BaiMian large model, offering technology and industry insights.