How to Make Agent Skills Continuously Evolve and Accumulate Experience
The article explains why many Agent Skills become stale after a single use and presents four concrete design principles—precise triggers, executable commands, systematic pitfall recording, and verifiable steps—plus a lifecycle and anti‑pattern guide to keep Skills up‑to‑date and reusable.
1. What a Skill Stores
Agent persistence is divided into three distinct pools, each with its own purpose:
Memory: Stores stable facts such as user preferences, environment information, project conventions, and confirmed tool quirks (e.g., MEMORY.md / USER.md). Operational steps placed here are visible to the Agent, but nothing tells it when to use them.
Session History: Records dialogue trajectories (e.g., state.db + FTS5). A Skill stored inside history is hard to locate for reuse.
Skill: Contains reusable processes (trigger conditions, step commands, pitfall records, and verification methods), e.g., SKILL.md plus references/, templates/, and scripts/. Mixing preferences or facts into a Skill adds irrelevant background to every load.
One‑sentence summary of the three pools: Memory stores "what the user likes", History stores "how it was done last time", Skill stores "what to do next".
The sole responsibility of a Skill is to preserve a reusable operation flow, be loaded at the correct moment, and provide the correct execution path.
2. Four Design Principles for Accumulative Skills
Principle 1 – Precise Trigger
The description field is the primary cue for the Agent. A vague trigger such as "Use when deploying to production" leaves the Agent guessing (git push? CI pass? manual trigger?). A precise trigger specifies concrete scenarios, is self‑checking, includes exclusion conditions, and is updated whenever a missing scenario is discovered.
Example of a vague description:
Use when deploying the application to production.
Problem: the Agent cannot reliably determine what "deploying to production" means.
Example of a precise description:
Use when the user asks to deploy a Docker image to AWS ECS, configure a Kubernetes deployment, or push a tagged release to production infrastructure.
Four features of a precise trigger:
Scenario‑specific, not abstract.
Self‑checking via keyword or intent matching.
Explicit exclusion clauses (e.g., "Do NOT use for browser JavaScript errors").
Iterative updates when new applicable scenarios are found.
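A precise trigger behaves like a small decision procedure: exclusion clauses veto first, then scenario keywords must match. The following is only a sketch of that idea in shell. The function name and both keyword lists are hypothetical, and a real Agent is likely to match on intent rather than on literal substrings.

```shell
#!/usr/bin/env bash
# Hypothetical helper: should a request load the "deploy" Skill?
# The keyword lists are illustrative assumptions, not part of any real Agent API.

should_load_deploy_skill() {
  local request
  # Lowercase the request so matching is case-insensitive.
  request=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')

  # Exclusion clauses are checked first: a hit vetoes the Skill outright.
  case "$request" in
    *"docker compose"*|*"localhost"*) return 1 ;;
  esac

  # Scenario-specific triggers taken from the precise description.
  case "$request" in
    *"aws ecs"*|*"kubernetes deployment"*|*"tagged release"*) return 0 ;;
  esac

  # Nothing matched: do not load the Skill.
  return 1
}
```

When a missed scenario is discovered, the iterative update amounts to adding one more pattern to the second `case` rather than loosening the wording of the whole description.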
Principle 2 – Executable Commands
Many Skills list strategic intents like "check logs, locate error, fix issue". The Agent can only understand the intent, not the concrete steps. Instead, each step must contain exact commands or tool invocations.
Strategy‑intent example:
1. Check the gateway logs to see what went wrong.
2. Identify the failing platform.
3. Restart the affected service.
Executable version:
1. Run <code>grep -i 'failed to send\|error' ~/.hermes/logs/gateway.log | tail -50</code>.
2. If the error mentions a platform name (e.g., 'telegram', 'discord'), that platform is the failing one.
3. Run <code>hermes gateway restart</code> to restart the gateway service.
Three levels of executability:
Minimum: Every step names a concrete tool or command (e.g., grep -i 'error' file.log).
Recommended: Include conditional branches so the Agent can choose actions without pausing to decide.
Ideal: A complete logical chain covering all possible branches, enabling near‑autonomous execution.
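The "Recommended" level (conditional branches) can be sketched as a shell function built from the gateway example. The log path and the `hermes gateway restart` command come from the steps above; the function name `failing_platform`, the result labels, and the two-platform list are assumptions made for illustration.

```shell
#!/usr/bin/env bash
# Sketch: classify the failing platform from a gateway log.
# failing_platform and its output labels are hypothetical names.

failing_platform() {
  local log_file="$1"
  local recent

  # Branch 0: the log itself is missing or unreadable.
  [ -r "$log_file" ] || { echo "missing-log"; return 1; }

  # Step 1: collect the last 50 error lines. -E is used so the alternation
  # does not rely on the non-POSIX \| extension of basic regular expressions.
  recent=$(grep -iE 'failed to send|error' "$log_file" | tail -50)

  # Step 2: branch on the platform named in those lines.
  # If both platforms appear, telegram is reported first.
  case "$(printf '%s' "$recent" | tr '[:upper:]' '[:lower:]')" in
    *telegram*) echo "telegram" ;;
    *discord*)  echo "discord" ;;
    "")         echo "none" ;;     # no error lines: nothing to restart
    *)          echo "unknown" ;;  # errors present, but no known platform named
  esac
}

# Step 3 (usage, left commented out because it restarts a service):
# case "$(failing_platform ~/.hermes/logs/gateway.log)" in
#   telegram|discord) hermes gateway restart ;;
# esac
```

Each branch returns a label instead of stopping to ask, which is what lets the Agent continue without pausing to decide.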
Principle 3 – Pitfalls as Core Experience
A Skill without a Pitfalls section is merely a "first‑time success video". Recording each failure transforms the Skill into a protective barrier.
Standard Pitfall entry format:
1. macOS grep does not support -P (Perl regex)
Phenomenon: Running grep -P reports "invalid option -- P".
Cause: The built-in BSD grep on macOS lacks -P; only GNU grep supports it.
Fix: Use ggrep (brew install grep) or an ERE with grep -E. Avoid GNU-only options for cross-platform Skills.
Discovery date: 2026-01-15
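The fix in this entry can itself be written as an executable step. The sketch below probes for a grep that accepts -P and otherwise falls back to an ERE; both function names are hypothetical, and the digit pattern is only a stand-in for whatever the Skill actually needs to match.

```shell
#!/usr/bin/env bash
# Sketch: prefer a PCRE-capable grep when one exists, else use a portable ERE.
# pick_pcre_grep and has_digit are illustrative names.

pick_pcre_grep() {
  local candidate
  for candidate in grep ggrep; do
    # Probe with a trivial -P match; a grep without -P support fails here.
    if command -v "$candidate" >/dev/null 2>&1 &&
       printf 'a1\n' | "$candidate" -qP 'a\d' 2>/dev/null; then
      echo "$candidate"
      return 0
    fi
  done
  return 1  # no PCRE-capable grep found: callers should use grep -E instead
}

# Portable alternative: the same "contains a digit" check as an ERE,
# which works with both BSD and GNU grep.
has_digit() {
  printf '%s\n' "$1" | grep -qE '[0-9]'
}
```

For a cross-platform Skill, the ERE form is usually the simpler choice, since it avoids the probe entirely.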
Pitfall value spans three time scales:
Immediate: Recorded during the same conversation, preventing repeat mistakes.
Short‑term: The next use (e.g., next month) sees the pitfall and avoids it.
Long‑term: New users benefit from the accumulated failure knowledge.
Principle 4 – Verifiable Steps
Without verification, a Skill may appear successful while actually failing (e.g., generating an empty report). Adding explicit checks lets the Agent detect silent failures and turn them into new pitfall entries.
Skill without verification:
Final step: generate report file report.md.
Skill with verification:
Verify: run <code>wc -l report.md</code> and confirm line count > 50 (non-empty).
Then run <code>head -3 report.md</code> and confirm the first three lines contain date, title, and summary.
Failed verification becomes a new pitfall (e.g., "report.md has only one line because the API returned empty results"). Successful verification confirms the Skill remains valid under current conditions.
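The two checks can be bundled into one function whose exit status the Agent can branch on. This is a minimal sketch: `verify_report` is a hypothetical name, and because the exact layout of the date, title, and summary lines is not specified, it only confirms that the first three lines are non-blank.

```shell
#!/usr/bin/env bash
# Sketch: verification step for a generated report.
# verify_report is an illustrative name; the > 50 threshold comes from the text.

verify_report() {
  local report="$1"
  local lines

  # A missing or zero-byte file is the classic silent failure.
  [ -s "$report" ] || { echo "FAIL: $report is missing or empty"; return 1; }

  # wc -l may pad its output with spaces on BSD systems, so strip them.
  lines=$(wc -l < "$report" | tr -d ' ')
  if [ "$lines" -le 50 ]; then
    echo "FAIL: $report has only $lines lines (expected more than 50)"
    return 1
  fi

  # Assumption: date, title, and summary each occupy one non-blank line.
  if [ "$(head -3 "$report" | grep -c '[^[:space:]]')" -ne 3 ]; then
    echo "FAIL: one of the first three lines of $report is blank"
    return 1
  fi

  echo "OK: $report passed verification"
}
```

The FAIL message is worded so that it can be copied into a new pitfall entry as the observed phenomenon.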
3. Patch – The Core Action in the Skill Lifecycle
With the four principles in place, it becomes clear that a Skill's lifecycle is not a one-off, write-and-done process. It consists of six stages:
① Create – Capture the first successful path as a Skill.
② Load – Agent recognizes the scenario and injects the Skill into the system prompt.
③ Execute – Follow the Skill steps and run verification.
④ Discover Gap – Detect failures, unsupported commands, or validation errors.
⑤ Patch – Add a pitfall, correct the command, or update the trigger condition.
⑥ Loop – Return to loading; the next execution uses the improved Skill.
The critical transition is ④→⑤: whether a discovered gap is patched immediately determines whether the Skill stays a one-time script or becomes an evolving knowledge base. The best patches add a single pitfall entry and adjust the relevant step rather than rewriting the whole flow.
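A patch of this kind can be as small as appending one entry in the standard pitfall format. The helper below is a sketch under stated assumptions: `append_pitfall` is a hypothetical name, and the "## Pitfalls" heading is an assumed convention for SKILL.md rather than a required one.

```shell
#!/usr/bin/env bash
# Sketch: append a single pitfall entry to a Skill file (stage 5, Patch).
# append_pitfall and the "## Pitfalls" heading are illustrative assumptions.

append_pitfall() {
  local skill_file="$1" title="$2" phenomenon="$3" cause="$4" fix="$5"

  # Create the section header only the first time a pitfall is recorded.
  grep -q '^## Pitfalls' "$skill_file" 2>/dev/null ||
    printf '\n## Pitfalls\n' >> "$skill_file"

  # Append the entry; existing steps and earlier pitfalls are left untouched.
  {
    printf '\n### %s\n' "$title"
    printf 'Phenomenon: %s\n' "$phenomenon"
    printf 'Cause: %s\n' "$cause"
    printf 'Fix: %s\n' "$fix"
    printf 'Discovery date: %s\n' "$(date +%Y-%m-%d)"
  } >> "$skill_file"
}
```

A call such as `append_pitfall SKILL.md "Empty report" "report.md has only one line" "API returned empty results" "Check the API response before writing"` records the gap; adjusting the affected step remains a separate, equally small edit.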
4. Anti‑Patterns – Five Types of Dead Skills
The five most common degenerate Skill styles all "run" at first but are forgotten within three months:
1. Documentation‑style Skill – Reads like a blog post with concepts and diagrams but lacks concrete commands or trigger conditions. Agents gain knowledge but no executable value.
2. Universal Skill – Covers many unrelated scenarios with a broad trigger (e.g., "Use when the user asks about deployment, testing, CI/CD, monitoring, or infrastructure"). The Skill is loaded too often, and most content is ignored.
3. One‑off Skill – Tied to a single task with fixed parameters (e.g., "Run operation Z on repo Y of project X"). After the task, the Skill loses relevance because it is not parameterized.
4. Happy‑Path‑Only Skill – Lists only the ideal steps, omitting error handling, environment differences, and pitfall sections. When something goes wrong, the Agent has no guidance.
5. Stale Skill – Well‑structured but untouched for months. No new pitfall entries, no command updates, no adaptation to tool or API changes. After three months, roughly 30 % of its steps are likely outdated.
Corresponding corrective actions:
Documentation → add executable commands.
Universal → split into focused Skills.
One‑off → parameterize key variables.
Happy‑Path → add a Pitfalls section.
Stale → adopt a "run once, add a pitfall, patch immediately" workflow.
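Stale Skills can be surfaced mechanically before they are forgotten. The sketch below lists SKILL.md files that have not been modified for more than a given number of days; the function name and the 90-day default (roughly the three months mentioned above) are assumptions.

```shell
#!/usr/bin/env bash
# Sketch: list Skill files that have gone untouched for too long.
# stale_skills is an illustrative name; 90 days approximates "three months".

stale_skills() {
  local root="${1:-.}" days="${2:-90}"
  # -mtime +N selects files last modified more than N days ago.
  find "$root" -type f -name 'SKILL.md' -mtime +"$days" -print
}
```

Running `stale_skills ~/skills 90` (the directory is a placeholder) yields a review list: each hit is a candidate for one fresh run, with any failure recorded as a new pitfall.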
AI Step-by-Step
Sharing AI knowledge, practical implementation records, and more.
