Turning Static AI Agent Skills into Dynamic, Testable, Iterable Workflows
This article explains how to extend a static AI Agent Skill defined in SKILL.md with a dynamic Workflows layer, and how to add trajectory evaluation so that the execution path can be checked. It provides concrete JavaScript examples, a directory structure, anti‑patterns, and a step‑by‑step validation checklist for making Skills runnable, testable, and iteratively improvable.
Problem
A static SKILL.md describes when and how to run a Skill, but execution still relies on the Agent inferring each step. This leads to skipped steps, goal drift, and limited parallelism.
Solution Overview
Introduce a dynamic execution layer with JavaScript Workflows stored under workflows/. The three‑layer model becomes:
L0 Standard layer (01): SKILL.md + references/
L1 Orchestration layer (05): SKILL.md + sub‑Skill contracts
L2 Dynamic layer (07): SKILL.md + workflows/ + trajectory Eval
Static knowledge stays in SKILL.md, orchestration logic lives in Workflows, and a trajectory Eval validates the execution process.
Skill Directory Extension
After running
plugins/frontend-team-toolkit/skill-engineering/bin/new-skill.sh wechat-review-pipeline
the following layout is created:
plugins/frontend-team-toolkit/skills/wechat-review-pipeline/
├── SKILL.md # static knowledge, triggers, contract
├── .skill-meta.json # enables workflows + trajectory_evals path
├── references/
│ └── output-contract.md
├── evals/
│ ├── evals.json # output Eval definitions
│ └── trajectory-evals.json # process Eval definitions
├── workflows/ # dynamic orchestration scripts
│ ├── README.md
│ ├── parallel-review.js
│ ├── conditional-route.js
│ └── weekly-regression.js
└── scripts/validate-output.sh
The scaffolding script copies the workflows/ template and the trajectory‑Eval JSON automatically.
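One quick way to confirm that the scaffold produced the layout above is to check for each expected path. The sketch below is only an illustration: `missingPaths` and the `EXPECTED` list are hypothetical names that are not part of new-skill.sh, and the list simply mirrors the tree shown above.

```javascript
const fs = require("node:fs");
const path = require("node:path");

// Paths copied from the layout above; adjust them if your scaffold differs.
const EXPECTED = [
  "SKILL.md",
  ".skill-meta.json",
  "references/output-contract.md",
  "evals/evals.json",
  "evals/trajectory-evals.json",
  "workflows/README.md",
  "workflows/parallel-review.js",
  "workflows/conditional-route.js",
  "workflows/weekly-regression.js",
  "scripts/validate-output.sh",
];

// Returns the expected paths that do not exist under skillRoot.
function missingPaths(skillRoot, expected = EXPECTED) {
  return expected.filter((rel) => !fs.existsSync(path.join(skillRoot, rel)));
}

// Usage (the path is the one used in this article):
// console.log(missingPaths("plugins/frontend-team-toolkit/skills/wechat-review-pipeline"));
```

An empty array means every listed file is present; anything else names what the scaffold did not create.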
Declaring Workflows in SKILL.md
## Dynamic Workflows
This Skill includes the following dynamic orchestration scripts:
### workflows/parallel-review.js
- **Purpose**: Parallel review + image generation
- **Trigger**: User says “parallel review” or “review and generate image”
- **Input**: Path to article file
- **Output**: Aggregated review results
### workflows/conditional-route.js
- **Purpose**: Route based on article status
- **Trigger**: User says “review” and provides status
- **Input**: Article path + status (draft/final/published)
- **Output**: Review result for the selected flow
### workflows/weekly-regression.js
- **Purpose**: Periodic regression of the Skill
- **Trigger**: User says “periodic regression” or `/loop weekly`
- **Input**: Optional Skill name
- **Output**: Regression report
Execution:
- Claude automatically selects the appropriate workflow
- Users can explicitly specify: “use parallel-review workflow for review”
Workflows Script Patterns
Basic Structure
// workflows/example.js
// validateInput, orchestrate, and formatOutput are placeholders to implement per workflow.
async function mainWorkflow(args) {
  // Phase 1: prepare input
  const input = validateInput(args);
  // Phase 2: execute orchestration logic
  const result = await orchestrate(input);
  // Phase 3: return formatted output
  return formatOutput(result);
}
// args is assumed to be supplied by the workflow runtime.
module.exports = mainWorkflow(args);
Key building blocks:
runAgent() – spawns a sub‑agent
Promise.all() – runs agents in parallel
await – waits for one step to finish before the next starts (serial execution)
args – user‑provided parameters
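To exercise a workflow outside the agent runtime, it can help to replace runAgent with a stub that records each call. The following is a minimal sketch under stated assumptions: `createMockRuntime`, the trace entry shape (`name`, `startSeq`, `endSeq`), and the canned results are all hypothetical, and nothing here reflects what the real runAgent returns.

```javascript
// Hypothetical test double for runAgent. It records every call so the order
// of execution can be inspected afterwards; the real runtime may differ.
function createMockRuntime(cannedResults = {}) {
  const trace = []; // one entry per runAgent call, in call order
  let seq = 0; // monotonically increasing event counter

  async function runAgent(config) {
    const entry = { name: config.name, startSeq: ++seq, endSeq: null };
    trace.push(entry);
    // Yield to the event loop to imitate an agent doing asynchronous work.
    await new Promise((resolve) => setTimeout(resolve, 0));
    entry.endSeq = ++seq;
    return cannedResults[config.name] ?? {};
  }

  return { runAgent, trace };
}

// Usage: run two stubbed agents one after another and read back the trace.
async function demo() {
  const { runAgent, trace } = createMockRuntime({
    "article-reviewer": { score: 9.2 },
  });
  const review = await runAgent({ name: "article-reviewer", prompt: "..." });
  await runAgent({ name: "image-reviewer", prompt: "..." });
  return { review, calls: trace.map((e) => e.name) };
}
```

Because the stub uses a counter rather than wall-clock time, two calls started together always show overlapping start and end numbers, which keeps ordering checks deterministic.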
Serial Orchestration Example
// workflows/serial-review.js
async function serialReview(articlePath) {
  // Phase 1: review article
  const reviewResult = await runAgent({
    name: "article-reviewer",
    prompt: `Read ${articlePath}, output a five‑dimensional score report`,
    tools: ["Read"],
    model: "sonnet",
    worktree: false // shared session
  });
  // Phase 2: review images (starts only after Phase 1 has finished)
  const imageResult = await runAgent({
    name: "image-reviewer",
    prompt: `Read ${articlePath}, output an image compliance checklist`,
    tools: ["Read"],
    model: "haiku",
    worktree: true // isolated execution
  });
  // Phase 3: aggregate results
  return {
    review: reviewResult,
    images: imageResult,
    summary: `Review score: ${reviewResult.score}, Image compliance: ${imageResult.status}`
  };
}
module.exports = serialReview(args.articlePath);
Key points: await guarantees order, so Phase 2 starts only after Phase 1 resolves; worktree: true isolates the second agent.
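In the example above, Phase 2 reads the article directly, so it depends on Phase 1 only for ordering. If the second agent should actually use the first agent's output, the result can be passed into its prompt. This is a sketch, not the article's implementation: `serialWithHandoff` is a hypothetical name, runAgent is passed in as a parameter so the function can be run with a stub, and the `score` field is assumed to exist as in the example above.

```javascript
// Sketch: Phase 2 consumes Phase 1's output, so the order is enforced by
// data flow as well as by await. runAgent is injected here for testability;
// in the real runtime it is assumed to be provided for you.
async function serialWithHandoff(articlePath, runAgent) {
  const review = await runAgent({
    name: "article-reviewer",
    prompt: `Read ${articlePath}, output a five-dimensional score report`,
    tools: ["Read"],
    model: "sonnet",
  });
  const images = await runAgent({
    name: "image-reviewer",
    prompt:
      `Read ${articlePath}. The article scored ${review.score}; ` +
      `output an image compliance checklist`,
    tools: ["Read"],
    model: "haiku",
  });
  return { review, images };
}
```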
Parallel Orchestration Example
// workflows/parallel-review.js
async function parallelReview(articlePath) {
  const [reviewResult, imageResult] = await Promise.all([
    runAgent({
      name: "article-reviewer",
      prompt: `Read ${articlePath}, output a five‑dimensional score report`,
      tools: ["Read"],
      model: "sonnet",
      worktree: true // isolated to avoid conflicts
    }),
    runAgent({
      name: "image-reviewer",
      prompt: `Read ${articlePath}, output an image compliance checklist`,
      tools: ["Read"],
      model: "haiku",
      worktree: true
    })
  ]);
  // Barrier: both agents have finished by this point, so synthesize
  return synthesizeResults([reviewResult, imageResult]);
}
function synthesizeResults(results) {
  return {
    reviewScore: results[0].score,
    imageStatus: results[1].status,
    combinedStatus: results[0].score >= 9.0 && results[1].status === "pass" ? "PASS" : "FAIL"
  };
}
module.exports = parallelReview(args.articlePath);
Key points: Promise.all starts both agents concurrently and resolves only when both have finished, so it acts as the barrier before synthesis; worktree: true prevents resource contention.
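Promise.all rejects as soon as any one of its promises rejects, so a single failing agent discards the other agent's result. Where a partial result is still useful, Promise.allSettled is one alternative. The sketch below assumes the same `score` and `status` fields as the example above; `parallelReviewSettled` and the injected runAgent parameter are hypothetical.

```javascript
// Sketch: collect both outcomes even if one agent fails.
// runAgent is injected so the function can be run with a stub.
async function parallelReviewSettled(articlePath, runAgent) {
  const [review, images] = await Promise.allSettled([
    runAgent({
      name: "article-reviewer",
      prompt: `Read ${articlePath}, output a five-dimensional score report`,
      tools: ["Read"],
      model: "sonnet",
      worktree: true,
    }),
    runAgent({
      name: "image-reviewer",
      prompt: `Read ${articlePath}, output an image compliance checklist`,
      tools: ["Read"],
      model: "haiku",
      worktree: true,
    }),
  ]);
  const reviewOk = review.status === "fulfilled";
  const imagesOk = images.status === "fulfilled";
  return {
    reviewScore: reviewOk ? review.value.score : null,
    imageStatus: imagesOk ? images.value.status : null,
    // Keep the failure reasons so the report can explain a FAIL.
    errors: [review, images]
      .filter((r) => r.status === "rejected")
      .map((r) => String(r.reason)),
    combinedStatus:
      reviewOk && imagesOk && review.value.score >= 9.0 && images.value.status === "pass"
        ? "PASS"
        : "FAIL",
  };
}
```

A failed agent still yields FAIL overall, but the surviving result and the error message remain available for the report.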
Conditional Routing Example
// workflows/conditional-route.js
async function conditionalRoute(articlePath, articleStatus) {
  let agentConfig;
  if (articleStatus === "draft") {
    agentConfig = {name: "article-reviewer", model: "sonnet"};
  } else if (articleStatus === "final") {
    agentConfig = {name: "compliance-checker", model: "sonnet"};
  } else if (articleStatus === "published") {
    agentConfig = {name: "update-optimizer", model: "haiku"};
  } else {
    // default route
    agentConfig = {name: "article-reviewer", model: "sonnet"};
  }
  const result = await runAgent({
    ...agentConfig,
    prompt: `Read ${articlePath}, perform the appropriate review`,
    tools: ["Read"],
    worktree: false
  });
  return result;
}
module.exports = conditionalRoute(args.articlePath, args.status);
Key points: plain if/else implements routing; classification happens first, then the selected agent runs.
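As the number of statuses grows, the same routing can be expressed as a lookup table, which keeps the classification step separate from the agent call and easy to test on its own. This is an alternative sketch rather than the article's approach; `ROUTES`, `DEFAULT_ROUTE`, and `pickRoute` are hypothetical names, and the agent names and models are the ones from the example above.

```javascript
// Status-to-agent table, mirroring the if/else branches above.
const ROUTES = {
  draft: { name: "article-reviewer", model: "sonnet" },
  final: { name: "compliance-checker", model: "sonnet" },
  published: { name: "update-optimizer", model: "haiku" },
};
const DEFAULT_ROUTE = ROUTES.draft;

// Returns the agent config for a status, falling back to the default route.
// The own-property check stops inherited keys such as "toString" from
// being treated as valid statuses.
function pickRoute(status) {
  return Object.prototype.hasOwnProperty.call(ROUTES, status)
    ? ROUTES[status]
    : DEFAULT_ROUTE;
}
```

The selected config can then be spread into the runAgent call exactly as in the example above.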
Loop‑Until‑Done Example
// workflows/loop-regression.js
// Small delay helper, defined here in case the runtime does not provide one.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));
async function loopRegression(skill, maxIterations = 10) {
  let iteration = 0;
  let allPassed = false;
  while (iteration < maxIterations && !allPassed) {
    // Assumes eval-runner returns an array of results with a boolean pass flag
    const results = await runAgent({
      name: "eval-runner",
      prompt: `Run full Eval for ${skill}`,
      tools: ["Read", "Bash"],
      model: "sonnet"
    });
    allPassed = results.every(r => r.pass === true);
    if (!allPassed) {
      console.log(`Iteration ${iteration}: ${results.filter(r => !r.pass).length} failures`);
    }
    iteration++;
    if (!allPassed && iteration < maxIterations) {
      await sleep(60000); // wait 1 minute
    }
  }
  return {iterations: iteration, finalStatus: allPassed ? "ALL_PASS" : "STILL_HAS_FAILURES"};
}
module.exports = loopRegression(args.skill || "wechat-article-review");
Key points: a while loop with a maximum‑iteration guard; after each run it checks the pass flags; it sleeps between attempts only when another attempt will follow.
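Two details of the loop above are worth guarding in practice: `[].every(...)` returns true, so an empty result list would be reported as ALL_PASS, and the fixed one-minute sleep makes the loop slow to test. The sketch below addresses both by validating the result shape and by injecting the delay. It is an illustration under assumptions: `loopUntilPass` and its options object are hypothetical, and the eval-runner is assumed to return an array of objects with a `pass` flag.

```javascript
// Sketch: same loop-until-done logic with injectable runAgent and sleep.
async function loopUntilPass({
  runAgent,
  skill,
  maxIterations = 10,
  delayMs = 60000,
  sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms)),
}) {
  let iteration = 0;
  let allPassed = false;
  while (iteration < maxIterations && !allPassed) {
    const results = await runAgent({
      name: "eval-runner",
      prompt: `Run full Eval for ${skill}`,
      tools: ["Read", "Bash"],
      model: "sonnet",
    });
    // Anything other than a non-empty array of passing results is a failure.
    allPassed =
      Array.isArray(results) &&
      results.length > 0 &&
      results.every((r) => r.pass === true);
    iteration++;
    if (!allPassed && iteration < maxIterations) {
      await sleep(delayMs);
    }
  }
  return {
    iterations: iteration,
    finalStatus: allPassed ? "ALL_PASS" : "STILL_HAS_FAILURES",
  };
}
```

In a test, passing a no-op `sleep` lets the loop run through several iterations immediately.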
Trajectory Eval – Verifying Workflows Execution
Trajectory Eval checks that every expected runAgent call occurs, that the calls respect the intended order, and that all parallel branches are spawned.
Serial Order Verification Contract
{
  "id": "workflow-serial-001",
  "name": "serial-workflow-order-check",
  "type": "regression",
  "prompt": "Use workflows/serial-review.js to review articles/demo.md",
  "expected": [
    "must runAgent article-reviewer first",
    "must runAgent image-reviewer second",
    "must output aggregated result",
    "agents must be serial (await)"
  ],
  "must_not": [
    "must not skip article-reviewer",
    "must not run agents in parallel"
  ],
  "grader": "trajectory",
  "risk": "high",
  "source": "workflow_contract"
}
Parallel Completeness Contract
{
  "id": "workflow-parallel-001",
  "name": "parallel-workflow-both-agents",
  "type": "regression",
  "prompt": "Use workflows/parallel-review.js to review articles/demo.md",
  "expected": [
    "must spawn article-reviewer agent",
    "must spawn image-reviewer agent",
    "must use Promise.all for parallelism",
    "must synthesize both results"
  ],
  "must_not": [
    "must not spawn only one agent",
    "must not execute serially"
  ],
  "grader": "trajectory",
  "risk": "high",
  "source": "workflow_contract"
}
Conditional Routing Contract
{
  "id": "workflow-conditional-001",
  "name": "conditional-workflow-route-check",
  "type": "regression",
  "prompt": "Use workflows/conditional-route.js on articles/demo.md with status=draft",
  "expected": [
    "must route to article-reviewer",
    "must not route to compliance-checker or update-optimizer"
  ],
  "must_not": [
    "must not route to wrong agent",
    "must not skip classification"
  ],
  "grader": "trajectory",
  "risk": "medium",
  "source": "workflow_contract"
}
Data sources for trajectory Eval:
Agent trace – Claude Code conversation log recording each runAgent call
Workflows output – result structure returned by the script
Skill usage observation – self‑reported execution record
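Given an agent trace, the order and completeness checks in the contracts above can be reduced to a few small predicates. The sketch below is hypothetical throughout: the trace shape (`name`, `startSeq`, `endSeq` as increasing event counters) is an assumption rather than the format Claude Code records, and the function names are illustrative. A trace can show that two agents overlapped, but not which language construct produced the overlap, so a rule such as "must use Promise.all" can only be approximated this way.

```javascript
// Hypothetical trace shape: [{ name, startSeq, endSeq }], where the Seq
// values are increasing event counters recorded by the runtime or a stub.

// True when every expected agent appears in the trace at least once.
function allSpawned(trace, expectedNames) {
  const seen = new Set(trace.map((e) => e.name));
  return expectedNames.every((name) => seen.has(name));
}

// True when the named agents ran strictly one after another, in this order.
function ranSerially(trace, orderedNames) {
  const entries = orderedNames.map((name) => trace.find((e) => e.name === name));
  if (entries.some((e) => !e)) return false; // a missing agent fails the check
  return entries.every((e, i) => i === 0 || entries[i - 1].endSeq < e.startSeq);
}

// True when the two agents' execution windows overlap.
function overlapped(trace, a, b) {
  const ea = trace.find((e) => e.name === a);
  const eb = trace.find((e) => e.name === b);
  if (!ea || !eb) return false;
  return ea.startSeq < eb.endSeq && eb.startSeq < ea.endSeq;
}

// Example traces for the serial and parallel workflows.
const serialTrace = [
  { name: "article-reviewer", startSeq: 1, endSeq: 2 },
  { name: "image-reviewer", startSeq: 3, endSeq: 4 },
];
const parallelTrace = [
  { name: "article-reviewer", startSeq: 1, endSeq: 3 },
  { name: "image-reviewer", startSeq: 2, endSeq: 4 },
];
```

With these example traces, `ranSerially` holds for the serial trace only and `overlapped` holds for the parallel trace only, which is the distinction the serial and parallel contracts ask the grader to make.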
Anti‑Patterns and Correct Practices
Workflows replace SKILL.md – leads to missing trigger knowledge and contracts. Correct: keep SKILL.md for triggers, contracts, and static description.
SKILL.md contains execution code – turns the knowledge document into non‑deterministic code. Correct: SKILL.md describes steps; Workflows implement them.
No trajectory Eval – cannot verify that Workflows really ran. Correct: pair output Eval with trajectory Eval.
Workflows not saved to the Skill – makes them unreusable. Correct: store scripts under workflows/.
Practical Integration Checklist
Create a Skill package with new-skill.sh – scaffolds workflows/ and trajectory Eval JSON.
Write SKILL.md with static knowledge, trigger conditions, contracts, and a “Dynamic Workflows” section that lists the scripts.
Implement the Workflows scripts (serial, parallel, conditional, loop) under workflows/.
Add a trajectory Eval entry for each script in evals/trajectory-evals.json.
Validate the package structure:
python3 plugins/frontend-team-toolkit/skill-engineering/bin/validate-skill.py plugins/frontend-team-toolkit/skills/wechat-review-pipeline
Run local CI simulation (merges output and trajectory Eval):
python3 plugins/frontend-team-toolkit/skill-engineering/scripts/run_evals.py \
  --mode pr --skill wechat-review-pipeline \
  --skill-base-path plugins/frontend-team-toolkit/skills
Execute the workflow with Claude Code runtime and inspect the trace:
claude -p "Use parallel-review workflow to review articles/demo.md"
claude --show-trace
When SKILL_EXECUTION_MODE=local (default), the trajectory Eval generates a simulated agent_trace. For real verification, set SKILL_EXECUTION_MODE=claude_code and use claude --show-trace.
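The checklist item about adding a trajectory Eval entry for each script can be checked mechanically before running the evals. The sketch below assumes that evals/trajectory-evals.json parses to an array of contracts shaped like the examples in this article, each naming its workflow file in the `prompt` field; that file format and the `uncoveredWorkflows` name are assumptions, not something the toolkit is known to provide.

```javascript
// Returns the workflow files that no trajectory contract refers to.
// Assumes each contract mentions its script as "workflows/<file>" in prompt.
function uncoveredWorkflows(workflowFiles, contracts) {
  return workflowFiles.filter(
    (file) =>
      !contracts.some(
        (c) =>
          c.grader === "trajectory" &&
          typeof c.prompt === "string" &&
          c.prompt.includes(`workflows/${file}`)
      )
  );
}

// Example input using the file names and prompts from this article.
const exampleFiles = ["parallel-review.js", "conditional-route.js", "weekly-regression.js"];
const exampleContracts = [
  {
    id: "workflow-parallel-001",
    grader: "trajectory",
    prompt: "Use workflows/parallel-review.js to review articles/demo.md",
  },
  {
    id: "workflow-conditional-001",
    grader: "trajectory",
    prompt: "Use workflows/conditional-route.js on articles/demo.md with status=draft",
  },
];
```

For the example input, only weekly-regression.js is reported, since no contract mentions it.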
Division of Responsibilities
SKILL.md – defines triggers, input/output contracts, and static workflow description.
Workflows – implements runtime orchestration, runAgent calls, parallel/serial decisions, conditional routing, and loop termination.
trajectory Eval – validates the execution process (order, completeness, routing, loop count). Output Eval validates the final result.
FAQ
Must Workflows be saved to the Skill? Recommended for reuse and distribution.
How do SKILL.md and Workflows divide work? SKILL.md provides knowledge, triggers, and contracts; Workflows provide the executable logic.
How many Eval files does a Workflow need? At least one trajectory Eval per Workflow, plus the usual output Eval.
How to verify parallel Workflows? Trajectory Eval checks that all agents are spawned.
What is the relationship between Workflows and CI? CI runs Workflows; Workflows can also trigger CI jobs.
Can Workflows call other Workflows? Yes, via nested runAgent calls.
Key Takeaways
Static knowledge lives in SKILL.md; dynamic orchestration lives in workflows/; process verification uses trajectory Eval.
Use a dual‑Eval strategy: output Eval for results, trajectory Eval for execution correctness.
Adding workflows/ on top of the base blueprint creates the L2 dynamic layer, turning static Skills into runnable programs.
Frontend AI Walk
