Turning AI Agents into Deliverable Workflows: Skills, Shell, and Compaction Explained

The article explains why writing code alone does not guarantee delivery, outlines three core challenges for long‑running agents—process reuse, execution, and context continuity—and presents a practical framework of Skills, Shell, and Compaction together with an eight‑item cheat sheet of actionable recommendations, security guidelines, and implementation steps for teams.


Recent experiments with OpenClaw and Claude Code confirm that merely writing code does not ensure successful delivery; the real difficulty lies in constraining uncertainty, making processes auditable, and limiting damage when failures occur.

Three Hard Problems for Long‑Running Agents

OpenAI’s blog post "Shell + Skills + Compaction: Tips for long‑running agents that do real work" identifies three fundamental issues:

Process reuse – how to package repeatable workflows.

Execution – how to run scripts, modify files, and produce artifacts reliably.

Context continuity – how to keep long conversations and state without exploding the prompt.

The solution is expressed as three engineering‑style building blocks:

Skills – reusable, versioned workflow packages.

Shell – a controlled execution environment (local or hosted).

Compaction – automatic history compression to preserve continuity.

Quick‑Start Tips (8‑Item Cheat Sheet)

Skills = programmatic procedures : package SOPs, templates, examples, and boundary conditions as Skills; avoid massive system prompts.

Write Skill descriptions as routing rules : clearly state when to use, when not to use, and expected outputs.

Negative examples rescue reliability : adding "when not to use" scenarios restores trigger rates after an initial drop.

Templates and examples inside Skills cost almost nothing : they are loaded only on demand, keeping context size low.

Design continuity from the start : reuse containers, retain intermediate results, and pass previous_response_id to continue the same thread.

State determinism explicitly : prompt with the literal instruction Use the "<skill name>" skill. to turn implicit routing into a contract.

Skills + network access are high‑risk : enforce strict allow‑lists and default‑deny network calls.

Make /mnt/data the delivery boundary : write all artifacts to disk so they can be reviewed, compared, and replayed outside the chat bubble.

Mental Model: Three Links in the Chain

Procedure (Skills) : reusable, versioned processes that define what to do.

Execution (Shell) : installs dependencies, runs scripts, writes files, and returns structured outputs.

Continuity (Compaction) : compresses long histories automatically or via the /responses/compact endpoint.

Combining these three replaces fragile prompt engineering with a stable "process + execution + continuity" pipeline.
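
Per the article, compaction is handled platform-side (automatically or via the /responses/compact endpoint). To make the idea concrete, here is a toy client-side version: keep the system prompt and the most recent turns verbatim, and collapse everything in between into one summary stub. The summarizer here is a placeholder string, not a real model call:

```python
def compact_history(messages: list[dict], keep_recent: int = 4) -> list[dict]:
    """Compress a long message history so the prompt stops growing:
    system prompt + one summary message + the most recent turns."""
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    if len(rest) <= keep_recent:
        return messages  # nothing to compact yet
    old, recent = rest[:-keep_recent], rest[-keep_recent:]
    # Placeholder summarizer; a real agent would ask a model to summarize.
    summary = {"role": "system",
               "content": f"[Summary of {len(old)} earlier messages]"}
    return system + [summary] + recent
```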

Three Combination Modes

Mode A – Install → Fetch → Write Artifact (minimal closed loop)

Install dependencies.

Fetch or call external data.

Write a concrete artifact (e.g., /mnt/data/report.md).

This creates a clean review boundary for downstream verification.

Mode B – Skills + Shell (make a successful run reusable)

Encode the workflow, guardrails, and templates into a Skill.

Mount the Skill into the execution environment.

Let the agent produce predictable files and reports via the Skill.

Ideal for tabular analysis, data cleaning, periodic reports, and repeatable troubleshooting.
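
A Skill for such a deliverable is essentially a folder with a manifest plus templates and examples. The exact manifest format varies by platform, so the layout and file names below are illustrative only:

```
report-skill/
  SKILL.md          # manifest: name, routing description, boundaries
  templates/
    report.md       # output template the agent fills in
  examples/
    good-report.md  # example of an expected output
    not-for.md      # negative cases: when NOT to trigger this skill
```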

Mode C – Skills as Enterprise Workflow Carrier (advanced)

Example: a Salesforce‑driven Skill raised evaluation accuracy from 73 % to >85 % and cut TTFT by 18 % through precise routing, negative‑example enrichment, and embedded templates.

Beyond individual wins like this, adoption matures through three layers:

Assist layer : the agent only assists with code writing; Skills and Shell are not involved.

Collaboration layer : Skills orchestrate the workflow, Shell executes, and humans validate step‑by‑step.

Autonomous layer : Skills drive end‑to‑end work; humans act as product managers, only reviewing and deciding.

Eventually Skills become living SOPs that evolve with the organization, while execution stays with the agent.

Team Implementation Checklist (7 Steps)

Define the three most common deliverables (report, table, code patch, ticket summary) and standardise their storage path and format.

Create a Skill for each deliverable, including template, example output, negative cases, and boundary conditions.

Force the agent to use the corresponding Skill instead of relying on implicit routing.

Write every artifact to /mnt/data so downstream systems can audit, diff, and replay.

When network access is required, start with the smallest allow‑list and inject secrets via domain_secrets (the model only sees placeholders like $API_KEY).

Version‑control Skills like SOPs, tracking changes and releases.

Only after the above is stable, enable automatic routing and more complex multi‑tool orchestration.

Security & Governance Details

Allow‑list design : two layers – organisation‑wide allow‑list (trusted domains) and request‑level network_policy (must be a subset of the organisation list).

Credential handling : never expose raw keys to the model; use domain_secrets to inject them only for allowed domains at request time.

Network access : default‑deny; enable only the minimal set required for a task.
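
The two-layer gate can be enforced with a simple subset check. The function names, example domains, and policy shapes below are assumptions for illustration, not an actual platform API:

```python
# Layer 1: organisation-wide allow-list of trusted domains (hypothetical).
ORG_ALLOWLIST = {"api.salesforce.com", "internal.example.com"}

def validate_network_policy(requested: set[str],
                            org_allowlist: set[str] = ORG_ALLOWLIST) -> set[str]:
    """Layer 2: the request-level network_policy must be a subset of the
    organisation allow-list; anything outside it is rejected (default-deny)."""
    denied = requested - org_allowlist
    if denied:
        raise PermissionError(f"Domains not in org allow-list: {sorted(denied)}")
    return requested

def redact_secret(prompt: str, placeholder: str = "$API_KEY") -> str:
    """The model only ever sees the placeholder; with domain_secrets, the
    real key is substituted at request time for allowed domains only."""
    return prompt.replace("<key>", placeholder)
```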

Reference

[1] Shell + Skills + Compaction: Tips for long‑running agents that do real work: https://developers.openai.com/blog/skills-shell-tips

Diagrams

Diagram 1: Minimal closed loop
Diagram 2: Continuous long‑running task
Diagram 3: Two‑layer network gate
Written by

Architect

Professional architect sharing high‑quality architecture insights. Topics include high‑availability, high‑performance, high‑stability architectures, big data, machine learning, Java, system and distributed architecture, AI, and practical large‑scale architecture case studies. Open to ideas‑driven architects who enjoy sharing and learning.
