Why Codex, Claude Code, and Hermes All Adopt /goal: Turning Prompt Goals into Runtime Agent Interfaces
From late April to mid‑May, OpenAI Codex, Claude Code, and Hermes each introduced an explicit /goal capability that transforms a one‑sentence prompt into a managed runtime object, enabling long‑running agents to maintain state, validation, budget, and pause/resume control within the Agent Harness.
Within roughly half a month, three AI coding tools (OpenAI Codex, Anthropic Claude Code, and Hermes) each shipped an explicit /goal command. The implementations differ, but all target the same problem: a long-running agent cannot simply finish after a single turn, so it needs a persistent, auditable, and controllable target to work against.
Why the convergence on /goal?
The need arises because a goal expressed only as natural language in a prompt lacks boundaries, state, verification, and the ability to be paused, resumed, or audited. As the author puts it, the goal evolves from a single prompt sentence into a runtime interface of the Agent Harness.
Background and prior work
The article references earlier posts that examined long-task agents, the Ralph Loop, and the evolution of Skills into repository-level workflow packages. Those posts highlighted the importance of persistent artifacts such as GOAL.md, PROGRESS.md, Git history, test reports, and failure logs.
Public discussions that foreshadowed the change
Geoffrey Huntley described a self‑healing Ralph Loop that automatically fixes security failures and re‑validates.
Trey Goff advocated using the Ralph Loop on Codex hooks to prevent early exits.
Greg Brockman framed Codex CLI 0.128.0’s /goal as a “built‑in Ralph loop++”.
Tobi Lütke promoted “context engineering” – providing enough context (including goal, state, budget, and verification) for the LLM to solve the task.
Andrej Karpathy's autoresearch project demonstrated a short-duration self-experiment loop whose outcomes are measurable and can be rolled back.
All these discussions converge on the same engineering question: when an agent works for many turns, the system must clearly define goal, state, feedback, and budget.
Three implementations compared
Codex treats the goal as a control console for long tasks. It persists the goal in a thread-level state database and provides user commands such as /goal, /goal pause, /goal resume, and /goal clear. The model drives the task forward while the runtime tracks whether the goal is active, paused, or complete, accounts for token usage and time limits, and prevents runaway loops.
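The lifecycle described above can be sketched as a small state machine. This is only an illustration under stated assumptions: the names ThreadGoal, GoalStatus, and record_turn are hypothetical, and nothing here reflects Codex's actual schema or storage layer.

```python
from dataclasses import dataclass
from enum import Enum


class GoalStatus(Enum):
    ACTIVE = "active"
    PAUSED = "paused"
    COMPLETE = "complete"
    BUDGET_EXHAUSTED = "budget_exhausted"


@dataclass
class ThreadGoal:
    """Hypothetical thread-level goal record (not Codex's real schema)."""
    objective: str
    token_budget: int
    tokens_used: int = 0
    status: GoalStatus = GoalStatus.ACTIVE

    def pause(self) -> None:
        # Only an active goal can be paused.
        if self.status is GoalStatus.ACTIVE:
            self.status = GoalStatus.PAUSED

    def resume(self) -> None:
        # Only a paused goal can be resumed; finished goals stay finished.
        if self.status is GoalStatus.PAUSED:
            self.status = GoalStatus.ACTIVE

    def record_turn(self, tokens: int, done: bool) -> bool:
        """Account for one turn; return True if another turn is allowed."""
        if self.status is not GoalStatus.ACTIVE:
            return False
        self.tokens_used += tokens
        if done:
            self.status = GoalStatus.COMPLETE
        elif self.tokens_used >= self.token_budget:
            self.status = GoalStatus.BUDGET_EXHAUSTED
        return self.status is GoalStatus.ACTIVE
```

The point of the sketch is that the runtime, not the model, owns the status field: a paused or exhausted goal refuses further turns regardless of what the model would like to do next.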
Claude Code implements a session‑scoped stop‑hook. After each turn, an independent evaluator examines the dialogue evidence and decides whether the goal is satisfied. If not, the task continues; if yes, execution stops. This design is lightweight but relies on the main model to embed sufficient verification evidence in the conversation.
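A rough sketch of the stop-hook pattern follows. It assumes a transcript modeled as a list of strings and uses stand-in functions (scripted_agent, evidence_evaluator) in place of real models; none of these names come from Claude Code itself.

```python
from typing import Callable, List

AgentTurn = Callable[[List[str]], str]        # transcript -> next message
Evaluator = Callable[[str, List[str]], bool]  # (goal, transcript) -> satisfied?


def run_with_stop_hook(goal: str, agent_turn: AgentTurn,
                       evaluator: Evaluator, max_turns: int = 10) -> List[str]:
    """Run turns until the separate evaluator accepts the evidence."""
    transcript: List[str] = []
    for _ in range(max_turns):
        transcript.append(agent_turn(transcript))
        # The hook runs after every turn and sees only the transcript.
        if evaluator(goal, transcript):
            break
    return transcript


def scripted_agent(transcript: List[str]) -> str:
    """Stand-in for the main model: reports a passing test run on turn 3."""
    turn = len(transcript) + 1
    return "npm test: 0 failed" if turn == 3 else f"turn {turn}: still editing"


def evidence_evaluator(goal: str, transcript: List[str]) -> bool:
    """Stand-in judge: accepts only explicit evidence in the transcript."""
    return any("0 failed" in message for message in transcript)
```

With these stand-ins, run_with_stop_hook("fix test/auth", scripted_agent, evidence_evaluator) stops after the third turn. If the agent never writes its verification output into the conversation, the evaluator has nothing to accept, which is the dependency noted above.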
Hermes stores the goal in SessionDB.state_meta, making it accessible across sessions and gateways (CLI, Telegram, Discord, etc.). It focuses on cross-session continuity and fail-open behavior: when the goal_judge hits an error it fails open, while a turn budget and an explicit stop reason enforce the hard stop.
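One way to read the fail-open idea is sketched below. It assumes that "fail open" means a judge error is treated as "not yet done" so the session keeps going, with the turn budget as the backstop; that reading, along with the names judge_fail_open and run_goal, is an assumption and not Hermes's actual code.

```python
from typing import Callable, List, Tuple

Judge = Callable[[str, List[str]], bool]


def judge_fail_open(judge: Judge, goal: str,
                    transcript: List[str]) -> Tuple[bool, str]:
    """Return (done, reason); a judge error never blocks the session."""
    try:
        return bool(judge(goal, transcript)), "judge verdict"
    except Exception as exc:  # assumption: any judge failure fails open
        return False, f"judge error ignored (fail-open): {exc}"


def run_goal(goal: str, agent_turn: Callable[[List[str]], str],
             judge: Judge, turn_budget: int) -> Tuple[List[str], str]:
    """Run until the judge is satisfied or the turn budget is spent."""
    transcript: List[str] = []
    for turn in range(1, turn_budget + 1):
        transcript.append(agent_turn(transcript))
        done, _reason = judge_fail_open(judge, goal, transcript)
        if done:
            return transcript, f"stopped: goal satisfied after {turn} turns"
    # Hard stop with an explicit reason, even if the judge never worked.
    return transcript, f"stopped: turn budget of {turn_budget} exhausted"
```

The design choice worth noting is that the two mechanisms cover each other: a broken judge cannot halt work prematurely, and a broken judge also cannot keep the agent running forever.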
All three share the same high‑level intent—turn a fleeting prompt into a managed runtime object—but differ in where they extract state, how they evaluate completion, and how they handle cross‑session continuity.
What a robust Goal must contain
The author lists five mandatory fields for a usable Goal:
Scope: precise directory, module, or user path.
Invariants: public APIs or data structures that must not change.
Verification evidence: concrete commands, logs, diffs, CI status that prove success.
Budget boundaries: maximum turns, time, or token consumption.
Closure output: summary of completed items, remaining failures, failed commands, assumptions, and next-step suggestions.
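The five fields can also be held as a structured record, so that an incomplete goal is rejected before the agent starts. The sketch below is a minimal illustration: GoalSpec and missing_fields are hypothetical names, and the budget is reduced to a turn count for brevity.

```python
from dataclasses import dataclass, fields
from typing import List


@dataclass
class GoalSpec:
    """Hypothetical container for the five fields listed above."""
    objective: str
    scope: List[str]          # directories, modules, or user paths
    invariants: List[str]     # things that must not change
    verification: List[str]   # commands or checks that prove success
    max_turns: int            # budget boundary, simplified to turns
    closure: List[str]        # what the final report must contain

    def missing_fields(self) -> List[str]:
        """Names of fields left empty, or a non-positive turn budget."""
        missing = []
        for field in fields(self):
            value = getattr(self, field.name)
            if field.name == "max_turns":
                if value <= 0:
                    missing.append(field.name)
            elif not value:
                missing.append(field.name)
        return missing
```

A harness could refuse to activate any goal for which missing_fields() is non-empty, which turns "the goal must be verifiable and bounded" from advice into a checked precondition.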
A concrete example is provided:
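/goal Fix all failing tests under test/auth while keeping the public API of src/auth unchanged.
Scope: change only src/auth, test/auth, and the necessary test helper files.
Invariants: do not delete existing assertions, do not change the public API, and do not expand into unrelated tests.
Verification: run npm test -- test/auth after every round of changes; run lint before finishing and report the git diff scope.
Budget: if the goal is still incomplete after 20 turns, stop making new changes.
Closure: summarize the passing items, remaining failures, failed command output, and suggested next steps.
Misunderstanding “autonomy”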
More autonomy does not mean less control. From an engineering perspective, giving agents autonomy requires explicit goal, state, verification, and budget definitions—similar to turning a one‑off script into a production service with health checks, retries, and rollbacks.
Relation to Skills and future workflow
The article links Goal to earlier concepts like Skills, AGENTS.md, Hooks, Memory, and Subagents. All of these components answer the same question: which parts of a conversation should become reusable, auditable engineering assets? The author sketches a future Harness workflow in which agents first learn the repository rules (AGENTS.md), then follow a Skill, attach a /goal, run deterministic Hooks, record progress, and finally undergo CI validation.
Conclusion
The convergence on /goal signals that long‑task agents need a lifecycle: explicit scope, immutable constraints, verifiable evidence, bounded budget, and a clear closure. The differences among Codex, Claude Code, and Hermes illustrate three engineering trade‑offs—state persistence, independent evaluation, and cross‑session continuity—rather than a simple feature race.
Architect
Professional architect sharing high‑quality architecture insights. Topics include high‑availability, high‑performance, high‑stability architectures, big data, machine learning, Java, system and distributed architecture, AI, and practical large‑scale architecture case studies. Open to ideas‑driven architects who enjoy sharing and learning.
