Why Codex, Claude Code, and Hermes All Adopt /goal: Turning Prompt Goals into Runtime Agent Interfaces

From late April to mid‑May, OpenAI Codex, Claude Code, and Hermes each introduced an explicit /goal capability that turns a one‑sentence prompt into a managed runtime object, letting the Agent Harness track state, verification, budget, and pause/resume control for long‑running agents.

Architect

In a short half‑month window (late April to mid‑May), three AI coding tools introduced an explicit /goal command in quick succession: OpenAI Codex, Anthropic Claude Code, and Hermes. The implementations differ, but all target the same problem: a long‑running agent must not stop after a single turn, and it needs a persistent, auditable, and controllable target.

Why the convergence on /goal?

The need arises because a goal expressed only as natural language in a prompt lacks boundaries, state, verification, and the ability to be paused, resumed, or audited. As the author puts it, the goal evolves from a single prompt sentence into a runtime interface of the Agent Harness.

Background and prior work

The article references earlier posts that examined long‑task agents, the Ralph Loop, and the evolution of Skills into repository‑level workflow packages. Those posts highlighted the importance of persistent artifacts such as GOAL.md, PROGRESS.md, Git history, test reports, and failure logs.

Public discussions that foreshadowed the change

Geoffrey Huntley described a self‑healing Ralph Loop that automatically fixes security failures and re‑validates.

Trey Goff advocated building the Ralph Loop on top of Codex hooks to keep agents from exiting early.

Greg Brockman framed Codex CLI 0.128.0’s /goal as a “built‑in Ralph loop++”.

Tobi Lütke promoted “context engineering” – providing enough context (including goal, state, budget, and verification) for the LLM to solve the task.

Karpathy’s autoresearch project demonstrated a short, self‑experimenting loop whose outcomes are measurable and can be rolled back.

All these discussions converge on the same engineering question: when an agent works for many turns, the system must clearly define goal, state, feedback, and budget.

Three implementations compared

Codex treats the goal as a long‑task control console. It persists the goal in a thread‑level state database and provides user commands such as /goal, /goal pause, /goal resume, and /goal clear. The model drives the task forward. Meanwhile, the runtime tracks activation, pause, completion, token usage, and time limits, and it prevents runaway loops.
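The sketch below makes this division of labour concrete. It is a minimal Python sketch of a thread‑level goal record whose status is advanced by the runtime, not by the model. The names `ThreadGoal`, `GoalStore`, and `GoalStatus`, and the token‑only budget, are illustrative assumptions. They are not Codex's actual schema or API, and time limits are left out for brevity.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, Optional


class GoalStatus(Enum):
    ACTIVE = "active"
    PAUSED = "paused"
    COMPLETE = "complete"
    BUDGET_LIMITED = "budget_limited"


@dataclass
class ThreadGoal:
    """Hypothetical per-thread goal record (not Codex's actual schema)."""
    objective: str
    token_budget: int
    tokens_used: int = 0
    status: GoalStatus = GoalStatus.ACTIVE

    def pause(self) -> None:
        # /goal pause: only an active goal can be paused
        if self.status is GoalStatus.ACTIVE:
            self.status = GoalStatus.PAUSED

    def resume(self) -> None:
        # /goal resume: only a paused goal can be resumed
        if self.status is GoalStatus.PAUSED:
            self.status = GoalStatus.ACTIVE

    def record_turn(self, tokens: int, model_reports_done: bool) -> GoalStatus:
        """Runtime bookkeeping after one model turn."""
        if self.status is not GoalStatus.ACTIVE:
            return self.status  # paused or finished goals ignore the turn
        self.tokens_used += tokens
        if model_reports_done:
            self.status = GoalStatus.COMPLETE
        elif self.tokens_used >= self.token_budget:
            # The runtime, not the model, ends a runaway loop
            self.status = GoalStatus.BUDGET_LIMITED
        return self.status


class GoalStore:
    """Hypothetical thread-id -> goal mapping standing in for a state database."""

    def __init__(self) -> None:
        self._goals: Dict[str, ThreadGoal] = {}

    def set(self, thread_id: str, objective: str, token_budget: int) -> ThreadGoal:
        goal = ThreadGoal(objective, token_budget)
        self._goals[thread_id] = goal
        return goal

    def get(self, thread_id: str) -> Optional[ThreadGoal]:
        return self._goals.get(thread_id)

    def clear(self, thread_id: str) -> None:
        # /goal clear: drop the thread's goal entirely
        self._goals.pop(thread_id, None)
```

In this sketch, a turn recorded while the goal is paused is ignored. The budget is checked only when the model has not reported completion, so the final stop decision stays with the runtime.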

Claude Code implements a session‑scoped stop‑hook. After each turn, an independent evaluator examines the dialogue evidence and decides whether the goal is satisfied. If not, the task continues; if yes, execution stops. This design is lightweight but relies on the main model to embed sufficient verification evidence in the conversation.
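The stop‑hook pattern can be sketched as a loop in which a separate judge reads only the transcript after every turn. The names `run_with_stop_hook` and `evidence_judge` are hypothetical, as is the `max_turns` safety cap. This is an illustration of the pattern, not Claude Code's hook API.

```python
from typing import Callable, List, Tuple

# A judge takes (goal, transcript) and returns (satisfied, reason).
Judge = Callable[[str, List[str]], Tuple[bool, str]]


def run_with_stop_hook(goal: str, run_turn: Callable[[List[str]], str],
                       judge: Judge, max_turns: int = 10) -> Tuple[bool, List[str]]:
    """Let the agent work turn by turn; an independent judge decides when to stop."""
    transcript: List[str] = []
    for _ in range(max_turns):
        transcript.append(run_turn(transcript))       # main model's turn
        satisfied, reason = judge(goal, transcript)   # evaluator sees dialogue only
        if satisfied:
            return True, transcript
        # Not satisfied: block the stop and tell the agent why it must continue
        transcript.append(f"[stop-hook] continue: {reason}")
    return False, transcript


def evidence_judge(goal: str, transcript: List[str]) -> Tuple[bool, str]:
    """Example judge: done only if the latest agent message shows passing tests."""
    agent_msgs = [m for m in transcript if not m.startswith("[stop-hook]")]
    if agent_msgs and "0 failing" in agent_msgs[-1]:
        return True, "test output shows 0 failing"
    return False, "no evidence in the transcript that the tests pass"
```

`evidence_judge` sees only the transcript. A turn that fixes the tests but never prints the test output would therefore still be judged unfinished. This is the dependence on in‑conversation evidence described above.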

Hermes stores the goal in SessionDB.state_meta, making it accessible across sessions and gateways (CLI, Telegram, Discord, etc.). It focuses on cross‑session continuity and fail‑open behavior. If the goal_judge errors, it fails open. The turn budget then acts as the hard stop, and the stop is reported with an explicit reason.
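Below is a rough sketch of the Hermes‑style trade‑off. It makes two assumptions. First, a plain dictionary stands in for `SessionDB.state_meta`. Second, "fail open" is read as meaning that a judge error neither crashes the session nor marks the goal complete. `save_goal`, `load_goal`, `judge_fail_open`, and `step_goal` are hypothetical helpers, not Hermes functions.

```python
import json
from typing import Callable, Dict, Tuple


def save_goal(state_meta: Dict[str, str], session_id: str, goal: dict) -> None:
    # Serialize so any gateway (CLI, Telegram, Discord, ...) can reload it later
    state_meta[f"goal:{session_id}"] = json.dumps(goal)


def load_goal(state_meta: Dict[str, str], session_id: str) -> dict:
    return json.loads(state_meta[f"goal:{session_id}"])


def judge_fail_open(judge: Callable[[str], bool], objective: str) -> bool:
    """Ask the judge; if it raises, fail open: the goal is not marked done."""
    try:
        return bool(judge(objective))
    except Exception:
        return False


def step_goal(state_meta: Dict[str, str], session_id: str,
              judge: Callable[[str], bool]) -> Tuple[str, str]:
    """Advance one turn and return (status, explicit reason)."""
    goal = load_goal(state_meta, session_id)
    if goal["status"] != "active":
        return goal["status"], "goal is not active"
    goal["turns_used"] += 1
    if judge_fail_open(judge, goal["objective"]):
        goal["status"], reason = "done", "judge confirmed completion"
    elif goal["turns_used"] >= goal["max_turns"]:
        # Hard stop: the turn budget ends the goal even if the judge keeps failing
        goal["status"] = "stopped"
        reason = f"turn budget of {goal['max_turns']} exhausted"
    else:
        reason = "continue"
    save_goal(state_meta, session_id, goal)
    return goal["status"], reason
```

In this sketch, a judge that raises on every turn still cannot keep the goal alive forever. The turn budget ends it, and the returned reason records why.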

All three share the same high‑level intent—turn a fleeting prompt into a managed runtime object—but differ in where they extract state, how they evaluate completion, and how they handle cross‑session continuity.

What a robust Goal must contain

The author lists five mandatory fields for a usable Goal:

Scope: precise directory, module, or user path.

Invariants: public APIs or data structures that must not change.

Verification evidence: concrete commands, logs, diffs, and CI status that prove success.

Budget boundaries: maximum turns, time, or token consumption.

Closure output: summary of completed items, remaining failures, failed commands, assumptions, and next‑step suggestions.
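The five fields map naturally onto a structured record that can be checked before a goal is accepted. What follows is a minimal sketch. `GoalSpec` and its `problems()` check are hypothetical. Expressing the budget as a turn count is just one of the options the list mentions.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class GoalSpec:
    """Hypothetical goal record holding the five mandatory fields."""
    objective: str
    scope: List[str]          # directories, modules, or user paths
    invariants: List[str]     # public APIs / data structures that must not change
    verification: List[str]   # commands whose output counts as evidence
    max_turns: int            # budget boundary (turns, in this sketch)
    closure: List[str] = field(default_factory=lambda: [
        "completed items", "remaining failures", "failed commands",
        "assumptions", "next-step suggestions"])

    def problems(self) -> List[str]:
        """Name the mandatory fields that are missing or unusable."""
        issues = []
        if not self.scope:
            issues.append("scope")
        if not self.invariants:
            issues.append("invariants")
        if not self.verification:
            issues.append("verification")
        if self.max_turns <= 0:
            issues.append("budget")
        if not self.closure:
            issues.append("closure")
        return issues
```

A vague request such as "make auth better" with no scope, invariants, verification, or budget would be flagged on four fields before the agent starts work.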

A concrete example is provided:

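/goal Fix all failing tests under test/auth and keep the public API of src/auth unchanged.
Scope: only modify src/auth, test/auth, and any necessary test helper files.
Invariants: do not delete existing assertions, do not change the public API, do not expand into unrelated tests.
Verification: after each round of changes, run npm test -- test/auth; before finishing, run lint and report the scope of the git diff.
Budget: if the task is still unfinished after 20 turns, stop making new changes.
Closure: summarize passing items, remaining failures, output of failed commands, and next‑step suggestions.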
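Each line of the example carries a label, so a harness could reject incomplete goals mechanically. This sketch assumes an English rendering of the spec with the labels Scope, Invariants, Verification, Budget, and Closure. `parse_goal`, `missing_fields`, and `budget_turns` are hypothetical helpers, not part of any of the three tools.

```python
import re
from typing import Dict, List, Optional

REQUIRED = ["Scope", "Invariants", "Verification", "Budget", "Closure"]

EXAMPLE_SPEC = """
/goal Fix all failing tests under test/auth and keep the public API of src/auth unchanged.
Scope: only modify src/auth, test/auth, and any necessary test helper files.
Invariants: do not delete existing assertions, do not change the public API, do not expand into unrelated tests.
Verification: after each round of changes, run npm test -- test/auth; before finishing, run lint and report the scope of the git diff.
Budget: if the task is still unfinished after 20 turns, stop making new changes.
Closure: summarize passing items, remaining failures, output of failed commands, and next-step suggestions.
"""


def parse_goal(text: str) -> Dict[str, str]:
    """Split a '/goal ...' spec into its objective and labelled fields."""
    lines = [ln.strip() for ln in text.strip().splitlines() if ln.strip()]
    if not lines or not lines[0].startswith("/goal "):
        raise ValueError("spec must start with a '/goal ' line")
    fields = {"Objective": lines[0][len("/goal "):]}
    for ln in lines[1:]:
        label, sep, value = ln.partition(":")  # split at the first colon only
        if sep and label.strip() in REQUIRED:
            fields[label.strip()] = value.strip()
    return fields


def missing_fields(fields: Dict[str, str]) -> List[str]:
    """Return the mandatory labels that are absent or empty."""
    return [name for name in REQUIRED if not fields.get(name)]


def budget_turns(fields: Dict[str, str]) -> Optional[int]:
    """Pull a turn limit such as '20 turns' out of the Budget field, if any."""
    match = re.search(r"(\d+)\s*turns", fields.get("Budget", ""))
    return int(match.group(1)) if match else None
```

A bare one‑line goal such as "/goal tidy up auth" would come back with all five labels missing and no turn budget. This is exactly the kind of prompt‑only goal the article argues against.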

Misunderstanding “autonomy”

More autonomy does not mean less control. From an engineering perspective, giving agents autonomy requires explicit goal, state, verification, and budget definitions—similar to turning a one‑off script into a production service with health checks, retries, and rollbacks.

Relation to Skills and future workflow

The article links Goal to earlier concepts such as Skills, AGENTS.md, Hooks, Memory, and Subagents. All these components answer the same question: which parts of a conversation should become reusable, auditable engineering assets? The author sketches a future Harness workflow. In it, Agents first learn repository rules (AGENTS.md), then follow a Skill, attach a /goal, run deterministic Hooks, record progress, and finally undergo CI validation.
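The workflow can be read as an ordered pipeline that stops at the first failing stage. In the sketch below, the stage functions and context keys are hypothetical placeholders for what each component would actually do.

```python
from typing import Callable, Dict, List, Tuple

Stage = Tuple[str, Callable[[Dict], bool]]


def run_harness(stages: List[Stage], ctx: Dict) -> Tuple[bool, List[str]]:
    """Run harness stages in order and stop at the first failure."""
    log: List[str] = []
    for name, stage in stages:
        ok = stage(ctx)
        log.append(f"{name}: {'ok' if ok else 'failed'}")
        if not ok:
            return False, log
    return True, log


# Hypothetical stage implementations; each reads or writes the shared context.
def load_rules(ctx: Dict) -> bool:
    ctx["rules"] = ["run tests before committing"]  # stand-in for parsed AGENTS.md
    return True


def pick_skill(ctx: Dict) -> bool:
    ctx["skill"] = "fix-failing-tests"
    return True


def attach_goal(ctx: Dict) -> bool:
    ctx["goal"] = {"objective": "fix test/auth", "max_turns": 20}
    return True


def run_hooks(ctx: Dict) -> bool:
    # Deterministic checks (e.g. lint); here the result is read from the context
    return ctx.get("lint_ok", False)


def record_progress(ctx: Dict) -> bool:
    ctx.setdefault("progress", []).append("hooks passed; awaiting CI")
    return True


def ci_validation(ctx: Dict) -> bool:
    return ctx.get("ci_green", False)


STAGES: List[Stage] = [
    ("AGENTS.md", load_rules),
    ("Skill", pick_skill),
    ("/goal", attach_goal),
    ("Hooks", run_hooks),
    ("progress", record_progress),
    ("CI", ci_validation),
]
```

Stopping at the first failure keeps the log short and auditable. For example, a failing Hook ends the run before any progress is recorded or CI is consulted.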

[Figure: Three /goal architectures]

Conclusion

The convergence on /goal signals that long‑task agents need a lifecycle: explicit scope, immutable constraints, verifiable evidence, bounded budget, and a clear closure. The differences among Codex, Claude Code, and Hermes illustrate three engineering trade‑offs—state persistence, independent evaluation, and cross‑session continuity—rather than a simple feature race.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.
