Agent Harness: From Instruction Computer to Semantic Computer Runtime
The article proposes a semantic‑computer runtime for Agent Harness that mirrors traditional CPU, register, stack, heap, and code‑area structures, enabling large language models to generate and execute incremental semantic instructions within limited context windows.
Abstract: This paper introduces a framework that treats Agent Harness as a “semantic computer runtime” to address the challenge of large language models (LLMs) continuously executing open‑ended goals within a limited context window. By analogy with the classic instruction‑computer runtime (CPU, registers, stack, heap, code area), it builds a semantic runtime composed of an LLM, a Semantic Register, a Semantic Stack, a Semantic Heap, and a SkillRegistry, and defines a complete semantic instruction loop (Generate & Execute).
The core problem of Agent Harness is not merely attaching more tools to an LLM, but enabling the model to keep progressing toward an open goal when its context capacity is bounded.
Traditional instruction computers follow a Fetch → Execute → Update State cycle, where the CPU fetches pre‑written binary instructions and executes them using a small register set, memory, stack, heap, and code area. This architecture solves the problem of a limited processing unit executing complex programs.
LLMs face a similar but distinct issue: they have strong capabilities but cannot read unlimited information. The context window serves as both working memory and bottleneck. Agent histories, tool results, intermediate reasoning, external documents, and tool specifications can quickly exceed this window, and forcing everything into the prompt leads to cost spikes, attention dilution, reasoning degradation, and loss of control.
Consequently, Agent Harness should be viewed as a “semantic computer” runtime whose purpose is to let the LLM continuously generate the next semantic instruction within the limited context, rather than executing a pre‑written instruction.
The semantic computer differs from the instruction computer in that the next action does not already exist; it must be generated on the fly by the LLM based on the current context.
Five core concepts of the semantic computer are defined:
LLM as CPU: the execution core that generates the next semantic instruction from the visible context.
Semantic Register: the minimal context passed to the LLM each round, focusing attention on the information needed for the next instruction.
Semantic Stack: manages goals and sub‑goals as frames, analogous to function call/return, without storing large context payloads.
Semantic Heap: stores runtime semantic objects such as tool results, observations, drafts, hypotheses, errors, and decisions; it is not a long‑term memory but a workspace for the current execution.
SkillRegistry: holds callable capabilities (search, browser, query_db, run_code, send_email, etc.) as code, separate from heap objects.
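The four storage concepts above can be sketched as small Python classes. This is only an illustrative sketch: the class names follow the article, but every method body, the character-based budget, and the pointer format (`kind#n`) are assumptions made for the example, not details given in the source.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class Frame:
    """One goal or sub-goal on the Semantic Stack; carries no large payload."""
    goal: str

    def brief(self) -> str:
        return f"goal: {self.goal}"


class SemanticStack:
    def __init__(self, root_goal: str) -> None:
        self._frames = [Frame(root_goal)]

    def push(self, goal: str) -> None:
        self._frames.append(Frame(goal))

    def pop(self) -> Frame:
        return self._frames.pop()

    def current(self) -> Frame:
        return self._frames[-1]

    def done(self) -> bool:
        return not self._frames


class SemanticHeap:
    """Workspace for the current run: tool results, drafts, errors, decisions."""
    def __init__(self) -> None:
        self._objects: dict[str, Any] = {}

    def commit(self, kind: str, value: Any) -> str:
        ptr = f"{kind}#{len(self._objects)}"  # hypothetical pointer format
        self._objects[ptr] = value
        return ptr

    def get(self, ptr: str) -> Any:
        return self._objects[ptr]


class SemanticRegister:
    """Small view handed to the LLM each round (budget in characters is an assumption)."""
    def __init__(self, budget_chars: int) -> None:
        self.budget_chars = budget_chars
        self._items: list[str] = []

    def pin(self, text: str) -> bool:
        used = sum(len(item) for item in self._items)
        if used + len(text) > self.budget_chars:
            return False  # refuse content that would overflow the register
        self._items.append(text)
        return True

    def reset(self) -> None:
        self._items.clear()

    def view(self) -> str:
        return "\n".join(self._items)


class SkillRegistry:
    """Callable capabilities kept as code, separate from heap objects."""
    def __init__(self) -> None:
        self._skills: dict[str, tuple[str, Callable[..., Any]]] = {}

    def register(self, name: str, description: str, fn: Callable[..., Any]) -> None:
        self._skills[name] = (description, fn)

    def describe(self, name: str) -> str:
        return self._skills[name][0]

    def invoke(self, name: str, **args: Any) -> Any:
        return self._skills[name][1](**args)
```

The point of the split is that the register enforces the budget while the heap and registry can grow freely; only what `pin` accepts ever reaches the model.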
Because the Semantic Heap grows while the Semantic Register remains tiny, a progressive loading mechanism is required. Each round first loads a small heap summary relevant to the current frame, then loads candidate skill descriptions, and finally lets the LLM decide whether more loading is needed or whether it can generate an execution instruction.
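As a rough illustration of the first step of progressive loading, the sketch below selects heap summaries relevant to the current frame under a size budget. The function name `seed_load`, the word-overlap relevance score, and the character budget are all assumptions for the example; the article does not specify how relevance is computed.

```python
def seed_load(frame_goal: str, summaries: dict, budget: int) -> list:
    """Pick heap summaries sharing words with the goal, best match first, within a character budget."""
    goal_words = set(frame_goal.lower().split())
    scored = []
    for ptr, summary in summaries.items():
        overlap = len(goal_words & set(summary.lower().split()))
        if overlap:
            scored.append((overlap, ptr, summary))
    # Highest overlap first; ties broken by pointer name for determinism.
    scored.sort(key=lambda item: (-item[0], item[1]))

    picked, used = [], 0
    for _, ptr, summary in scored:
        line = f"{ptr}: {summary}"
        if used + len(line) > budget:
            continue  # skip entries that would overflow the register
        picked.append(line)
        used += len(line)
    return picked


summaries = {
    "err#1": "login timeout after 30s",
    "obs#2": "billing page renders slowly",
    "draft#3": "patch for login handler",
}
seed = seed_load("fix login timeout bug", summaries, budget=200)
```

A real runtime would likely use embeddings or frame-scoped indexes instead of word overlap; the budgeted selection is the part that matters here.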
The semantic instruction set is compressed into four categories:
LOAD: loads content into the Semantic Register from either the heap (semantic objects) or the skill registry (ability descriptions). Supports progressive disclosure (pointer → summary → content, etc.).
CALL: invokes a previously loaded and validated skill; the result is automatically committed to the Semantic Heap.
FRAME: pushes a sub‑goal or pops the current goal on the Semantic Stack, mirroring function call/return.
RESPOND: outputs the final result.
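One way to picture the four instruction categories is as typed records plus a validator that enforces the rule that CALL may only target a previously loaded skill. The field names, the `validate` function, and the three disclosure levels as a fixed tuple are assumptions for this sketch, not a format defined by the article.

```python
from dataclasses import dataclass, field
from typing import Optional, Union

LEVELS = ("pointer", "summary", "content")  # progressive disclosure levels


@dataclass(frozen=True)
class Load:
    source: str              # "heap" or "skill"
    ptr: str
    level: str = "summary"


@dataclass(frozen=True)
class Call:
    skill: str
    args: dict = field(default_factory=dict)


@dataclass(frozen=True)
class FrameOp:
    op: str                  # "push" or "pop"
    goal: Optional[str] = None


@dataclass(frozen=True)
class Respond:
    content: str


Instruction = Union[Load, Call, FrameOp, Respond]


def validate(instr: Instruction, loaded_skills: set) -> None:
    """Raise ValueError when an instruction breaks the rules described above."""
    if isinstance(instr, Load):
        if instr.source not in ("heap", "skill"):
            raise ValueError(f"unknown LOAD source: {instr.source}")
        if instr.level not in LEVELS:
            raise ValueError(f"unknown LOAD level: {instr.level}")
    elif isinstance(instr, Call):
        if instr.skill not in loaded_skills:
            raise ValueError("CALL requires a previously loaded skill")
    elif isinstance(instr, FrameOp):
        if instr.op not in ("push", "pop"):
            raise ValueError(f"unknown FRAME op: {instr.op}")
        if instr.op == "push" and not instr.goal:
            raise ValueError("FRAME push needs a goal")
```

For example, `validate(Call("send_email"), {"search"})` raises because `send_email` was never loaded into the register, while `validate(Call("search", {"q": "x"}), {"search"})` passes.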
The execution loop can be expressed as the following pseudo‑code, which illustrates the Generate & Execute paradigm:
while not stack.done():
    frame = stack.current()
    register.reset()
    register.pin(frame.brief())
    # Load current problem context from heap
    register.pin(
        heap.load(source="heap", mode="seed", frame=frame, budget=heap_seed_budget)
    )
    # Load candidate abilities based on the context
    register.pin(
        skills.load(source="skill", mode="seed", frame=frame, heap_context=register.heap_view(), budget=skill_seed_budget)
    )
    while True:
        instr = LLM.generate_instruction(context=register.view(), isa=["LOAD", "CALL", "FRAME", "RESPOND"])
        if instr.type == "LOAD":
            if instr.source == "heap":
                fragment = heap.load(query=instr.query, ptr=instr.ptr, level=instr.level, budget=instr.budget)
                register.pin(fragment)
            elif instr.source == "skill":
                fragment = skills.load(query=instr.query, ptr=instr.ptr, level=instr.level, budget=instr.budget)
                register.pin(fragment)
            continue
        elif instr.type == "CALL":
            validator.check(instr, register)
            result = skills.invoke(skill=instr.skill, args=instr.args)
            heap.commit(frame=frame, instr=instr, result=result)
            break
        elif instr.type == "FRAME":
            if instr.op == "push":
                stack.push(instr.goal)
            elif instr.op == "pop":
                stack.pop()
            heap.commit(frame=frame, instr=instr, result="frame_updated")
            break
        elif instr.type == "RESPOND":
            return instr.content

This loop embodies the key idea: instead of preparing all context up front, the system incrementally loads information via LOAD instructions while the LLM decides when enough data is available to issue CALL, FRAME, or RESPOND instructions that actually advance the task.
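The loop above can be exercised end to end with a scripted stand-in for the LLM. The following is a toy sketch under stated assumptions: instructions are plain dicts, the heap is a list indexed by position, skills are `(description, function)` pairs, and the `script` list replaces `LLM.generate_instruction`. None of these representations come from the article.

```python
from collections import deque


def run(goal, skills, script):
    """Drive a toy Generate & Execute loop; `script` stands in for the LLM."""
    stack = [goal]            # Semantic Stack of goal strings
    heap = []                 # Semantic Heap: committed results, by index
    pending = deque(script)   # raises IndexError if the script runs out
    while stack:
        # Equivalent of register.reset() followed by register.pin(frame.brief()).
        register = [f"goal: {stack[-1]}"]
        loaded = set()        # skills disclosed to the register this round
        while True:
            instr = pending.popleft()  # a real LLM would read `register` here
            kind = instr["type"]
            if kind == "LOAD":
                if instr["source"] == "skill":
                    loaded.add(instr["ptr"])
                    register.append(f"skill {instr['ptr']}: {skills[instr['ptr']][0]}")
                else:
                    register.append(f"heap[{instr['ptr']}]: {heap[instr['ptr']]}")
                continue      # keep loading within the same round
            if kind == "CALL":
                if instr["skill"] not in loaded:
                    raise ValueError("CALL of a skill that was not loaded this round")
                result = skills[instr["skill"]][1](**instr["args"])
                heap.append(result)   # results are committed automatically
                break
            if kind == "FRAME":
                if instr["op"] == "push":
                    stack.append(instr["goal"])
                else:
                    stack.pop()
                heap.append("frame_updated")
                break
            if kind == "RESPOND":
                return instr["content"], heap
            raise ValueError(f"unknown instruction type: {kind}")
    return None, heap


skills = {"add": ("add two integers", lambda a, b: a + b)}
script = [
    {"type": "FRAME", "op": "push", "goal": "compute 2 + 3"},
    {"type": "LOAD", "source": "skill", "ptr": "add"},
    {"type": "CALL", "skill": "add", "args": {"a": 2, "b": 3}},
    {"type": "LOAD", "source": "heap", "ptr": 1},
    {"type": "FRAME", "op": "pop"},
    {"type": "RESPOND", "content": "2 + 3 = 5"},
]
answer, heap = run("answer the user", skills, script)
```

Each CALL or FRAME ends a round and rebuilds the register from scratch, which is why the skill must be reloaded before it can be called again in a later round.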
The paper clarifies the boundary of Agent Harness: the heap and SkillRegistry are passive stores; the LLM’s main loop makes all judgments about loading, invoking, framing, or responding. Thus Agent Harness is neither a simple tool‑orchestration framework nor a traditional long‑term memory system. It is a semantic computer runtime that uses a Semantic Register to cope with limited context, a Semantic Stack to manage goal calls, a Semantic Heap to hold runtime objects, and a SkillRegistry to expose capabilities, while the LLM dynamically generates the next semantic instruction.
In contrast to instruction computers that execute frozen steps, the semantic computer freezes the capability boundary and lets the LLM generate execution logic at runtime, providing a stable, controllable, and extensible structure for the Generate & Execute paradigm.
360 Zhihui Cloud Developer
