Build a Minimal AI Agent Loop in 30 Minutes and Turn It into a Stable Production System

This article walks through constructing a tiny, runnable AI agent loop: read a user task, let the model choose the next step, call a tool, feed the observation back, and repeat. It then explains how to add harness, memory, permission, and validation layers that make the agent reliable in real-world engineering environments.


Why a Minimal Loop?

Many developers can describe high‑level concepts like Harness or Function Calling, but when asked "How does an agent actually run?" they struggle to give a concise answer. The simplest answer is a four‑step loop: read the user task, let the model pick the next action, invoke the required tool, and feed the tool’s result back into the model.

Core Minimal Coding Agent

The goal is not to recreate a full‑featured product like Claude Code in 30 minutes, but to isolate the essential logic that makes an agent work. The minimal agent needs only three inputs:

User task – e.g., "list files in the current directory" or "read a specific code file".

Conversation history – the accumulated messages, model responses, and tool observations that form the context.

Tool list – a small, controlled set of functions the model may call.

Three basic tools are provided:

const tools = [listFiles, readFile, runCommand];

Each tool is a simple function; for example:

const fs = require("fs");
async function listFiles(path) { return (await fs.promises.readdir(path)).join("\n"); }

These tools are deliberately limited to read‑only or safe operations to avoid security risks in the first version.
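For the model to pick among these tools via Function Calling, each one also needs a declaration the API can read. A minimal sketch of what those declarations might look like (the field names follow the common OpenAI-style shape; your provider's exact schema may differ):

```javascript
// Hypothetical function-calling declarations for two of the tools above.
// Field names follow the common OpenAI-style shape; adjust for your provider.
const toolSchemas = [
  {
    name: "list_files",
    description: "List the entries of a directory",
    parameters: {
      type: "object",
      properties: { path: { type: "string", description: "Directory to list" } },
      required: ["path"],
    },
  },
  {
    name: "read_file",
    description: "Read a text file and return its contents",
    parameters: {
      type: "object",
      properties: { path: { type: "string" } },
      required: ["path"],
    },
  },
];
```

The tighter these descriptions are, the less the model guesses about when and how to call each tool.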

Agent Loop Implementation (≈20 lines)

async function runAgent(task) {
  const messages = [{ role: "user", content: task }];
  for (let step = 0; step < 8; step++) {
    const response = await model.create({ messages, tools });
    // Record the assistant turn, including any tool call it requested.
    messages.push({ role: "assistant", content: response.content, toolCall: response.toolCall });
    if (!response.toolCall) return response.content; // final answer, loop ends
    const observation = await runTool(response.toolCall);
    messages.push({ role: "tool", content: observation });
  }
  return "Stopped: step limit reached.";
}

The loop follows five explicit steps:

Initialize context by inserting the user task.

Model decision: call the LLM with current messages and the tool list.

Record the model’s response (either a final answer or a tool‑call request).

Tool execution: if a tool call is present, run the corresponding function; otherwise finish.

Append the observation to the message list and repeat until the task is done or the step limit is hit.

Why Function Calling Is Not Enough

Function Calling only solves the "how to express a tool call" problem. Engineers must still handle parameter validation, permission checks, output trimming, error recovery, and safety constraints. Concrete failure modes include path traversal, dangerous commands, oversized tool output, and infinite loops.

Harness – The Runtime Control Layer

In the minimal loop a simple step < 8 guard prevents runaway execution. A production‑grade Harness adds nine controls:

Maximum loop iterations.

Maximum tool‑call count.

Per‑tool timeout.

Token and cost budgeting.

Tool‑output trimming.

Error classification and recovery strategies.

Permission confirmation for high‑risk actions.

Logging and replay for audit.

Explicit task‑completion criteria.
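A few of these controls can be sketched as a small budget object that the loop consults on every iteration. The names and default limits below are illustrative, not taken from any particular framework:

```javascript
// Illustrative harness budget: loop cap, tool-call cap, and output trimming.
function createBudget({ maxSteps = 8, maxToolCalls = 20, maxOutputChars = 4000 } = {}) {
  let steps = 0, toolCalls = 0;
  return {
    // Called once per loop iteration; throws when the step limit is exceeded.
    tickStep() {
      if (++steps > maxSteps) throw new Error("Harness: step limit exceeded");
    },
    // Called once per tool invocation; throws when the tool-call limit is exceeded.
    tickToolCall() {
      if (++toolCalls > maxToolCalls) throw new Error("Harness: tool-call limit exceeded");
    },
    // Trims oversized tool output so one noisy command cannot flood the context.
    trim(output) {
      return output.length <= maxOutputChars
        ? output
        : output.slice(0, maxOutputChars) + `\n…[truncated ${output.length - maxOutputChars} chars]`;
    },
  };
}
```

Timeouts, cost budgeting, and error classification follow the same pattern: small, explicit checks wrapped around the loop rather than logic delegated to the model.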

Pi’s implementation shows how beforeToolCall and afterToolCall hooks give engineers control over parameters, permissions, and output post‑processing.
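The hook idea can be sketched as a thin wrapper around every tool invocation (the names mirror the beforeToolCall/afterToolCall pattern described above, but this is a sketch, not Pi's actual API):

```javascript
// Hypothetical hook pipeline around a single tool call.
// beforeToolCall can rewrite arguments or veto the call by throwing;
// afterToolCall can trim, redact, or annotate the observation.
async function runToolWithHooks(tool, args, hooks) {
  const checkedArgs = hooks.beforeToolCall
    ? await hooks.beforeToolCall(tool.name, args)
    : args;
  const raw = await tool.fn(checkedArgs);
  return hooks.afterToolCall ? await hooks.afterToolCall(tool.name, raw) : raw;
}
```

Keeping validation and post-processing in hooks means the agent loop itself stays tiny while the harness grows around it.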

Memory Design – Three‑Layer Approach

The naive implementation stores every message in a messages array, which quickly leads to token bloat, stale context, and lost state after a restart. A more robust design splits memory into:

Current context – transient information needed for the ongoing loop, kept in the model’s context window.

Persistent facts – project conventions, user preferences, stored in markdown files or a database.

Procedural experience – reusable patterns, skills, or playbooks, stored in a skill library.

OpenClaw and Hermes follow this layering, using files like memory/YYYY‑MM‑DD.md for daily logs and a separate skill store for reusable procedures.

Permission Levels

Tools are categorized into four risk tiers with default policies:

Read‑only – e.g., list_files, read_file; allowed without confirmation.

Safe execution – e.g., test commands; allowed via a whitelist.

Write operations – e.g., write_file; require manual approval.

High‑risk actions – e.g., delete files, network access; disabled by default and need explicit enablement.

This tiered model mirrors Claude Code’s permission system and Anthropic’s risk‑based classifier, which still shows a 17% false‑negative rate on a real‑world risky‑action dataset.
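The four tiers above can be encoded as a simple policy table consulted before every tool call. Tool names and policy strings here are illustrative, mirroring the examples in this section:

```javascript
// Risk tiers mapped to default policies, mirroring the four levels above.
const POLICIES = {
  read_only: { tools: ["list_files", "read_file"], action: "allow" },
  safe_exec: { tools: ["run_tests"], action: "allow_if_whitelisted" },
  write: { tools: ["write_file"], action: "require_approval" },
  high_risk: { tools: ["delete_file", "http_request"], action: "deny" },
};

// Look up the default action for a tool; unknown tools fail closed.
function policyFor(toolName) {
  for (const tier of Object.values(POLICIES)) {
    if (tier.tools.includes(toolName)) return tier.action;
  }
  return "deny";
}
```

Failing closed on unknown tools matters: a model that hallucinates a tool name should hit a denial, not a fallback.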

Task Verification

Simply returning "task completed" is insufficient. A reliable agent must answer six questions:

Which files were read?

Which tools were invoked and what were their results?

What code changes were made and why?

Did the relevant tests pass?

Which static checks succeeded or failed?

What remaining risks need handling?

For a coding agent, a verification pipeline includes test execution, type‑checking, linting, diff‑to‑task matching, error summarization, and human confirmation for high‑risk modifications.
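Such a pipeline can be modeled as a list of named checks whose results are aggregated into a report the agent returns instead of a bare "task completed". Check names here are illustrative; in a real coding agent each fn would shell out to the test runner, type checker, or linter and inspect its exit code:

```javascript
// Run a list of { name, fn } checks and aggregate pass/fail into a report.
// fn returns true on success; a thrown error counts as a failed check.
function runVerification(checks) {
  const results = checks.map(({ name, fn }) => {
    try {
      return { name, passed: fn() === true };
    } catch (err) {
      return { name, passed: false, error: String(err) };
    }
  });
  return {
    passed: results.every((r) => r.passed),
    results,
  };
}
```

The per-check breakdown is what lets a human reviewer answer the six questions above at a glance, rather than trusting a single boolean.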

From Minimal Loop to Production‑Ready Agent

When you compare Claude Code, OpenClaw, and Pi, the core loop is identical; the difference lies in the surrounding boundaries: the tool, context, memory, permission, and validation layers. Adding these systematically turns a demo that "runs" into a system that "runs reliably" in daily engineering workflows.

Takeaway

The most valuable insight is that the future competitive edge will be in the Harness layer, not the raw LLM. Engineers who can design precise tool schemas, robust permission checks, efficient memory handling, and thorough validation will build agents that truly assist developers at scale.

Tags: memory management, AI Agent, Tool Calling, Permission Control, agent loop, Harness
Written by

AI Architecture Hub

Focused on sharing high-quality AI content and practical implementation, helping people learn with fewer missteps and become stronger through AI.
