Designing Effective Coding Agents: Six Core Components Explained
This article analyzes the architecture of coding agents and their harnesses. It first defines six core components, then walks through how a harness handles real‑time repository context, prompt caching, tool validation, context‑bloat control, structured memory, and delegation, with concrete Python examples and visual diagrams.
Core Components of a Coding Agent
A coding agent consists of six tightly coupled parts:
LLM: the raw next‑token language model.
Inference model: an LLM that has been further trained or prompt‑optimized to emit richer intermediate reasoning traces and perform self‑verification.
Agent: a control loop that repeatedly calls the model, selects tools, updates state, and decides when to stop.
Agent Harness: the software scaffolding that assembles prompts, exposes a fixed set of tools, tracks file state, applies edits, runs commands, manages permissions, caches stable prompt prefixes, and stores memory.
Coding Harness: a specialization of the Agent Harness for software‑engineering tasks, handling code context, tool execution, and iterative feedback.
Delegation: the ability to spawn constrained sub‑agents for side‑tasks such as symbol lookup or test inspection.
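The agent's control loop, the second component above, can be sketched in a few lines. This is a minimal sketch under stated assumptions: `model` and `harness` are hypothetical interfaces (a callable that proposes the next action, and an object exposing a `run_tool` method), not the actual API of any specific harness.

```python
def agent_loop(model, harness, user_request: str, max_turns: int = 10) -> str:
    """Minimal agent control loop: call the model, dispatch tool calls
    through the harness, update state, and stop when the model emits a
    final answer or the turn budget runs out."""
    state = {"request": user_request, "history": []}
    for _ in range(max_turns):
        action = model(state)                        # model proposes next action
        if action["type"] == "final":
            return action["content"]                 # model decided to stop
        result = harness.run_tool(action["tool"], action["params"])
        state["history"].append((action, result))    # update state for next turn
    return "stopped: turn limit reached"
```

The loop itself stays small; the complexity lives in the harness, which is why the remaining sections focus on it.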
1) Real‑time Repository Context
Before any action, the agent gathers a concise "workspace summary" that captures stable facts about the project:
Whether the current directory is a Git repository.
Active branch name.
Presence of documentation files (e.g., AGENTS.md, README) that indicate which test command to run.
Directory structure and key source files.
For a request like "fix the failing tests", the agent first reads AGENTS.md or README to discover the appropriate test command, then locates the relevant test files using the repository layout. This pre‑flight step avoids guessing and ensures that subsequent prompts contain the necessary context without rebuilding it from scratch each turn.
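A pre‑flight check of this kind can be sketched as follows. The function name `workspace_summary` and the returned fields are illustrative (the Mini Coding Agent expresses the same idea as `WorkspaceContext`); the branch name is read with the standard `git rev-parse` command.

```python
import subprocess
from pathlib import Path

def workspace_summary(root: str = ".") -> dict:
    """Gather stable facts about the project before the agent acts.
    Illustrative sketch: field names are not from any specific harness."""
    base = Path(root)
    summary = {
        "is_git_repo": (base / ".git").exists(),
        "branch": None,
        "docs": [p.name for p in base.iterdir()
                 if p.is_file() and p.name in ("AGENTS.md", "README", "README.md")],
        "source_files": sorted(str(p.relative_to(base))
                               for p in base.rglob("*.py"))[:20],  # cap the listing
    }
    if summary["is_git_repo"]:
        try:
            summary["branch"] = subprocess.run(
                ["git", "rev-parse", "--abbrev-ref", "HEAD"],
                cwd=base, capture_output=True, text=True, check=True,
            ).stdout.strip()
        except (subprocess.CalledProcessError, FileNotFoundError):
            pass  # leave branch as None if git is unavailable
    return summary
```

Because these facts change rarely, the summary can be computed once and reused across turns, which is exactly what the next section exploits.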
2) Prompt Structure and Cache Reuse
After the workspace view is obtained, the harness constructs a prompt in two layers:
Stable prompt prefix: contains invariant instructions, tool descriptions, and the cached workspace summary. This prefix changes only when the repository layout or tool set changes.
Dynamic session state: short‑term memory, the latest user request, and the most recent transcript entries.
Rebuilding the entire prompt for every turn wastes compute. By caching the stable prefix and only appending the dynamic state, the system reduces token usage while preserving all necessary information.
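The two‑layer construction can be sketched like this. `build_prefix` matches a name used in the Mini Coding Agent's source comments, but the signatures here are simplified assumptions, not the repository's actual API.

```python
def build_prefix(system_instructions: str, tool_docs: str, workspace_summary: str) -> str:
    """Stable prefix: rebuilt only when the repo layout or tool set changes,
    so a provider-side prompt cache can reuse it across turns."""
    return "\n\n".join([system_instructions, tool_docs, workspace_summary])

def build_prompt(prefix: str, working_memory: str, user_request: str) -> str:
    """Each turn appends only the dynamic session state to the cached prefix."""
    return f"{prefix}\n\n# Session state\n{working_memory}\n\n# Request\n{user_request}"
```

The key property is that every prompt for the session shares a byte‑identical prefix, which is what makes provider‑side prefix caching effective.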
3) Tool Access, Validation, and Permissions
The agent can only invoke a predefined set of tools. Each tool call must be emitted in a structured action with typed parameters that the harness can validate.
The harness performs four checks before execution:
Is the tool known?
Are the parameters syntactically and semantically valid?
Is user approval required for this action?
Is the requested file path inside the workspace?
Only when all checks pass does the command run, reducing risk while preserving flexibility.
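The four checks can be sketched as a single gate function. Everything here is illustrative: `validate_call`, the tool registry, and the approval set are assumptions for the sketch, though the tool names match those used later in the article.

```python
from pathlib import Path

KNOWN_TOOLS = {
    "list_files": {"path"},
    "read_file": {"path"},
    "run_shell": {"command"},
}
NEEDS_APPROVAL = {"run_shell"}  # shell commands require explicit user consent

def validate_call(tool: str, params: dict, workspace: str, approved: bool) -> tuple[bool, str]:
    """Run all four pre-execution checks; return (ok, reason)."""
    if tool not in KNOWN_TOOLS:                          # 1) is the tool known?
        return False, f"unknown tool: {tool}"
    if set(params) != KNOWN_TOOLS[tool]:                 # 2) are the parameters valid?
        return False, f"bad parameters for {tool}: {sorted(params)}"
    if tool in NEEDS_APPROVAL and not approved:          # 3) is approval required?
        return False, f"{tool} requires user approval"
    path = params.get("path")
    if path is not None:                                 # 4) path inside workspace?
        ws = Path(workspace).resolve()
        target = (ws / path).resolve()
        if not target.is_relative_to(ws):
            return False, f"path escapes workspace: {path}"
    return True, "ok"
```

Resolving the path before the containment check is what defeats `../` traversal; comparing raw strings would not.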
The Mini Coding Agent shows a concrete approval request (Figure 9): the model selects an action such as list_files, read_file, or run_shell, supplies parameters, and the harness pauses for validation.
4) Controlling Context Bloat
Long conversations quickly consume the token budget because agents repeatedly read files, produce large tool outputs, and log information. A robust coding harness applies two complementary compression strategies:
Clipping: truncate overly long fragments (e.g., huge file diffs, verbose tool output) so that no single piece dominates the prompt.
Transcript summarization: compress older history into a concise summary while keeping recent events richer. The system also deduplicates repeated file reads.
Figure 10 illustrates clipping of large outputs and summarization of early transcript entries.
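Both strategies can be sketched briefly. The names `clip` and `history_text` appear in the Mini Coding Agent's source comments, but the bodies below are simplified assumptions: `clip` keeps the head and tail of an oversized fragment, and `history_text` keeps recent events verbatim while reducing older ones to one‑line stubs.

```python
def clip(text: str, max_chars: int = 500) -> str:
    """Truncate oversized fragments, keeping the head and tail."""
    if len(text) <= max_chars:
        return text
    head = max_chars // 2
    tail = max_chars - head
    return text[:head] + f"\n…[clipped {len(text) - max_chars} chars]…\n" + text[-tail:]

def history_text(events: list[str], keep_recent: int = 3) -> str:
    """Keep recent events rich; compress older ones to one-line summaries."""
    old, recent = events[:-keep_recent], events[-keep_recent:]
    lines = []
    if old:
        lines.append(f"[{len(old)} earlier events summarized]")
        lines += ["  - " + e.splitlines()[0][:80] for e in old]  # first line only
    lines += [clip(e) for e in recent]
    return "\n".join(lines)
```

A real harness would summarize with the model itself rather than by truncation, but the budget discipline is the same: recent context stays detailed, old context shrinks.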
5) Structured Session Memory
The harness maintains two layers of memory:
Working memory: an explicit, compact state that the agent updates each round (current task, important files, recent notes).
Full transcript: a complete JSON log of every user request, tool output, and model response, stored on disk for full recovery.
When a new event occurs, it is appended to the full transcript and simultaneously summarized into the working memory. The working memory is used for prompt reconstruction, while the full transcript enables session restoration after a shutdown.
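The dual‑write pattern can be sketched as follows. `SessionStore` and `record` match names in the Mini Coding Agent's source comments, but this implementation is a simplified assumption: the working memory here is a bounded list of truncated lines rather than a model‑generated summary.

```python
import json
from pathlib import Path

class SessionStore:
    """Two memory layers: a full JSON transcript on disk for recovery,
    and a compact working memory used to rebuild prompts."""

    def __init__(self, path: str):
        self.path = Path(path)
        self.transcript: list[dict] = []
        self.working_memory: list[str] = []

    def record(self, role: str, content: str) -> None:
        event = {"role": role, "content": content}
        self.transcript.append(event)                          # full log
        self.working_memory.append(f"{role}: {content[:80]}")  # compact entry
        self.working_memory = self.working_memory[-10:]        # keep it bounded
        self.path.write_text(json.dumps(self.transcript))      # persist to disk

    @classmethod
    def restore(cls, path: str) -> "SessionStore":
        """Rebuild a session from the on-disk transcript after a shutdown."""
        store = cls(path)
        store.transcript = json.loads(Path(path).read_text())
        store.working_memory = [f"{e['role']}: {e['content'][:80]}"
                                for e in store.transcript[-10:]]
        return store
```

Prompt reconstruction reads only `working_memory`; the full transcript exists solely so that `restore` can rebuild the session later.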
6) Delegation and Constrained Sub‑Agents
When the main agent encounters a side‑task (e.g., locating a symbol definition or diagnosing a failing test), it can spawn a sub‑agent that inherits enough context to solve the sub‑problem but runs under stricter constraints:
Read‑only file access.
Limited recursion depth.
Sandboxed permissions.
Claude Code introduced sub‑agents early and enforces a read‑only mode by default. Codex added sub‑agents later; it typically inherits most sandbox settings and expresses constraints through task scope and recursion limits rather than an explicit read‑only flag.
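The constraint‑passing idea can be sketched like this. `tool_delegate` matches a name in the Mini Coding Agent's source comments, but the body is an illustrative assumption: `run_subagent` is a hypothetical callable standing in for the sub‑agent's own loop.

```python
def tool_delegate(task: str, depth: int, run_subagent, max_depth: int = 2) -> str:
    """Spawn a constrained sub-agent for a side task: read-only tools,
    a tighter recursion budget, and an inherited sandbox."""
    if depth >= max_depth:
        return "delegation refused: recursion limit reached"
    constraints = {
        "tools": ["list_files", "read_file"],  # read-only: no write or shell access
        "depth": depth + 1,                    # consumed recursion budget
        "max_depth": max_depth,
    }
    return run_subagent(task, constraints)
```

Because the tool list is rebuilt per delegation rather than inherited wholesale, the sub‑agent cannot regain capabilities the parent chose to withhold.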
Implementation Sketch
The Mini Coding Agent repository provides a minimal Python implementation that demonstrates all six components. The core file contains comments marking each part:
# 1) Real‑time repository context -> WorkspaceContext
# 2) Prompt structure & cache reuse -> build_prefix, memory_text, prompt
# 3) Structured tools, validation & permission -> build_tools, run_tool, validate_tool, approve, parse, path, tool_*
# 4) Context clipping & history compression -> clip, history_text
# 5) Session transcript & memory -> SessionStore, record, note_tool, ask, reset
# 6) Delegation & bounded sub‑agent -> tool_delegate

Repository URL: https://github.com/rasbt/mini-coding-agent
