Designing Effective Coding Agents: Six Core Components Explained
This article analyzes the architecture of coding agents and their harnesses. It first defines six core components, then walks through how a harness handles real‑time repository context, prompt caching, tool validation, context‑bloat control, structured memory, and delegation, with concrete Python examples and visual diagrams.
Core Components of a Coding Agent
A coding agent consists of six tightly coupled parts:
LLM: the raw next‑token language model.
Inference model: an LLM that has been further trained or prompt‑optimized to emit richer intermediate reasoning traces and perform self‑verification.
Agent: a control loop that repeatedly calls the model, selects tools, updates state, and decides when to stop.
Agent Harness: the software scaffolding that assembles prompts, exposes a fixed set of tools, tracks file state, applies edits, runs commands, manages permissions, caches stable prompt prefixes, and stores memory.
Coding Harness: a specialization of the Agent Harness for software‑engineering tasks, handling code context, tool execution, and iterative feedback.
Delegation: the ability to spawn constrained sub‑agents for side‑tasks such as symbol lookup or test inspection.
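The agent's control loop, the second component above, can be sketched in a few lines. This is a minimal sketch under stated assumptions: `model` and `harness` are hypothetical interfaces (a callable that proposes the next action, and an object exposing a `run_tool` method), not the actual API of any specific harness.

```python
def agent_loop(model, harness, user_request: str, max_turns: int = 10) -> str:
    """Minimal agent control loop: call the model, dispatch tool calls
    through the harness, update state, and stop when the model emits a
    final answer or the turn budget runs out."""
    state = {"request": user_request, "history": []}
    for _ in range(max_turns):
        action = model(state)                        # model proposes next action
        if action["type"] == "final":
            return action["content"]                 # model decided to stop
        result = harness.run_tool(action["tool"], action["params"])
        state["history"].append((action, result))    # update state for next turn
    return "stopped: turn limit reached"
```

The loop itself stays small; the complexity lives in the harness, which is why the remaining sections focus on it.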
1) Real‑time Repository Context
Before any action, the agent gathers a concise "workspace summary" that captures stable facts about the project:
Whether the current directory is a Git repository.
Active branch name.
Presence of documentation files (e.g., AGENTS.md, README) that indicate which test command to run.
Directory structure and key source files.
For a request like "fix the failing tests", the agent first reads AGENTS.md or README to discover the appropriate test command, then locates the relevant test files using the repository layout. This pre‑flight step avoids guessing and ensures that subsequent prompts contain the necessary context without rebuilding it from scratch each turn.
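A pre‑flight check of this kind can be sketched as follows. The function name `workspace_summary` and the returned fields are illustrative (the Mini Coding Agent expresses the same idea as `WorkspaceContext`); the branch name is read with the standard `git rev-parse` command.

```python
import subprocess
from pathlib import Path

def workspace_summary(root: str = ".") -> dict:
    """Gather stable facts about the project before the agent acts.
    Illustrative sketch: field names are not from any specific harness."""
    base = Path(root)
    summary = {
        "is_git_repo": (base / ".git").exists(),
        "branch": None,
        "docs": [p.name for p in base.iterdir()
                 if p.is_file() and p.name in ("AGENTS.md", "README", "README.md")],
        "source_files": sorted(str(p.relative_to(base))
                               for p in base.rglob("*.py"))[:20],  # cap the listing
    }
    if summary["is_git_repo"]:
        try:
            summary["branch"] = subprocess.run(
                ["git", "rev-parse", "--abbrev-ref", "HEAD"],
                cwd=base, capture_output=True, text=True, check=True,
            ).stdout.strip()
        except (subprocess.CalledProcessError, FileNotFoundError):
            pass  # leave branch as None if git is unavailable
    return summary
```

Because these facts change rarely, the summary can be computed once and reused across turns, which is exactly what the next section exploits.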
2) Prompt Structure and Cache Reuse
After the workspace view is obtained, the harness constructs a prompt in two layers:
Stable prompt prefix: contains invariant instructions, tool descriptions, and the cached workspace summary. This prefix changes only when the repository layout or tool set changes.
Dynamic session state: short‑term memory, the latest user request, and the most recent transcript entries.
Rebuilding the entire prompt for every turn wastes compute. By caching the stable prefix and only appending the dynamic state, the system reduces token usage while preserving all necessary information.
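The two‑layer construction can be sketched like this. `build_prefix` matches a name used in the Mini Coding Agent's source comments, but the signatures here are simplified assumptions, not the repository's actual API.

```python
def build_prefix(system_instructions: str, tool_docs: str, workspace_summary: str) -> str:
    """Stable prefix: rebuilt only when the repo layout or tool set changes,
    so a provider-side prompt cache can reuse it across turns."""
    return "\n\n".join([system_instructions, tool_docs, workspace_summary])

def build_prompt(prefix: str, working_memory: str, user_request: str) -> str:
    """Each turn appends only the dynamic session state to the cached prefix."""
    return f"{prefix}\n\n# Session state\n{working_memory}\n\n# Request\n{user_request}"
```

The key property is that every prompt for the session shares a byte‑identical prefix, which is what makes provider‑side prefix caching effective.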
3) Tool Access, Validation, and Permissions
The agent can only invoke a predefined set of tools. Each tool call must be emitted in a structured action with typed parameters that the harness can validate.
The harness performs four checks before execution:
Is the tool known?
Are the parameters syntactically and semantically valid?
Is user approval required for this action?
Is the requested file path inside the workspace?
Only when all checks pass does the command run, reducing risk while preserving flexibility.
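The four checks can be sketched as a single gate function. Everything here is illustrative: `validate_call`, the tool registry, and the approval set are assumptions for the sketch, though the tool names match those used later in the article.

```python
from pathlib import Path

KNOWN_TOOLS = {
    "list_files": {"path"},
    "read_file": {"path"},
    "run_shell": {"command"},
}
NEEDS_APPROVAL = {"run_shell"}  # shell commands require explicit user consent

def validate_call(tool: str, params: dict, workspace: str, approved: bool) -> tuple[bool, str]:
    """Run all four pre-execution checks; return (ok, reason)."""
    if tool not in KNOWN_TOOLS:                          # 1) is the tool known?
        return False, f"unknown tool: {tool}"
    if set(params) != KNOWN_TOOLS[tool]:                 # 2) are the parameters valid?
        return False, f"bad parameters for {tool}: {sorted(params)}"
    if tool in NEEDS_APPROVAL and not approved:          # 3) is approval required?
        return False, f"{tool} requires user approval"
    path = params.get("path")
    if path is not None:                                 # 4) path inside workspace?
        ws = Path(workspace).resolve()
        target = (ws / path).resolve()
        if not target.is_relative_to(ws):
            return False, f"path escapes workspace: {path}"
    return True, "ok"
```

Resolving the path before the containment check is what defeats `../` traversal; comparing raw strings would not.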
The Mini Coding Agent shows a concrete approval request (Figure 9): the model selects an action such as list_files, read_file, or run_shell, supplies parameters, and the harness pauses for validation.
4) Controlling Context Bloat
Long conversations quickly consume the token budget because agents repeatedly read files, produce large tool outputs, and log information. A robust coding harness applies two complementary compression strategies:
Clipping: truncate overly long fragments (e.g., huge file diffs, verbose tool output) so that no single piece dominates the prompt.
Transcript summarization: compress older history into a concise summary while keeping recent events richer. The system also deduplicates repeated file reads.
Figure 10 illustrates clipping of large outputs and summarization of early transcript entries.
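Both strategies can be sketched briefly. The names `clip` and `history_text` appear in the Mini Coding Agent's source comments, but the bodies below are simplified assumptions: `clip` keeps the head and tail of an oversized fragment, and `history_text` keeps recent events verbatim while reducing older ones to one‑line stubs.

```python
def clip(text: str, max_chars: int = 500) -> str:
    """Truncate oversized fragments, keeping the head and tail."""
    if len(text) <= max_chars:
        return text
    head = max_chars // 2
    tail = max_chars - head
    return text[:head] + f"\n…[clipped {len(text) - max_chars} chars]…\n" + text[-tail:]

def history_text(events: list[str], keep_recent: int = 3) -> str:
    """Keep recent events rich; compress older ones to one-line summaries."""
    old, recent = events[:-keep_recent], events[-keep_recent:]
    lines = []
    if old:
        lines.append(f"[{len(old)} earlier events summarized]")
        lines += ["  - " + e.splitlines()[0][:80] for e in old]  # first line only
    lines += [clip(e) for e in recent]
    return "\n".join(lines)
```

A real harness would summarize with the model itself rather than by truncation, but the budget discipline is the same: recent context stays detailed, old context shrinks.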
5) Structured Session Memory
The harness maintains two layers of memory:
Working memory: an explicit, compact state that the agent updates each round (current task, important files, recent notes).
Full transcript: a complete JSON log of every user request, tool output, and model response, stored on disk for full recovery.
When a new event occurs, it is appended to the full transcript and simultaneously summarized into the working memory. The working memory is used for prompt reconstruction, while the full transcript enables session restoration after a shutdown.
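The dual‑write pattern can be sketched as follows. `SessionStore` and `record` match names in the Mini Coding Agent's source comments, but this implementation is a simplified assumption: the working memory here is a bounded list of truncated lines rather than a model‑generated summary.

```python
import json
from pathlib import Path

class SessionStore:
    """Two memory layers: a full JSON transcript on disk for recovery,
    and a compact working memory used to rebuild prompts."""

    def __init__(self, path: str):
        self.path = Path(path)
        self.transcript: list[dict] = []
        self.working_memory: list[str] = []

    def record(self, role: str, content: str) -> None:
        event = {"role": role, "content": content}
        self.transcript.append(event)                          # full log
        self.working_memory.append(f"{role}: {content[:80]}")  # compact entry
        self.working_memory = self.working_memory[-10:]        # keep it bounded
        self.path.write_text(json.dumps(self.transcript))      # persist to disk

    @classmethod
    def restore(cls, path: str) -> "SessionStore":
        """Rebuild a session from the on-disk transcript after a shutdown."""
        store = cls(path)
        store.transcript = json.loads(Path(path).read_text())
        store.working_memory = [f"{e['role']}: {e['content'][:80]}"
                                for e in store.transcript[-10:]]
        return store
```

Prompt reconstruction reads only `working_memory`; the full transcript exists solely so that `restore` can rebuild the session later.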
6) Delegation and Constrained Sub‑Agents
When the main agent encounters a side‑task (e.g., locating a symbol definition or diagnosing a failing test), it can spawn a sub‑agent that inherits enough context to solve the sub‑problem but runs under stricter constraints:
Read‑only file access.
Limited recursion depth.
Sandboxed permissions.
Claude Code introduced sub‑agents early and enforces a read‑only mode by default. Codex added sub‑agents later; it typically inherits most sandbox settings and expresses constraints through task scope and recursion limits rather than an explicit read‑only flag.
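The constraint‑passing idea can be sketched like this. `tool_delegate` matches a name in the Mini Coding Agent's source comments, but the body is an illustrative assumption: `run_subagent` is a hypothetical callable standing in for the sub‑agent's own loop.

```python
def tool_delegate(task: str, depth: int, run_subagent, max_depth: int = 2) -> str:
    """Spawn a constrained sub-agent for a side task: read-only tools,
    a tighter recursion budget, and an inherited sandbox."""
    if depth >= max_depth:
        return "delegation refused: recursion limit reached"
    constraints = {
        "tools": ["list_files", "read_file"],  # read-only: no write or shell access
        "depth": depth + 1,                    # consumed recursion budget
        "max_depth": max_depth,
    }
    return run_subagent(task, constraints)
```

Because the tool list is rebuilt per delegation rather than inherited wholesale, the sub‑agent cannot regain capabilities the parent chose to withhold.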
Implementation Sketch
The Mini Coding Agent repository provides a minimal Python implementation that demonstrates all six components. The core file contains comments marking each part:
# 1) Real‑time repository context -> WorkspaceContext
# 2) Prompt structure & cache reuse -> build_prefix, memory_text, prompt
# 3) Structured tools, validation & permission -> build_tools, run_tool, validate_tool, approve, parse, path, tool_*
# 4) Context clipping & history compression -> clip, history_text
# 5) Session transcript & memory -> SessionStore, record, note_tool, ask, reset
# 6) Delegation & bounded sub‑agent -> tool_delegate

Repository URL: https://github.com/rasbt/mini-coding-agent
