Agent, Multi‑Agent, Deep Agent: Start Simple, Add Complexity Only When Needed
The article clarifies the distinct meanings of Agent, Multi‑Agent, and Deep Agent, explains how control shifts from engineers to models, compares architectures across nine dimensions, and shows why a lightweight harness is essential for long‑running, parallel AI‑driven software development.
Agent, Multi‑Agent, Deep Agent – definitions and boundaries
Agent: model‑centric control. The distinction from a workflow: a workflow follows a fixed execution path written by the engineer, while an agent decides its path and tool calls at runtime. A single agent consists of five closed‑loop components – perception, memory, reasoning, action, feedback – all operating within one context window, which limits task size and forces serial processing.
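A minimal sketch of that closed loop, with llm_decide and run_tool as hypothetical stand‑ins for a real model call and tool layer; note that all state lives in one history list, i.e. one context window:

```python
# Single-agent closed loop: perception, memory, reasoning, action,
# feedback, all inside one context window (the `history` list).
# llm_decide and run_tool are hypothetical stand-ins for a real model
# call and a tool executor.

def run_agent(task, llm_decide, run_tool, max_steps=20):
    history = [{"role": "user", "content": task}]      # perception + memory
    for _ in range(max_steps):
        decision = llm_decide(history)                 # reasoning
        history.append({"role": "assistant", "content": str(decision)})
        if decision["type"] == "final":
            return decision["content"]                 # task done
        observation = run_tool(decision["tool"], decision["args"])  # action
        history.append({"role": "tool", "content": observation})    # feedback
    raise RuntimeError("step budget exhausted: the context window is the limit")
```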
Multi‑Agent
Requires true parallel execution and explicit role division. Three common topologies:
Orchestrator‑Workers : a central orchestrator splits tasks, dispatches workers, and aggregates results; common in production for its controllability (sketched after this list).
Agent Teams : agents share a codebase and coordinate via the environment (e.g., Git) without a central orchestrator; simpler but lacks global goal alignment.
Pipeline : agents pass intermediate artifacts along a DAG; the Evaluator‑Optimizer pattern iterates until quality criteria are met.
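A minimal sketch of the Orchestrator‑Workers topology, with split_task, run_worker, and aggregate as hypothetical callables; the point is true parallel execution with central planning and aggregation:

```python
# Orchestrator-Workers in miniature: the orchestrator splits the task,
# runs workers in parallel, and aggregates their results.
from concurrent.futures import ThreadPoolExecutor

def orchestrate(task, split_task, run_worker, aggregate):
    subtasks = split_task(task)                          # central planning
    with ThreadPoolExecutor(max_workers=max(1, len(subtasks))) as pool:
        results = list(pool.map(run_worker, subtasks))   # parallel workers
    return aggregate(task, results)                      # central aggregation
```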
Deep Agent
Deep Agent is a ready‑to‑use harness that abstracts prompt engineering, tool integration, and context management, enabling agents to continue work across multiple sessions. The key ideas are:
Cognitive continuity : each session produces structured hand‑off artifacts (files, Git commits, progress logs) that the next session reads.
State externalization : project state lives in the file system, not in the model’s memory.
Incremental enforcement : the harness forces agents to complete one task at a time before committing.
Harness as core : the engineering infrastructure around the model, not the model itself, enables cross‑session work.
Reference implementation: https://github.com/langchain-ai/deepagents
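The mechanics behind these four ideas fit in a few lines. A minimal sketch, assuming an illustrative PROGRESS.md hand‑off file and Git as the state store (file name and commit‑message format are assumptions, not the deepagents API):

```python
# State externalization and hand-off in miniature: session state lives
# in files and Git, not in the model's memory.
import subprocess
from pathlib import Path

PROGRESS = Path("PROGRESS.md")

def end_session(summary, next_steps):
    # Write the hand-off artifact the next session will read.
    PROGRESS.write_text(summary + "\n\nNext steps:\n"
                        + "\n".join(f"- {s}" for s in next_steps))
    subprocess.run(["git", "add", "-A"], check=True)
    subprocess.run(["git", "commit", "-m", f"session: {summary[:60]}"], check=True)

def start_session():
    # Rebuild working state from the file system, not from model memory.
    log = subprocess.run(["git", "log", "--oneline", "-10"],
                         capture_output=True, text=True, check=True).stdout
    notes = PROGRESS.read_text() if PROGRESS.exists() else "(fresh project)"
    return f"Recent commits:\n{log}\nHand-off notes:\n{notes}"
```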
Horizontal comparison (nine dimensions)
Execution model : Agent – single‑threaded serial; Multi‑Agent – multi‑threaded parallel; Deep Agent – parallel + cross‑session hand‑off.
Task time span : Agent – minutes, single session; Multi‑Agent – minutes to hours; Deep Agent – hours to days, multi‑session.
State location : Agent – context window; Multi‑Agent – shared memory/state; Deep Agent – file system + Git + structured artifacts.
Coordination : Agent – none; Multi‑Agent – message passing or orchestrator scheduling; Deep Agent – file locks, Git merges, progress files.
Harness role : Agent – basic tool integration; Multi‑Agent – task distribution and aggregation; Deep Agent – full environment management infrastructure.
Failure points : Agent – single‑point inference failure; Multi‑Agent – agent conflicts, inconsistent results; Deep Agent – session memory loss, premature completion.
Observability difficulty : Agent – low; Multi‑Agent – medium; Deep Agent – high (cross‑session chain with versioned artifacts).
Result verification : Agent – human review / unit tests; Multi‑Agent – cross‑validation among agents; Deep Agent – end‑to‑end testing + browser automation.
Real‑world use cases : Agent – RAG, code completion, QA; Multi‑Agent – code review, research report generation; Deep Agent – full‑stack web apps, C compiler construction.
When to adopt more complex architectures
Upgrade to Multi‑Agent or Deep Agent only when genuine bottlenecks appear: exhausted context windows, serial slowdown, or tasks that exceed a single session. Otherwise a well‑prompted single agent is sufficient.
Why a harness is mandatory
Limited context windows make it impossible to finish complex projects in one go; agents need a bridge across sessions.
The harness records state at session end, restores it at start, controls tool permissions, manages test loops, and enforces a clean codebase. Its five core duties are:
Context injection : package progress logs, the feature list, Git history, and init scripts into the agent’s initial context (see the sketch after this list).
Incremental progress tracking : force each session to write a progress file and Git commit, decoupling state from the model.
Task granularity control : prompt constraints ensure the agent handles only one feature per iteration; Anthropic experiments show unconstrained agents try to finish everything at once and run out of context.
Verification loop : attach end‑to‑end test tools (e.g., Puppeteer) and require a full user‑flow pass before marking a feature complete.
Clean state maintenance : after each session the repository must be merge‑ready, with no known bugs or missing documentation.
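A minimal sketch of the context‑injection duty; the file names (claude-progress.md, feature_list.json, init.sh) loosely follow Anthropic’s write‑up but are assumptions here:

```python
# Context injection: package the progress log, feature list, Git
# history, and init script into the agent's first prompt.
import json
import subprocess
from pathlib import Path

def build_initial_context():
    progress = Path("claude-progress.md").read_text()
    features = json.loads(Path("feature_list.json").read_text())
    git_log = subprocess.run(["git", "log", "--oneline", "-20"],
                             capture_output=True, text=True, check=True).stdout
    todo = [f["name"] for f in features if not f["passes"]]
    return (f"Progress so far:\n{progress}\n"
            f"Recent commits:\n{git_log}\n"
            f"Unfinished features: {todo}\n"
            "Run ./init.sh, then pick exactly ONE feature to implement.")
```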
The Claude Agent SDK includes a context‑compression mechanism, but Anthropic found compression alone insufficient: it loses engineering context such as which features are done and which bugs have been fixed.
Coordinating multiple agents without conflict (C‑compiler experiment)
Sixteen agents run in independent Docker containers sharing a bare Git repository mounted at /upstream. Each container clones to /workspace, works, then pushes back. Git is the sole shared state.
Task locking with a single file
Write‑to‑lock : an agent creates a file named after the task in current_tasks/ (e.g., parse_if_statement.txt); a second agent sees the lock after syncing via Git and backs off.
Complete‑merge‑unlock : after finishing, the agent pulls others’ changes, merges, pushes, and deletes the lock file. Merge conflicts are common but Claude can resolve them.
New session continuation : when a Claude Code session ends, a fresh container is spawned and the next session picks up the work.
This lock‑file approach prevents duplicate effort but leaves high‑level goal coordination unsolved, an open problem noted by Carlini.
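A minimal sketch of the claim/release cycle, assuming subprocess Git calls and a main branch; a real implementation would need to handle rebase conflicts more carefully:

```python
# Write-to-lock protocol over the shared bare repository. The
# current_tasks/ directory comes from the experiment; the helpers and
# the assumed "main" branch are illustrative.
import subprocess
from pathlib import Path

def git(*args):
    return subprocess.run(["git", *args], capture_output=True, text=True)

def try_claim(task):
    lock = Path("current_tasks") / f"{task}.txt"
    git("pull", "--rebase")                    # sync other agents' locks first
    if lock.exists():
        return False                           # already claimed: back off
    lock.parent.mkdir(exist_ok=True)
    lock.write_text("claimed")
    git("add", str(lock))
    git("commit", "-m", f"lock: {task}")
    if git("push").returncode != 0:            # another agent won the race
        git("fetch")
        git("reset", "--hard", "origin/main")  # drop our lock commit
        return False                           # conservative back-off
    return True

def complete(task):
    lock = Path("current_tasks") / f"{task}.txt"
    git("pull", "--rebase")                    # merge others' work first
    lock.unlink(missing_ok=True)               # delete the lock file
    git("add", "-A")
    git("commit", "-m", f"complete: {task}")
    git("push")
```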
Role specialization
Primary development agent : implements features, fixes bugs, runs tests.
Code integration agent : deduplicates repeated implementations.
Performance optimization agent : focuses solely on compiler speed.
Code‑quality agent : performs structural refactoring and documentation.
Delta‑debugging style splitting of a monolithic task
Use GCC as an oracle: compile most kernel files with GCC, the rest with Claude’s compiler. If the kernel runs, the problem isn’t in Claude’s files; if it crashes, narrow the problematic subset. This turns an indivisible task into many verifiable sub‑problems.
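A minimal sketch of the bisection, assuming a single problematic file and a deterministic failure; compile_and_boot is a hypothetical oracle that builds the kernel with the given files compiled by Claude’s compiler (everything else by GCC) and reports whether it runs. Classic delta debugging generalizes this to sets of culprits.

```python
# Oracle-based splitting: shrink the suspect set by halves until one
# small, verifiable sub-problem remains.

def find_culprit(files, compile_and_boot):
    suspects = list(files)
    while len(suspects) > 1:
        half = suspects[: len(suspects) // 2]
        if not compile_and_boot(half):   # failure reproduces within this half
            suspects = half
        else:                            # so the culprit is in the other half
            suspects = suspects[len(half):]
    return suspects[0]
```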
Testing for agents
Four principles guide test output:
Do not dump garbage into context : emit a short summary; write detailed logs to files. Use a marker like ERROR for quick grep.
Control time blindness : add a --fast flag to sample a small percentage of test cases, balancing coverage and runtime.
Pre‑compute statistics : the harness calculates pass rates, new failures, and regressions, and emits only those numbers, saving token budget (see the sketch after this list).
End‑to‑end verification : require full user‑flow execution via browser automation (e.g., Puppeteer) before a feature is marked complete.
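A minimal sketch of the second and third principles, assuming a hypothetical run_case callable, a ~5% sample rate, and an illustrative test_log.txt:

```python
# Fast mode samples the suite; the harness pre-computes pass rates and
# regressions so the agent reads a one-line summary, not raw logs.
import random
from pathlib import Path

def run_suite(cases, run_case, prev_failures, fast=False):
    if fast:
        cases = random.sample(cases, max(1, len(cases) // 20))  # ~5% sample
    failures = {c for c in cases if not run_case(c)}
    Path("test_log.txt").write_text(                   # detail goes to a file
        "\n".join(f"ERROR {c}" for c in sorted(failures)))
    new = failures - set(prev_failures)                # regressions vs. last run
    return (f"{len(cases) - len(failures)}/{len(cases)} passed, "
            f"{len(new)} new failures (details: grep ERROR test_log.txt)")
```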
Agents will strive to satisfy tests but cannot exceed what tests can verify; thus test design defines the quality ceiling.
System pitfalls for long‑running agents
One‑shot overreach : models try to do as much as possible in the current window. The harness splits requirements so the coding agent handles one feature per iteration.
Premature completion claim : stems from the lack of an objective, global completion criterion. Feature‑list entries start as passes: false and flip to true only after their tests pass (see the sketch after this list).
Environment degradation : agents prioritize new features over code health. The session‑end step forces a Git commit and a smoke test (init.sh) before the next session starts.
False‑positive feature pass : agents judge success from code logic, not runtime behavior. Require browser‑automated end‑to‑end tests before marking completion.
Parallel duplicate work : caused by missing task allocation or overly coarse task granularity. The file‑lock protocol ensures exclusive task ownership; for monolithic tasks, use oracle‑based splitting.
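A minimal sketch of that completion gate, assuming an illustrative feature_list.json and a hypothetical e2e_test callable (e.g., a Puppeteer-driven user flow):

```python
# Objective completion gate: a feature flips from passes: false to true
# only after its end-to-end test succeeds.
import json
from pathlib import Path

FEATURES = Path("feature_list.json")

def mark_feature(name, e2e_test):
    features = json.loads(FEATURES.read_text())
    entry = next(f for f in features if f["name"] == name)
    if e2e_test(name):                        # browser-automated user flow
        entry["passes"] = True                # the only path to "done"
        FEATURES.write_text(json.dumps(features, indent=2))
        return True
    return False                              # completion claim rejected
```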
Key takeaways
Externalize state to files rather than relying on context compression.
Each harness component encodes an assumption that may become obsolete as models improve.
Borrow proven software‑engineering practices—Git commits, incremental progress, smoke tests—for autonomous agents.
Tests define the upper bound of what an autonomous system can reliably achieve.
Maintain a minimal footprint: ensure each session leaves the repository in a merge‑ready state, enabling easy rollback with git revert.
References
Schluntz, E. & Zhang, B. Building Effective Agents. Anthropic Research, Dec 2024. https://anthropic.com/research/building-effective-agents
Young, J. Effective Harnesses for Long‑Running Agents. Anthropic Engineering, Nov 2025. https://anthropic.com/engineering/effective-harnesses-for-long-running-agents
Carlini, N. Building a C Compiler with a Team of Parallel Claudes. Anthropic Engineering, Feb 2026. https://anthropic.com/engineering/building-c-compiler
Claude's C Compiler – GitHub Repository. https://github.com/anthropics/claudes-c-compiler
LangChain Blog. Deep Agents. https://blog.langchain.com/deep-agents/
LangChain Blog. Context Management for Deep Agents. https://blog.langchain.com/context-management-for-deepagents/