
What Skills Architects Must Master in the Agent Era and Which Will Last Six Months

In the fast‑changing Agent era, architects should focus on durable engineering capabilities (context management, tool design, evaluation, harness, permissions, and cost control) rather than chasing the latest frameworks, so that agents remain stable and controllable in production systems.

Architect

May 4, 2026

Filtering new Agent technology

Because models, SDKs and APIs change rapidly, evaluate each new component with five questions: (1) Will it still be relevant in six months? (2) Can it be integrated without breaking existing logging, permission, retry or deployment pipelines? (3) Does it address a real production failure mode or only a demo novelty? (4) Can its impact be traced and measured? (5) What is the cost of missing it?

Context as a runtime work set

Treat the Agent’s context as a mutable work set rather than a static chat log. In each round, ask: what information is needed now, what can be summarized, and what belongs in external storage. The author splits context into five layers:

Model window – current goal, constraints, recent observations needed for the next decision.

Session state – plan, completed actions, pending tasks, error fixes, user intent.

File/DB layer – large objects, logs, code, historical documents, replayable facts.

Project specs – AGENTS.md, README, runbooks, team conventions.

Tool layer – retrieval, pagination, validation, write actions, permission‑controlled operations.

Keeping the model window small and pushing stable data outward reduces context pollution that often causes failures such as stale errors, overwritten facts, or loss of constraints after compression.
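
A minimal sketch of this idea, assuming a hypothetical `WorkSet` class (not from the source): the goal and constraints are always rendered, only the most recent observations stay in the window, and large observations are moved to a dictionary that stands in for the file/DB layer.

```python
from dataclasses import dataclass, field


@dataclass
class WorkSet:
    """Hypothetical runtime work set; names and limits are illustrative."""
    goal: str
    constraints: list[str]
    observations: list[str] = field(default_factory=list)
    external: dict[str, str] = field(default_factory=dict)  # stand-in for the file/DB layer

    def observe(self, key: str, text: str, max_chars: int = 200) -> None:
        """Keep short observations inline; store large ones outside and keep a pointer."""
        if len(text) > max_chars:
            self.external[key] = text
            text = f"[stored as {key!r}, {len(text)} chars] {text[:max_chars]}..."
        self.observations.append(text)

    def window(self, keep_last: int = 3) -> str:
        """Render the model window: goal and constraints always, recent observations only."""
        lines = [f"GOAL: {self.goal}"]
        lines += [f"CONSTRAINT: {c}" for c in self.constraints]
        lines += [f"OBSERVATION: {o}" for o in self.observations[-keep_last:]]
        return "\n".join(lines)
```

Because constraints are rendered from a separate field on every call, they cannot be lost when older observations are dropped, which is one of the failure modes named above.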

Tools as business interfaces

Tools are the only way an Agent interacts with the outside world, so they should be designed like well‑defined APIs. A practical rule is to expose five to ten clearly named tools; more overlapping tools increase the chance of misuse. Each tool description should specify:

When the tool should be used and when it should not.

Exact parameter types and boundaries.

What parts of the response are essential results.

How to handle failures (actionable error messages).

Whether the operation is dangerous and requires permission checks.

Whether large results are paginated or streamed.

The author writes tool contracts as tiny interface specifications, avoiding overly broad or overly narrow definitions that cause the model either to misuse the tool or to avoid it entirely.

The emerging Model Context Protocol (MCP) separates tool, resource and capability boundaries, making tools easier to govern.
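
As an illustration of such a contract, the sketch below pairs a description record with a handler that validates parameters, paginates results and raises actionable errors. `ToolContract`, `ToolError`, `search_orders` and the in-memory `ORDERS` data are all hypothetical names made up for this example; they do not come from any SDK or from MCP.

```python
from dataclasses import dataclass
from typing import Any, Callable


class ToolError(Exception):
    """Error whose message tells the model how to recover."""


@dataclass(frozen=True)
class ToolContract:
    name: str
    use_when: str
    do_not_use_when: str
    dangerous: bool  # True would route the call through a permission check
    handler: Callable[..., dict[str, Any]]


# Illustrative data: 25 orders, odd ids are open, even ids are closed.
ORDERS = [{"id": i, "status": "open" if i % 2 else "closed"} for i in range(1, 26)]


def search_orders(status: str, page: int = 1, page_size: int = 10) -> dict[str, Any]:
    """Return one page of orders with the given status."""
    if status not in ("open", "closed"):
        raise ToolError("status must be 'open' or 'closed'; retry with one of those values")
    if not 1 <= page_size <= 50:
        raise ToolError("page_size must be between 1 and 50")
    if page < 1:
        raise ToolError("page must be >= 1")
    matches = [o for o in ORDERS if o["status"] == status]
    start = (page - 1) * page_size
    has_more = start + page_size < len(matches)
    return {
        "items": matches[start:start + page_size],  # the essential result
        "total": len(matches),
        "next_page": page + 1 if has_more else None,
    }


SEARCH_ORDERS = ToolContract(
    name="search_orders",
    use_when="You need a list of orders filtered by status.",
    do_not_use_when="You already know the order id.",
    dangerous=False,
    handler=search_orders,
)
```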

Evaluation before launch

Agent outputs can look correct while hiding failures (e.g., code that later crashes, reports that cannot be audited). Build a small internal evaluation set (≈50 samples drawn from real traces) and run it after every change to prompts, models, tools or context‑compression logic. This trace‑based eval works like unit tests: it does not guarantee perpetual correctness but flags regressions immediately.
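
A rough sketch of such a runner follows. `EvalCase` and `run_evals` are hypothetical names, and the agent is assumed to be any callable that maps a task string to an output string. The runner reports cases that passed in a previous baseline run but fail now.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class EvalCase:
    case_id: str                  # stable id so runs can be compared
    task: str                     # input replayed from a recorded trace
    check: Callable[[str], bool]  # acceptance check on the agent's output


def run_evals(agent: Callable[[str], str], cases: list[EvalCase],
              baseline_pass: set[str]) -> dict:
    """Run every case and report cases that passed before but fail now."""
    passed: set[str] = set()
    for case in cases:
        try:
            if case.check(agent(case.task)):
                passed.add(case.case_id)
        except Exception:
            pass  # a crash in the agent or the check counts as a failure
    return {
        "pass_rate": len(passed) / len(cases) if cases else 0.0,
        "passed": passed,
        "regressions": sorted(baseline_pass - passed),
    }
```

A release gate could then refuse to ship whenever `regressions` is non-empty; that policy is a design choice, not something the source prescribes.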

Harness: the execution layer

The harness sits between the model and the production system. For short tasks it acts as a simple executor; for long tasks it must provide the same state, queue, logging, permission, recovery and audit capabilities as a traditional backend. The division of labor is:

Model selects the next step.

Harness validates the step, executes it, captures output, decides feedback, creates checkpoints, and schedules sub‑tasks if needed.

When a model upgrade improves capabilities, old prompt fragments, tool wrappers or rules may become liabilities and should be pruned.
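
The division of labor above can be sketched as a small loop. This is an assumption-laden illustration: `run_harness` and `scripted_model` are hypothetical, the "model" is a scripted stand-in rather than a real LLM call, checkpoints are kept in memory, and sub-task scheduling is left out.

```python
import json
from typing import Any, Callable

Step = dict[str, Any]


def run_harness(choose_step: Callable[[list[dict]], Step],
                tools: dict[str, Callable[..., Any]],
                max_steps: int = 10) -> dict:
    """The model picks each step; the harness validates, executes and checkpoints it."""
    history: list[dict] = []
    checkpoints: list[str] = []
    for _ in range(max_steps):
        step = choose_step(history)
        if "done" in step:
            return {"result": step["done"], "history": history, "checkpoints": checkpoints}
        name = step.get("tool")
        if name not in tools:  # validation: reject unknown tools with a usable hint
            feedback = f"error: unknown tool {name!r}; available: {sorted(tools)}"
        else:
            try:
                # Capture the output and cap what is fed back to the model.
                feedback = str(tools[name](**step.get("args", {})))[:500]
            except Exception as exc:
                feedback = f"error: {type(exc).__name__}: {exc}"
        history.append({"step": step, "feedback": feedback})
        checkpoints.append(json.dumps(history))  # assumes steps are JSON-serializable
    return {"result": None, "history": history, "checkpoints": checkpoints}


def scripted_model(history: list[dict]) -> Step:
    """Stand-in for a model: replays a fixed script, then reports the first result."""
    script = [{"tool": "add", "args": {"a": 2, "b": 3}}, {"tool": "missing"}]
    if len(history) < len(script):
        return script[len(history)]
    return {"done": history[0]["feedback"]}
```

Calling `run_harness(scripted_model, {"add": lambda a, b: a + b})` executes the first step, turns the second into an error message for the model instead of crashing, and then finishes.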

State management

Since LLMs do not retain state between calls, external state layers are required for long‑running agents. The author groups state into:

Current inference state – the model window.

Task progress – session, plan, checkpoint files.

Replayable facts – files, databases, trace logs.

Team experience – AGENTS.md, skills, runbooks, checklists.

User preferences – small memory or profile records.

Persisting these layers enables diffing, rollback, audit and reproducibility.
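
One way to get diffing and rollback for the task-progress layer is to write each checkpoint as a versioned JSON file. The `CheckpointStore` class, the file naming scheme and the `demo` function below are assumptions made for illustration, not the author's implementation.

```python
import json
import tempfile
from pathlib import Path


class CheckpointStore:
    """Hypothetical versioned store: one JSON file per checkpoint."""

    def __init__(self, directory: Path):
        self.directory = directory
        directory.mkdir(parents=True, exist_ok=True)

    def save(self, state: dict) -> int:
        """Write the state as the next version and return its number."""
        version = len(list(self.directory.glob("v*.json"))) + 1
        path = self.directory / f"v{version:04d}.json"
        path.write_text(json.dumps(state, sort_keys=True, indent=2))
        return version

    def load(self, version: int) -> dict:
        """Rollback is simply loading an earlier version."""
        return json.loads((self.directory / f"v{version:04d}.json").read_text())

    def diff(self, old: int, new: int) -> dict:
        """Return {key: (old_value, new_value)} for top-level keys that changed."""
        a, b = self.load(old), self.load(new)
        return {k: (a.get(k), b.get(k))
                for k in sorted(a.keys() | b.keys()) if a.get(k) != b.get(k)}


def demo() -> dict:
    """Save two checkpoints in a temporary directory and diff them."""
    with tempfile.TemporaryDirectory() as tmp:
        store = CheckpointStore(Path(tmp) / "task-1")
        v1 = store.save({"plan": ["a", "b"], "done": []})
        v2 = store.save({"plan": ["a", "b"], "done": ["a"]})
        return store.diff(v1, v2)
```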

Task boundaries and sub‑agents

Before adding multiple agents, isolate the main context. Sub‑agents are useful for isolated investigations (search, read, verify) that return only conclusions and evidence, keeping the primary window clean. Start with a single agent; add orchestrated sub‑agents only when context pressure, tool latency or task heterogeneity demand it.
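
The isolation pattern can be sketched as follows, with a plain keyword scan standing in for a real sub-agent. `Finding`, `run_subagent` and `delegate` are hypothetical names. The point is only that the sub-agent's working material never enters the main context.

```python
from dataclasses import dataclass


@dataclass
class Finding:
    conclusion: str
    evidence: list[str]


def run_subagent(question: str, documents: dict[str, str]) -> Finding:
    """Stand-in sub-agent: reads everything in a private scratch context."""
    scratch: list[str] = []  # private working context, discarded on return
    hits: list[str] = []
    for name, text in documents.items():
        scratch.append(text)
        for number, line in enumerate(text.splitlines(), 1):
            if question.lower() in line.lower():
                hits.append(f"{name}:{number}: {line.strip()}")
    return Finding(f"{len(hits)} matching line(s) for {question!r}", hits[:3])


def delegate(main_context: list[str], question: str,
             documents: dict[str, str]) -> Finding:
    """Add only the conclusion and evidence to the main context."""
    finding = run_subagent(question, documents)
    main_context.append(f"FINDING: {finding.conclusion}; evidence: {finding.evidence}")
    return finding
```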

Permissions and sandboxing

When agents can read files, write code, run commands or query databases, security must be built in from the start. Basic safeguards include:

File‑access scopes.

Command whitelists or approval flows.

Network egress controls.

Key‑scope restrictions.

Tool‑level permission tiers.

Double‑confirmation for dangerous actions.

Sandbox isolation.

Trace & audit logging.

Rollback paths.

Delaying these controls leads to costly retrofits.
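
A few of these safeguards (command allowlist, permission tiers, confirmation for dangerous actions, file-access scope) can be sketched in a few lines. The tier names, the `ALLOWED_COMMANDS` table and both functions are hypothetical. A real deployment would also need sandboxing and egress controls, which this sketch does not attempt.

```python
import shlex
from enum import IntEnum
from pathlib import Path


class Tier(IntEnum):
    READ = 1
    WRITE = 2
    DANGEROUS = 3


# Illustrative allowlist: anything not listed is rejected.
ALLOWED_COMMANDS = {"ls": Tier.READ, "cat": Tier.READ, "git": Tier.WRITE, "rm": Tier.DANGEROUS}


def authorize(command: str, granted: Tier, confirmed: bool = False) -> tuple[bool, str]:
    """Decide whether a command may run under the granted tier."""
    try:
        argv = shlex.split(command)
    except ValueError as exc:
        return False, f"cannot parse command: {exc}"
    if not argv:
        return False, "empty command"
    tier = ALLOWED_COMMANDS.get(argv[0])
    if tier is None:
        return False, f"{argv[0]!r} is not on the allowlist"
    if tier > granted:
        return False, f"{argv[0]!r} needs tier {tier.name}, granted {granted.name}"
    if tier is Tier.DANGEROUS and not confirmed:
        return False, f"{argv[0]!r} is dangerous and needs explicit confirmation"
    return True, "ok"


def in_scope(path: str, root: str) -> bool:
    """True only if the path resolves to a location inside root."""
    root_path = Path(root).resolve()
    target = (root_path / path).resolve()
    return target == root_path or root_path in target.parents
```

An approved command should then be executed from the parsed argument list rather than through a shell, so that metacharacters such as `;` are never interpreted.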

Getting started: a small closed loop

For teams beginning with agents, avoid building an “all‑purpose platform.” Choose a narrow, high‑impact task with measurable acceptance criteria (e.g., automation rate, manual review time, rework rate, error types, rollback procedure). Implement the following eight‑step loop:

Define a concrete goal.

Implement a single‑agent main loop.

Create 3‑7 well‑defined tools.

Externalize state outside the model window.

Provide a sandbox environment.

Add tracing for every tool call.

Collect ~50 initial evaluation samples from real traces.

Establish a rollback‑capable release process.

Early failures will reveal which capability to improve next (e.g., context pagination, tool description, evaluation samples, state externalization, permission tightening, cost monitoring).
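
For the tracing step, a decorator is one lightweight option. The sketch below records every call in an in-memory list; `TRACE`, `traced` and `divide` are hypothetical names, and a production system would write to durable storage instead.

```python
import functools
import time

TRACE: list[dict] = []  # stand-in for a durable trace log


def traced(fn):
    """Record arguments, outcome and duration of every call to fn."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        record = {"tool": fn.__name__, "args": args, "kwargs": kwargs}
        start = time.perf_counter()
        try:
            result = fn(*args, **kwargs)
            record["ok"] = True
            return result
        except Exception as exc:
            record["ok"] = False
            record["error"] = f"{type(exc).__name__}: {exc}"
            raise  # the caller still sees the failure
        finally:
            record["seconds"] = time.perf_counter() - start
            TRACE.append(record)
    return wrapper


@traced
def divide(a: float, b: float) -> float:
    """Example tool used to show both a successful and a failing trace."""
    return a / b
```

Traces collected this way can later be sampled to seed the evaluation set described in the loop above.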

Deferrable items

Frameworks that merely re‑package existing models with heavy migration cost.

Complex long‑term memory systems that are not yet needed for the identified failure modes.

Large multi‑agent role hierarchies; add them only after the core loop is stable.

Public benchmark suites; they are useful for reference but cannot replace internal trace‑based evaluation.

Frequent model‑by‑model re‑evaluation; instead adopt a quarterly reassessment cadence.

Conclusion

In the Agent era, the durable engineering effort lies in building a reliable execution environment: layered context, well‑designed tool contracts, persistent state, permission controls, systematic trace‑based evaluation, and a maintainable harness that can be trimmed or extended as models evolve. Architecture effort should focus on these system capabilities rather than chasing every new framework or terminology.

References

Rohit: “What to Learn, Build, and Skip in AI Agents (2026)” – https://x.com/rohit4verse/status/2049548305408131349

Anthropic: “Effective context engineering for AI agents” – https://www.anthropic.com/engineering/effective-context-engineering-for-ai-agents

Anthropic: “Writing effective tools for agents” – https://www.anthropic.com/engineering/writing-tools-for-agents

Anthropic: “How we built our multi‑agent research system” – https://www.anthropic.com/engineering/multi-agent-research-system

Cognition: “Don’t Build Multi‑Agents” – https://cognition.ai/blog/dont-build-multi-agents

Cursor: “Continually improving our agent harness” – https://cursor.com/blog/continually-improving-agent-harness

Simon Willison: “Agents are models using tools in a loop” – https://simonwillison.net/2025/May/22/tools-in-a-loop/

Harrison Chase: “Continual learning for AI agents” – https://www.xrticles.com/article/continual-learning-for-ai-agents

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.
