AI‑First Architecture Constraints: Tool Limits, Refactor Triggers, and Context

The article examines six practical challenges of AI‑First development—oversized tool libraries, when to trigger refactoring, propagating newly extracted methods, duplicate code from parallel sub‑agents, context aging, and the lack of a unified framework—while presenting concrete solutions such as three‑layer loading, sub‑agent isolation, semantic search, consolidation agents, persistent context files, and adaptive compression strategies.

AI Step-by-Step

Problem 1 – Tool‑library size exceeds context budget

In medium‑to‑large projects the tool library can exceed 5,000 lines, yet constraint compliance drops sharply once more than ~100 lines of tool context are loaded up front. Baidu’s 2026 practice on a 50‑tool project reduced the initial token load from 42,300 to 980 tokens (97.6% compression) and raised tool‑call success from 82% to 99%.

Three‑layer progressive loading splits tool information into:

Metadata layer (50‑200 tokens per tool): ID, category, one‑sentence purpose, example I/O. Loaded at session start.

Instruction layer: input‑output specs, parameter constraints, call path. Loaded on demand when the model decides to invoke the tool.

Resource layer: full API docs, usage examples, edge cases. Cached with an LRU‑K policy and loaded only for complex tasks.

Implementation example: keep a ≤20‑line registry in CLAUDE.md with name, path, and brief purpose. When a tool is needed, the model reads the corresponding source file’s JSDoc to obtain detailed constraints.
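The three layers can be sketched as a registry that keeps only the metadata layer resident and pulls the heavier layers on demand. The following is a minimal Python sketch under assumed names (the class, fields, and LRU size are illustrative, not a real API):

```python
from collections import OrderedDict

class ToolRegistry:
    """Sketch of three-layer progressive loading (hypothetical API)."""

    def __init__(self, metadata, resource_cache_size=2):
        # Metadata layer: always in context (ID, category, one-line purpose).
        self.metadata = metadata
        # Resource layer cache: small LRU so full docs stay out of context
        # until a complex task actually needs them.
        self._resources = OrderedDict()
        self._cache_size = resource_cache_size

    def list_tools(self):
        # Only the cheap metadata layer is exposed at session start.
        return [(tid, m["purpose"]) for tid, m in self.metadata.items()]

    def load_instructions(self, tool_id):
        # Instruction layer: point at the tool's spec file when the model
        # decides to invoke it (stubbed here as a path lookup).
        return self.metadata[tool_id]["spec_path"]

    def load_resource(self, tool_id, fetch):
        # Resource layer with LRU eviction: full docs are fetched lazily.
        if tool_id in self._resources:
            self._resources.move_to_end(tool_id)
            return self._resources[tool_id]
        doc = fetch(tool_id)
        self._resources[tool_id] = doc
        if len(self._resources) > self._cache_size:
            self._resources.popitem(last=False)
        return doc
```

Keeping the metadata layer to a line or two per tool is what lets a 50‑tool registry fit in well under 1,000 tokens at session start.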

Sub‑Agent isolation creates a separate Claude instance per sub‑agent, each with its own context window and a YAML tools whitelist (e.g., tools: [Read, Grep, Glob, Bash]). This prevents the main agent from loading all tools and allows domain‑specific agents (e.g., DB migration, UI component) to carry only relevant tool sets.

Semantic search vectorises tool signatures and JSDoc descriptions in a local vector DB. The model queries the DB to retrieve relevant tools on demand, as done by Anthropic’s claude‑context MCP server. When the tool count exceeds ~30, evaluate adding this layer.
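The retrieval step can be illustrated without a real embedding model. The toy sketch below substitutes bag‑of‑words cosine similarity for the vectorisation; a production setup would use a code/NL embedding model and a local vector DB as described above:

```python
import math
from collections import Counter

def embed(text):
    # Toy bag-of-words "embedding"; a real setup would use an embedding
    # model over tool signatures and JSDoc descriptions.
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve_tools(query, tool_docs, k=2):
    """Return the k tool names whose description best matches the query."""
    q = embed(query)
    ranked = sorted(tool_docs,
                    key=lambda name: cosine(q, embed(tool_docs[name])),
                    reverse=True)
    return ranked[:k]
```

With this in place the model queries for tools by task description instead of scanning the whole registry, which is what keeps per‑call context flat as the tool count grows.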

Practical steps:

Maintain a ≤20‑line tool‑registry file.

Embed constraints in each tool’s JSDoc.

Define domain‑specific sub‑agents with limited tool whitelists.

Introduce a semantic‑search service if tool count >30.

Problem 2 – When to trigger refactoring of duplicated patterns

Refactoring in AI‑generated code follows three stages: detection → judgement → execution.

Detection distinguishes simple text repeats from semantic repeats. Open‑source engines that support this include:

GitHub Serena – semantic analysis across files, creates [duplicate‑code] issues.

Anatoly’s dual‑embedding (code + NLP) with LanceDB – mixed similarity scoring to catch functionally equivalent code using different libraries.

Kodify’s Scout Agent – uses Vertex AI text‑embedding‑005 + vector search during MR creation.

Judgement criteria (derived from GitHub 2026 data) are:

Span threshold: duplicate appears in ≥2 files (cross‑file repeats have higher priority).

Semantic similarity: embeddings must indicate functional equivalence.

Change frequency: files with recent frequent edits get higher priority.

Risk routing: simple repeats → automated handling; complex business logic → LLM‑suggested PR; high‑risk (security, data integrity) → manual review.

Minimum size: at least 3 lines duplicated in ≥2 files.
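The criteria above compose into a simple decision function. This is a hedged sketch of the routing logic only; the dict keys and the 0.85 similarity threshold are assumed example values, not figures from the GitHub data:

```python
def route_duplicate(dup):
    """Route a detected duplicate through the judgement criteria."""
    # Minimum size and span thresholds: ignore trivial repeats.
    if dup["lines"] < 3 or dup["file_count"] < 2:
        return "ignore"
    # Embeddings must indicate functional equivalence (threshold assumed).
    if dup["similarity"] < 0.85:
        return "ignore"
    # Risk routing: security and data-integrity code is never auto-refactored.
    if dup["touches_security"] or dup["touches_data_integrity"]:
        return "manual-review"
    if dup["complex_business_logic"]:
        return "llm-suggested-pr"
    return "automated-refactor"
```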

Execution examples:

Kodify’s refactor agent clones the target branch, applies changes in a sandbox, runs lint/tests/build, and commits only after all checks pass.

DocuSign’s “Elf” agent can be triggered via Jira ticket, Slack command, or direct request, then monitors its own PR and reacts to review feedback.

Key practice: separate refactor triggering from the coding agent (Writer/Reviewer split) so the coding agent does not self‑evaluate its own output.

Problem 3 – Immediate reuse of newly extracted methods

Without a propagation mechanism, newly extracted functions are often re‑implemented. Three mechanisms are documented:

Git‑hook driven context update: the agentctx npm package adds a post‑commit hook that writes a .agentctx/pending‑review.md file containing the commit message, changed files, and diff. On the next session start, CLAUDE.md instructs the model to read this file, detect new reusable functions, and update the tool registry.

git commit
↓
hook → write .agentctx/pending‑review.md (message + files + diff)
↓
next session → CLAUDE.md reads pending‑review.md
↓
AI updates tool‑registry
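The flow above hinges on the pending‑review file a post‑commit hook writes. A minimal Python sketch of such a formatter follows; the section layout is an assumption for illustration, not the agentctx package's actual format:

```python
def format_pending_review(commit_message, changed_files, diff):
    """Build a pending-review.md body from post-commit data
    (layout is illustrative, not agentctx's real format)."""
    lines = ["# Pending review", "",
             "## Commit message", commit_message, "",
             "## Changed files"]
    lines += [f"- {f}" for f in changed_files]
    lines += ["", "## Diff", "```diff", diff, "```"]
    return "\n".join(lines)
```

A post‑commit hook would gather the inputs with `git log -1 --format=%s`, `git diff-tree --name-only HEAD`, and `git show HEAD`, then write the result to .agentctx/pending‑review.md.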

SessionStart hook: a JSON hook configuration causes the model to execute cat .claude/tool-registry.md whenever a session starts (including after /clear or /compact), ensuring the latest registry is loaded.

{
  "hooks": {
    "SessionStart": [{
      "matcher": "startup|clear|compact",
      "hooks": [{
        "type": "command",
        "command": "cat .claude/tool-registry.md"
      }]
    }]
  }
}

Compound Engineering learning loop: after extraction, the method is recorded in docs/solutions/ and a final /ce‑compound step writes the learning summary. Subsequent sessions load the updated docs via the SessionStart hook.

Microsoft’s auto‑memory project indexes such structured learning summaries and automatically surfaces them in future sessions, moving propagation from manual to automated retrieval.

Problem 4 – Duplicate code from parallel sub‑agents

Parallel agents can independently generate nearly identical code. Real‑world evidence:

In the ai‑agents open‑source project, 5 parallel agents produced 489 duplicated lines across 12 files.

MSR 2026 study of 7,851 AI‑generated PRs found 28,425 code clones; 320 PRs contained cross‑submission duplicate clones.

Mitigation layers (adopted incrementally):

Post‑execution consolidation: a Consolidation Agent runs after all agents finish, scans outputs for shared code blocks (≥3 lines appearing in ≥2 files), extracts them into a shared helper, and updates references.

Shared task map: when dispatching work, each agent receives a map of other agents’ responsibilities and declared tool functions, preventing overlap.

Shared tool registry: a common .claude/tool-registry.md is updated in real time; agents read it before generating code to discover newly created functions. Implementation can use a shared file with git‑merge strategies.

Teams typically start with the consolidation stage and add the shared map and registry as parallelism scales beyond 5 agents.
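The consolidation stage's core scan (identical blocks of ≥3 lines appearing in ≥2 files) is straightforward to sketch. The sliding‑window approach below is a simplification; a real consolidation agent would add semantic matching on top:

```python
from collections import defaultdict

def find_shared_blocks(files, min_lines=3, min_files=2):
    """Scan generated files for identical line blocks of at least
    `min_lines` that appear in at least `min_files` files."""
    index = defaultdict(set)  # block text -> files containing it
    for name, text in files.items():
        lines = [l.rstrip() for l in text.splitlines()]
        # Slide a window of min_lines over each file and index every block.
        for i in range(len(lines) - min_lines + 1):
            block = "\n".join(lines[i:i + min_lines])
            if block.strip():
                index[block].add(name)
    return {block: sorted(names) for block, names in index.items()
            if len(names) >= min_files}
```

Blocks returned by the scan are candidates for extraction into a shared helper, after which the agent rewrites each occurrence to call it.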

Problem 5 – Context aging and loss of constraints

Long‑running sessions exhibit “context rot”: after 15‑20 interaction rounds compliance drops, and after ~30 rounds key constraints are forgotten. Benchmarks on 1 M‑token windows show retrieval accuracy falls to ~30% once ~70% of the window is consumed (≈150‑200 k tokens in a typical Claude Code session).

Three strategies mitigate this:

Persist constraints: store essential rules in files such as CLAUDE.md, AGENTS.md, and a tool‑registry. These files are re‑read at every session start, avoiding compression loss.

Adaptive Context Compression (ACC): progressive thresholds trigger increasingly aggressive compression:

70% utilization → warning, suggest manual /clear.

80% → mask older tool outputs.

85% → prune low‑information content.

90% → aggressive masking, keep only key tool results.

99% → LLM‑generated summary preserving core signals.

/clear + state recovery: after persisting constraints, issue /clear to reset dialogue history. Hooks automatically reload the persisted files, turning /clear into a proactive maintenance step.
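The ACC thresholds map directly onto a small dispatcher. The function and return labels below are illustrative names for the stages described above, not an actual ACC API:

```python
def compression_action(utilization):
    """Map context-window utilization (0.0-1.0) to an ACC stage."""
    if utilization >= 0.99:
        return "llm-summary"            # summary preserving core signals
    if utilization >= 0.90:
        return "aggressive-masking"     # keep only key tool results
    if utilization >= 0.85:
        return "prune-low-information"
    if utilization >= 0.80:
        return "mask-old-tool-output"
    if utilization >= 0.70:
        return "warn-suggest-clear"     # suggest a manual /clear
    return "none"
```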

Core principle: move critical constraints out of volatile dialogue into durable files.

Problem 6 – Unified framework for AI‑First architectural safeguards

All five preceding problems stem from the limited, non‑persistent memory of AI agents. 2026 practice converges on six complementary mechanism modules:

Persistent context files – store constraints and architecture docs in CLAUDE.md, AGENTS.md, and a tool‑registry. Low implementation complexity.

Sub‑Agent delegation – isolate tool sets per domain using Task tools and YAML whitelists. Medium complexity.

Automatic context propagation – Git hooks, SessionStart re‑reads, and Compound Engineering learning loops keep newly created artifacts visible to future sessions. Medium complexity.

Writer/Reviewer separation – independent review or consolidation agents detect and refactor duplicates after code generation. Medium complexity.

Hard‑stop guardrails – PreToolUse hooks and schema‑level tool removal prevent agents from bypassing critical constraints. High complexity (requires custom scripting).

Context lifecycle management – proactive /clear, ACC multi‑level compression, and state‑recovery hooks extend usable session length. Low‑to‑medium complexity.
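A hard‑stop guardrail can be sketched as a PreToolUse hook script. Claude Code hooks receive the tool call as JSON on stdin and a non‑zero blocking exit code rejects the call; treat the payload fields and blocked patterns below as illustrative assumptions:

```python
import json
import sys

BLOCKED_PATTERNS = ["rm -rf", "DROP TABLE"]  # example hard-stop rules

def check_tool(payload):
    """Decide whether a tool call violates a hard-stop rule.
    Payload shape (tool_name / tool_input) follows Claude Code's
    PreToolUse hook input, but the details are assumptions here."""
    if payload.get("tool_name") == "Bash":
        cmd = payload.get("tool_input", {}).get("command", "")
        for pat in BLOCKED_PATTERNS:
            if pat in cmd:
                return False, f"blocked: command matches hard-stop pattern {pat!r}"
    return True, ""

def main(stream=None):
    # Installed as a PreToolUse hook, the script reads the call payload
    # from stdin; exiting with code 2 blocks the tool call and feeds
    # the stderr message back to the model.
    allowed, reason = check_tool(json.load(stream or sys.stdin))
    if not allowed:
        print(reason, file=sys.stderr)
        sys.exit(2)
```

Because the block happens outside the model's context, it holds even after context rot has eroded the in‑dialogue constraints, which is what makes this a hard stop rather than a reminder.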

Projects can adopt modules incrementally based on size and risk profile: small projects may only need persistent files and lifecycle management; medium projects add automatic propagation and writer/reviewer split; large, multi‑team efforts benefit from the full suite.

Core principles

Store architectural and tool constraints in non‑compressible files that are re‑loaded each session.

Break the “self‑review” loop by using independent review or consolidation agents, ensuring higher code quality and reducing duplicate effort.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.
