

From Autocomplete to AI Coding Agent: Inside the Architecture of Modern Code Assistants

This article breaks down how AI coding tools have progressed from simple autocomplete to sophisticated coding agents, detailing their core components, workflow mechanics, cost and space management, conflict handling, memory strategies, and extensibility through Rules, MCP, and Skills, with practical code examples.

Baidu Tech Salon

Jan 19, 2026


Background

AI coding tools have progressed from simple autocomplete to full‑featured "coding agents" that understand project context, invoke tools, and perform multi‑step reasoning.

Core Components of a Coding Agent

Identity Definition

The model is given a concrete identity (e.g., "Senior Front‑End Developer") that frames its behavior, goals, and constraints.
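Here is a minimal sketch of how such an identity might be rendered into the opening of a system prompt. The `buildIdentityPrompt` helper and its `role`/`goals`/`constraints` fields are hypothetical and chosen only for illustration:

```javascript
// Hypothetical identity spec; the field names are illustrative, not a product API.
function buildIdentityPrompt({ role, goals = [], constraints = [] }) {
  const lines = [`You are a ${role}.`];
  if (goals.length) {
    lines.push("", "Goals:", ...goals.map((g) => `- ${g}`));
  }
  if (constraints.length) {
    lines.push("", "Constraints:", ...constraints.map((c) => `- ${c}`));
  }
  return lines.join("\n");
}

const prompt = buildIdentityPrompt({
  role: "Senior Front-End Developer",
  goals: ["Ship working, tested changes to this repository"],
  constraints: ["Read a file before editing it", "Do not add new dependencies without asking"],
});
console.log(prompt);
```

Because this text sits at the very start of the prompt and does not change during a session, it is also the part most likely to be served from cache.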

Tool Invocation

Agents use a set of tools (read, write, edit, list, grep, etc.), each declared with a name, a natural-language description, and a JSON Schema for its parameters. An example tool definition:

{
  "name": "read",
  "description": "Read the contents of a file. Optionally specify line range.",
  "parameters": {
    "type": "object",
    "properties": {
      "path": {"type": "string", "description": "File path"},
      "lineStart": {"type": "integer", "description": "Start line (1‑indexed)"},
      "lineEnd": {"type": "integer", "description": "End line (1‑indexed)"}
    },
    "required": ["path"]
  }
}
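On the agent side, each schema is paired with a handler that executes the call. The sketch below shows one possible `read` handler and a tiny dispatcher, assuming the model sends arguments matching the schema above and that `lineEnd` is inclusive. The inclusivity, the `{ content }`/`{ error }` result shape, and the error wording are assumptions, not something the schema specifies:

```javascript
const fs = require("node:fs");

// Execute the "read" tool. lineStart/lineEnd are 1-indexed and (by assumption) inclusive.
function readTool(args) {
  if (typeof args.path !== "string") {
    return { error: "Missing required parameter: path" };
  }
  let text;
  try {
    text = fs.readFileSync(args.path, "utf8");
  } catch (err) {
    return { error: `Cannot read ${args.path}: ${err.code ?? err.message}` };
  }
  const lines = text.split("\n");
  const start = Math.max(1, args.lineStart ?? 1);
  const end = Math.min(lines.length, args.lineEnd ?? lines.length);
  // Convert the 1-indexed inclusive range into a 0-indexed slice.
  return { content: lines.slice(start - 1, end).join("\n") };
}

// Route a model tool call ({ name, arguments }) to its handler.
const handlers = { read: readTool };
function dispatch(call) {
  const handler = handlers[call.name];
  return handler ? handler(call.arguments ?? {}) : { error: `Unknown tool: ${call.name}` };
}
```

Returning errors as data rather than throwing lets the agent feed them back to the model, which can then correct its next call.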

Environment Awareness

At the start of a conversation the agent receives a snapshot of the project layout (a tree view that respects .gitignore), which remains static unless explicitly refreshed.
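To show what such a snapshot might look like, here is a sketch that renders an indented tree from a flat list of repository paths. It assumes the list was already produced by something that honors `.gitignore` (for example `git ls-files --cached --others --exclude-standard`). The extra `ignore` prefixes and the two-space indentation format are hypothetical choices:

```javascript
// Render an indented directory tree from a flat list of file paths.
function renderTree(paths, ignore = []) {
  const root = Object.create(null); // null prototype: file names like "constructor" are safe
  for (const p of paths) {
    if (ignore.some((prefix) => p.startsWith(prefix))) continue; // hypothetical extra filter
    let node = root;
    for (const part of p.split("/")) {
      node[part] ??= Object.create(null);
      node = node[part];
    }
  }
  const out = [];
  const walk = (node, depth) => {
    for (const name of Object.keys(node).sort()) {
      const children = node[name];
      const isDir = Object.keys(children).length > 0; // leaves in a file list are files
      out.push(`${"  ".repeat(depth)}${name}${isDir ? "/" : ""}`);
      walk(children, depth + 1);
    }
  };
  walk(root, 0);
  return out.join("\n");
}
```

Since the snapshot is computed once, files created later in the session do not appear in it until the agent lists the directory again.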

Workflow Example: Leave Request

Using a leave request as the running example, three workflow styles illustrate the agent's capabilities:

Fixed workflow – a static sequence of steps (open page, fill dates, submit).

Fuzzy workflow – the model extracts parameters from ambiguous natural‑language input.

Dynamic reasoning – the model iteratively determines start date, duration, and calendar constraints before forming the final request.
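The dynamic-reasoning case usually ends in a concrete date calculation. A minimal sketch, assuming the agent has access to a helper that counts working days, skips weekends, and honors a caller-supplied holiday list; the `leaveEndDate` name and the holiday handling are hypothetical:

```javascript
// Return the last day (YYYY-MM-DD) of a leave spanning `workingDays` working days
// from `start`. Weekends and the listed holidays are skipped.
function leaveEndDate(start, workingDays, holidays = []) {
  if (!(workingDays >= 1)) throw new RangeError("workingDays must be at least 1");
  const off = new Set(holidays);
  const d = new Date(`${start}T00:00:00Z`); // UTC avoids local-timezone drift
  const iso = () => d.toISOString().slice(0, 10);
  const isWorkday = () => {
    const day = d.getUTCDay(); // 0 = Sunday, 6 = Saturday
    return day !== 0 && day !== 6 && !off.has(iso());
  };
  let counted = 0;
  while (true) {
    if (isWorkday()) counted++;
    if (counted === workingDays) return iso();
    d.setUTCDate(d.getUTCDate() + 1);
  }
}

console.log(leaveEndDate("2026-01-22", 3)); // Thursday start, spans a weekend
```

A fuzzy request such as "three days off starting Thursday" would first be resolved to a start date and a day count, then passed to a helper like this.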

Cost Control

Claude's pricing illustrates the stakes: output tokens cost roughly five times as much as input tokens, while cache reads cost about a tenth of the normal input price. Effective caching can reduce overall cost by 8-10×.
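The effect of caching on an agent loop can be estimated with a small simulation. The sketch assumes illustrative per-million-token prices modeled on published Claude Sonnet-tier list prices ($3 input, $15 output, $3.75 cache write, $0.30 cache read; check the current pricing page), and a simplified cache model in which everything sent on the previous turn is a cache hit. Cache lifetimes and minimum cacheable lengths are ignored:

```javascript
// Illustrative USD prices per million tokens (assumed; verify against current pricing).
const PRICES = { input: 3.0, output: 15.0, cacheWrite: 3.75, cacheRead: 0.3 };

function requestCost({ input = 0, output = 0, cacheWrite = 0, cacheRead = 0 }, p = PRICES) {
  return (input * p.input + output * p.output +
          cacheWrite * p.cacheWrite + cacheRead * p.cacheRead) / 1e6;
}

// Simulate an agent loop where every turn resends the full history.
// prefix: system prompt + tool schemas; newInput: tokens added per turn
// (user messages, tool results); output: tokens generated per turn.
function simulate({ turns, prefix, newInput, output, useCache }) {
  let prompt = prefix; // current prompt length in tokens
  let cached = 0;      // prompt tokens already stored in the cache
  let total = 0;
  for (let t = 0; t < turns; t++) {
    prompt += newInput;
    if (useCache) {
      // Last turn's prompt is read from cache; the new tail is written to cache.
      total += requestCost({ cacheRead: cached, cacheWrite: prompt - cached, output });
      cached = prompt;
    } else {
      total += requestCost({ input: prompt, output });
    }
    prompt += output; // the reply becomes part of the next turn's prompt
  }
  return total;
}

const scenario = { turns: 20, prefix: 10_000, newInput: 2_000, output: 500 };
const uncached = simulate({ ...scenario, useCache: false });
const withCache = simulate({ ...scenario, useCache: true });
console.log(uncached.toFixed(2), withCache.toFixed(2), (uncached / withCache).toFixed(1));
```

The saving depends heavily on the ratio of reused prefix to fresh output: the longer the session and the shorter each reply, the larger the gap, which is also why cache-breaking changes to the prefix are expensive.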

Space Management

When the context length approaches the model's limit (for example, 128K tokens), agents employ two strategies:

Trimming: Remove irrelevant or outdated information, for example by keeping only the most recent full copy of each file.

Compression: Summarize the earlier conversation into an eight-section outline (primary request, key concepts, files, errors, problem solving, user messages, pending tasks, current work).

const COMPRESSION_SECTIONS = [
  "1. Primary Request and Intent",
  "2. Key Technical Concepts",
  "3. Files and Code Sections",
  "4. Errors and fixes",
  "5. Problem Solving",
  "6. All user messages",
  "7. Pending Tasks",
  "8. Current Work"
];
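One way the two strategies might fit together is to trim first and compress only if the context is still over budget. This sketch assumes a hypothetical message shape `{ role, content, file }` (with `file` set on file-read results), a rough four-characters-per-token estimate, and a hypothetical `summarize` model call that answers using the eight sections:

```javascript
const COMPRESSION_SECTIONS = [
  "1. Primary Request and Intent", "2. Key Technical Concepts",
  "3. Files and Code Sections", "4. Errors and fixes", "5. Problem Solving",
  "6. All user messages", "7. Pending Tasks", "8. Current Work",
];

// Rough token estimate (assumption: about 4 characters per token).
const estimateTokens = (msgs) => msgs.reduce((n, m) => n + Math.ceil(m.content.length / 4), 0);

// Trimming: keep only the most recent full copy of each file that was read.
function trimFileReads(messages) {
  const seen = new Set();
  const out = [];
  // Walk backwards so the first copy encountered is the newest one.
  for (let i = messages.length - 1; i >= 0; i--) {
    const m = messages[i];
    if (m.file && seen.has(m.file)) {
      out.push({ ...m, content: `[older copy of ${m.file} removed]` });
    } else {
      if (m.file) seen.add(m.file);
      out.push(m);
    }
  }
  return out.reverse();
}

// Compression: if still over budget, replace older messages with a structured summary.
async function fitContext(messages, { limit, keepRecent, summarize }) {
  const msgs = trimFileReads(messages);
  if (estimateTokens(msgs) <= limit || msgs.length <= keepRecent) return msgs;
  const older = msgs.slice(0, msgs.length - keepRecent);
  const recent = msgs.slice(msgs.length - keepRecent);
  const summary = await summarize(older, COMPRESSION_SECTIONS); // hypothetical model call
  return [{ role: "user", content: `Summary of earlier conversation:\n${summary}` }, ...recent];
}
```

Trimming runs first because it is cheap and lossless for the latest state of each file; compression costs an extra model call and discards detail, so it is kept as the fallback.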

Attention Optimization

Agents prepend a <reminders> block to the latest message. The block can contain TODOs, tool-availability notices, or behavioral cues. Because it is rebuilt every turn, it is never cached, and because it sits in the most recent message, it reliably influences the model's next turn.

<reminders>
- Planned todos:
  - [x] Explore code related to "print" function
  - [x] Add "flush" parameter to function
  - [ ] Refactor all "print" calls to use the new parameter
</reminders>
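A minimal sketch of how an agent might build the block and attach it, assuming hypothetical `{ text, done }` todo records and `{ role, content }` messages. Only the newest message changes, so every earlier message stays byte-identical and the cached prefix is unaffected:

```javascript
// Render a <reminders> block from the todo list and any one-off notices.
function renderReminders(todos, notices = []) {
  const lines = ["<reminders>"];
  if (todos.length) {
    lines.push("- Planned todos:");
    for (const t of todos) lines.push(`  - [${t.done ? "x" : " "}] ${t.text}`);
  }
  for (const n of notices) lines.push(`- ${n}`);
  lines.push("</reminders>");
  return lines.join("\n");
}

// Prepend the block to the newest message without mutating the history.
function withReminders(messages, block) {
  const last = messages[messages.length - 1];
  return [...messages.slice(0, -1), { ...last, content: `${block}\n${last.content}` }];
}
```

The same `notices` list is a natural channel for the push notifications described under Conflict Control, such as a note that a file changed on disk.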

Conflict Control

When multiple agents, or an agent and a human, edit the same file, three approaches are used:

Locking: Before applying an edit, verify that the file has not changed since the agent last read it.

Push notifications: Add change notices to the next <reminders> block so the model re-reads the file.

Isolation via Subagents: Each subagent works in its own Git worktree or temporary branch, preventing direct interference.

<!-- Assistant -->
edit(file, console.log...)
<!-- User -->
This edit is rejected, the file has been modified since your last read or edit, you should read this file again before executing any write or edit actions.
<!-- Assistant -->
read(file)
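The "locking" check in this exchange behaves like optimistic concurrency control: nothing is actually locked, but stale writes are refused. A sketch, assuming a `Map` stands in for the filesystem and a hypothetical `edit(agent, path, oldStr, newStr)` string-replacement tool; the agent's own successful edit counts as a fresh read:

```javascript
const { createHash } = require("node:crypto");

const sha = (text) => createHash("sha256").update(text).digest("hex");

// Remembers the content hash each agent last saw and rejects stale edits.
class EditGuard {
  constructor(files) {
    this.files = files;        // Map<path, content>, a stand-in for the filesystem
    this.lastSeen = new Map(); // `${agent}:${path}` -> hash of the content that agent saw
  }
  read(agent, path) {
    const text = this.files.get(path) ?? "";
    this.lastSeen.set(`${agent}:${path}`, sha(text));
    return text;
  }
  edit(agent, path, oldStr, newStr) {
    const current = this.files.get(path) ?? "";
    if (this.lastSeen.get(`${agent}:${path}`) !== sha(current)) {
      return { ok: false, error: "The file has been modified since your last read or edit; read it again before editing." };
    }
    if (!current.includes(oldStr)) return { ok: false, error: "old string not found" };
    const updated = current.replace(oldStr, () => newStr); // function form: no "$" patterns
    this.files.set(path, updated);
    this.lastSeen.set(`${agent}:${path}`, sha(updated)); // own edit counts as a fresh read
    return { ok: true };
  }
}
```

Hashing the content rather than comparing modification times also catches a change that lands within the same timestamp tick.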

Persistent Memory

Agents are stateless by default. Memory can be added through:

Tool-based updates, where the model explicitly saves memories through a tool (rarely used).

Summarization of the entire dialogue after each session.

Storage‑based retrieval: raw logs are kept on disk and the model reads them with read, list, or grep when needed.
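A sketch of the storage-based approach, assuming one JSON Lines file per session in a hypothetical log directory. In a real agent the model would use its own `list` and `grep` tools over that directory; `grepLogs` only shows what such a search returns:

```javascript
const fs = require("node:fs");
const path = require("node:path");

// Persist one session as JSON Lines: one message per line.
function saveSession(dir, sessionId, messages) {
  fs.mkdirSync(dir, { recursive: true });
  const body = messages.map((m) => JSON.stringify(m)).join("\n") + "\n";
  fs.writeFileSync(path.join(dir, `${sessionId}.jsonl`), body);
}

// A grep-like, case-insensitive search over stored logs.
function grepLogs(dir, pattern) {
  const re = new RegExp(pattern, "i");
  const hits = [];
  for (const name of fs.readdirSync(dir).sort()) {
    if (!name.endsWith(".jsonl")) continue;
    const lines = fs.readFileSync(path.join(dir, name), "utf8").split("\n");
    lines.forEach((line, i) => {
      if (line && re.test(line)) hits.push({ file: name, line: i + 1, text: line });
    });
  }
  return hits;
}
```

Keeping raw logs instead of summaries means nothing is lost up front; the cost is paid only when the model decides a past session is worth searching.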

Ability Extensions

Rule

A Rule stores static, project-specific knowledge such as coding conventions and framework quirks. Rules are part of the immutable system prompt and are cached for the whole session. Good Rules are concise (under 500 lines), modular, and indexed by file path or scenario.
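Indexing by file path can be as simple as attaching glob patterns to each Rule. The sketch assumes hypothetical rule records (`name`, `globs`, `alwaysApply`, `body`), supports only `*` and `**` in globs, and selects Rules once at session start so the resulting system prompt stays fixed and cacheable:

```javascript
// Convert a simple glob ("*" within a segment, "**" across segments) to an anchored RegExp.
function globToRegExp(glob) {
  let re = "";
  for (let i = 0; i < glob.length; i++) {
    const c = glob[i];
    if (c === "*") {
      if (glob[i + 1] === "*") {
        // "**/" matches zero or more directories; a bare "**" matches anything.
        if (glob[i + 2] === "/") { re += "(?:.*/)?"; i += 2; }
        else { re += ".*"; i += 1; }
      } else {
        re += "[^/]*";
      }
    } else if ("\\^$+?.()|{}[]".includes(c)) {
      re += "\\" + c; // escape regex metacharacters
    } else {
      re += c;
    }
  }
  return new RegExp("^" + re + "$");
}

// Keep global rules plus any rule whose globs match one of the given paths.
function selectRules(rules, paths) {
  return rules.filter((r) =>
    r.alwaysApply || (r.globs ?? []).some((g) => {
      const re = globToRegExp(g);
      return paths.some((p) => re.test(p));
    }));
}

// Concatenate the selected rules into one system-prompt section.
function buildRulesSection(rules) {
  return rules.map((r) => `## Rule: ${r.name}\n${r.body.trim()}`).join("\n\n");
}
```

Small, single-topic Rules keep this selection meaningful: a monolithic Rule always matches and always costs its full length in the prompt.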

MCP (Model Context Protocol)

MCP is a standardized way to expose external services as tools. A local MCP server typically runs as a subprocess of the agent and is declared in the agent configuration. Example: GitHub integration.

{
  "mcpServers": {
    "github": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-github"],
      "env": {"GITHUB_PERSONAL_ACCESS_TOKEN": "YOUR_GITHUB_TOKEN"}
    }
  }
}

Once registered, the model can create issues, query PR status, or post comments without hard‑coding API calls.
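Under the hood, a client turns each `mcpServers` entry into a subprocess and talks to it with JSON-RPC 2.0 messages, one JSON object per line over stdio. A sketch of that plumbing: the method names (`initialize`, `notifications/initialized`, `tools/list`, `tools/call`) come from the MCP specification, while the `protocolVersion` value, the client name, and the `create_issue` tool name in the usage note are examples to be checked against the actual server (its `tools/list` response is authoritative):

```javascript
const { spawn } = require("node:child_process");

// Options for launching one "mcpServers" entry; its env is merged over the parent's.
function spawnOptions(entry) {
  return { env: { ...process.env, ...(entry.env ?? {}) }, stdio: ["pipe", "pipe", "inherit"] };
}
const launchServer = (entry) => spawn(entry.command, entry.args ?? [], spawnOptions(entry));

let nextId = 1;
// A JSON-RPC request framed as a single line for the stdio transport.
const rpc = (method, params) =>
  JSON.stringify({ jsonrpc: "2.0", id: nextId++, method, params }) + "\n";

// Handshake: initialize, wait for the response, then send the initialized notification.
const initialize = () => rpc("initialize", {
  protocolVersion: "2024-11-05", // example version string
  capabilities: {},
  clientInfo: { name: "example-agent", version: "0.1.0" },
});
// Notifications carry no id and receive no response.
const notifyInitialized = () =>
  JSON.stringify({ jsonrpc: "2.0", method: "notifications/initialized" }) + "\n";
const listTools = () => rpc("tools/list", {});
const callTool = (name, args) => rpc("tools/call", { name, arguments: args });
```

For the GitHub entry, the agent would write these lines to `launchServer(config.mcpServers.github).stdin`, read responses from `stdout`, and expose each tool returned by `tools/list` to the model alongside its built-in tools, e.g. `callTool("create_issue", { owner, repo, title })`.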

Skill

A Skill bundles a SKILL.md manifest with optional scripts, reference documents, and assets. The manifest's front matter contains the mandatory name and description fields (the description is what the model uses to decide when to trigger the Skill) and an optional compatibility field.

less-to-postcss-migrator/
├── SKILL.md          # required: YAML front matter + Markdown instructions
├── scripts/          # executable scripts (Python, Bash, etc.)
│   └── migrate.sh
├── references/       # optional docs loaded on demand
│   └── less-features.md
└── assets/           # icons, templates, etc.
    └── logo.png

SKILL.md front matter:
---
name: less-to-postcss-migrator
description: Migrate a project from Less to PostCSS, handling variables, mixins, and file extensions.
compatibility: node >=14
---

Skills are activated lazily: the model decides to load a Skill based on its description. Bundled resources are accessed with normal tools (read, list, grep) when needed.
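Lazy activation means only each Skill's name and description need to sit in the prompt up front. A sketch of building that index, assuming skill files are passed in as a `{ directory: SKILL.md text }` object; the front-matter parser deliberately handles only flat `key: value` lines and is not a general YAML parser:

```javascript
// Split a SKILL.md into flat front-matter fields and the Markdown body.
function parseFrontMatter(markdown) {
  const m = markdown.match(/^---\r?\n([\s\S]*?)\r?\n---(?:\r?\n|$)/);
  if (!m) return { meta: {}, body: markdown };
  const meta = {};
  for (const line of m[1].split(/\r?\n/)) {
    const kv = line.match(/^([A-Za-z_][\w-]*):\s*(.*)$/);
    if (kv) meta[kv[1]] = kv[2].replace(/^["']|["']$/g, ""); // strip simple quotes
  }
  return { meta, body: markdown.slice(m[0].length) };
}

// One line per Skill for the prompt; bodies are read later, on demand.
function skillIndex(skillFiles) {
  return Object.entries(skillFiles).map(([dir, text]) => {
    const { meta } = parseFrontMatter(text);
    if (!meta.name || !meta.description) {
      throw new Error(`${dir}/SKILL.md: name and description are required`);
    }
    return `- ${meta.name}: ${meta.description} (${dir}/SKILL.md)`;
  }).join("\n");
}
```

Including the path in each index line gives the model exactly what it needs to `read` the full SKILL.md once a task matches the description.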

Subagents

For large or complex tasks, the main agent can spawn Subagents, each running in its own independent context (and, when files must be isolated, its own Git worktree). The main agent coordinates them while each Subagent focuses on a small, well-defined goal, which keeps cache hit rates high and avoids context overflow.

1. Start Subagent – goal: locate Webpack splitChunks configuration.
2. Subagent reads relevant files, returns summary.
3. Main agent launches another Subagent to modify the configuration and verify the build.
4. Repeat until the desired result is achieved.
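The loop above can be sketched as follows. `planNextTask` and `runSubagent` are hypothetical stand-ins for a planning step by the main agent and for launching a fresh model context; the worktree path and branch naming are also assumptions, though `git worktree add -b <branch> <path>` is the real Git command for creating an isolated checkout:

```javascript
// Main-agent loop: plan a small task, delegate it, keep only the returned summary.
async function orchestrate(goal, planNextTask, runSubagent, maxRounds = 5) {
  const summaries = []; // summaries only, never the subagents' full transcripts
  for (let round = 0; round < maxRounds; round++) {
    const task = planNextTask(goal, summaries);
    if (!task) break; // the planner judges the goal achieved
    const worktree = `../wt-${round}`; // hypothetical location for the isolated checkout
    const result = await runSubagent({
      ...task,
      worktree,
      gitArgs: ["worktree", "add", "-b", `agent/${round}`, worktree],
    });
    summaries.push(`${task.goal}: ${result.summary}`);
  }
  return summaries;
}
```

Because the main agent accumulates short summaries rather than every file a subagent read, its own context grows slowly and its prefix stays stable enough to keep hitting the cache.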

Conclusion

Modern AI coding agents combine identity prompting, tool invocation, environment perception, and extensible mechanisms (Rule, MCP, Skill) to turn raw language models into practical development assistants. Careful management of cache, context length, attention, conflict resolution, and persistent memory is essential for cost‑effective, reliable operation. By modularizing expertise into Rules, MCP servers, and Skills, teams can tailor agents to specific codebases while preserving safety and scalability.

Written by

Baidu Tech Salon

Baidu Tech Salon, organized by Baidu's Technology Management Department, is a monthly offline event that shares cutting‑edge tech trends from Baidu and the industry, providing a free platform for mid‑to‑senior engineers to exchange ideas.
