How a Tiny AGENTS.md Change Boosted AI Coding Accuracy from 53% to 100%

A Vercel team experiment shows that replacing the Skills approach with a small 8 KB AGENTS.md file raised AI coding agents' pass rate from 53% to a perfect 100%, revealing the fragility of explicit tool calls and the strength of passive, always‑available context.


Core Problem

AI coding agents often lag behind framework updates because their training data lacks newly introduced APIs such as 'use cache', connection(), and forbidden() in Next.js 16. When encountering these APIs, agents either generate incorrect code or fall back to outdated patterns. The reverse also occurs: on older Next.js versions, the model may suggest newer APIs that the installed version does not support.

Two Competing Approaches

Skills are an open standard that bundles domain knowledge, prompts, tools, and documentation for agents to call on demand. Ideally, an agent detects a need for framework‑specific help, invokes the skill, and receives the relevant docs.

AGENTS.md is a markdown file placed at the project root that provides continuous context to the coding agent. Whatever is placed in AGENTS.md is accessible in every round without the agent having to decide whether to load it.
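As a sketch only (the contents below are invented for illustration, not Vercel's actual file), an AGENTS.md providing this kind of passive context might look like:

```markdown
# AGENTS.md: always-on context for coding agents

## Project
Next.js 16 app using the App Router. TypeScript throughout.

## Conventions
- Prefer server components; mark client components with "use client".
- Run `pnpm lint` and `pnpm test` before finishing a task.

## Documentation
Version-matched docs live in ./.next-docs; consult them before relying
on pre-trained knowledge of Next.js APIs.
```

Because this file is injected into every prompt, the agent never has to decide whether to load it.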

Vercel initially bet on Skills, expecting concise context and on‑demand loading.

Unexpected Results

In evaluation, agents ignored available Skills 56% of the time, yielding the same 53% pass rate as the baseline with no Skills. This unreliability of tool usage is a known limitation of current models.

Fragility of Explicit Instructions

The team added a clear directive in AGENTS.md: “Before writing code, first explore the project structure, then call the nextjs‑doc skill for documentation.” This raised the skill‑trigger rate above 95% and the pass rate to 79%, but minor wording changes caused large behavior swings:

“You must call the skill” – the agent reads the doc first, anchoring on documentation mode and missing project context.

“Explore the project then call the skill” – the agent builds a mental model first, then references the doc, producing better results.

In a 'use cache' test, the “call first” wording produced a correct page.tsx but omitted the required next.config.ts change, whereas the “explore first” wording got both right.
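To make that two-file requirement concrete, here is a sketch of both pieces. The 'use cache' directive is real Next.js syntax, but treat the config flag name as an assumption: it has changed across releases (experimental.useCache in earlier canaries, cacheComponents more recently), so check the docs for your version.

```tsx
// next.config.ts: the change the "call first" wording missed.
// NOTE: flag name is an assumption; verify against your Next.js version.
import type { NextConfig } from "next";

const nextConfig: NextConfig = {
  cacheComponents: true, // enables the 'use cache' directive
};

export default nextConfig;

// app/page.tsx: a page whose data function opts into caching.
async function getData() {
  "use cache"; // directive must be the first statement in the function
  return { fetchedAt: Date.now() };
}

export default async function Page() {
  const data = await getData();
  return <p>Rendered at {data.fetchedAt}</p>;
}
```

A page.tsx that is correct in isolation still fails the build without the config change, which is exactly why "explore first" wording, which surfaces next.config.ts, scored better.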

Building a Trustworthy Evaluation

The initial test suite suffered from vague prompts, implementation‑focused checks, and focus on APIs already present in training data. The team strengthened the suite by removing test leakage, resolving contradictions, switching to behavior‑based assertions, and concentrating on APIs absent from the model’s training data, such as connection(), 'use cache', cacheLife(), cacheTag(), forbidden(), unauthorized(), proxy.ts, cookies(), headers(), after(), updateTag(), and refresh().

Key Turning Point

Instead of relying on the agent to decide when to invoke a skill, the team embedded a document index directly into AGENTS.md, pointing the agent to version‑matched documentation files. They added the instruction: “Important: For any Next.js task, prefer retrieval‑led reasoning over pre‑training‑led reasoning.” This tells the agent to look up docs rather than depend on possibly stale training knowledge.

Winning Results

Evaluation showed that the AGENTS.md method achieved 100% pass rates across build, code‑check, and test dimensions, while the best‑case Skills approach capped at 79%.

Baseline (no docs): 53%

Skills (default): 53%

Skills (explicit instruction): 79%

AGENTS.md index: 100%

The “clumsy” static markdown outperformed the more sophisticated skill‑retrieval system.

Why Passive Context Wins

No decision point: The agent never has to ask "Should I look up docs?" because the information is already present.

Continuous availability: Skills load asynchronously, whereas AGENTS.md content is injected into every system prompt.

Avoiding ordering issues: Skills introduce sequencing complexity (read doc vs explore project first).

Mitigating Context Bloat

Embedding full docs risks context explosion. The team compressed the index from 40 KB to 8 KB (an 80% reduction) while preserving 100% pass rate. The compressed format uses a pipe‑separated structure, e.g.:

[Next.js Docs Index]|root: ./.next-docs|IMPORTANT: Prefer retrieval-led reasoning over pre-training-led reasoning|01-app/01-getting-started:{01-installation.mdx,02-project-structure.mdx,...}

The agent knows where to find documentation in the .next-docs/ directory without loading all content into context.
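The pipe-separated layout is simple enough to parse mechanically. As a rough sketch (the field names root, IMPORTANT, and the path:{files} shape are inferred from the single example line above, not from a published spec), a consumer of the index could do:

```typescript
// Hypothetical parser for the pipe-separated AGENTS.md docs index.
interface DocsIndex {
  root: string;                      // e.g. "./.next-docs"
  note: string;                      // the IMPORTANT instruction
  sections: Record<string, string[]>; // doc path -> file names
}

function parseDocsIndex(index: string): DocsIndex {
  const result: DocsIndex = { root: "", note: "", sections: {} };
  for (const part of index.split("|")) {
    if (part.startsWith("root:")) {
      result.root = part.slice("root:".length).trim();
    } else if (part.startsWith("IMPORTANT:")) {
      result.note = part.slice("IMPORTANT:".length).trim();
    } else if (part.includes(":{")) {
      // "path:{file1,file2}" -> section path plus its doc files
      const [path, files] = part.split(":{");
      result.sections[path] = files.replace(/}$/, "").split(",");
    }
  }
  return result;
}

const example =
  "[Next.js Docs Index]|root: ./.next-docs|IMPORTANT: Prefer retrieval-led reasoning over pre-training-led reasoning|01-app/01-getting-started:{01-installation.mdx,02-project-structure.mdx}";

const parsed = parseDocsIndex(example);
console.log(parsed.root);
console.log(parsed.sections["01-app/01-getting-started"]);
```

The point of the format is that the agent reads this tiny map, then opens only the .mdx files relevant to the task.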

Immediate Usage

Run the following command to set up AGENTS.md for a Next.js project:

npx @next/codemod@canary agents-md

The codemod then:

Detects the Next.js version.

Downloads matching docs into .next-docs/.

Injects the compressed index into AGENTS.md.

The same method works with agents that respect AGENTS.md, such as Cursor.

Implications for Framework Authors

Skills are not useless; they excel for user‑triggered vertical workflows like version upgrades or migrations. However, for general framework knowledge, passive context currently outperforms on‑demand retrieval. Framework maintainers should consider providing AGENTS.md snippets that developers can add to projects.

Practical Advice

Don’t wait for Skills to improve; current results matter.

Compress documentation; an index pointing to files is sufficient.

Use evaluation tests that target APIs missing from training data.

Design docs for easy retrieval rather than loading everything upfront.

The goal is to shift agents from pre‑training‑led reasoning to retrieval‑led reasoning, and AGENTS.md proves to be the most reliable path.

Tags: evaluation, Next.js, Skills, AGENTS.md, AI coding agents, retrieval-based reasoning
Written by

AI Engineering

Focused on cutting‑edge product and technology information and practical experience sharing in the AI field (large models, MLOps/LLMOps, AI application development, AI infrastructure).
