Why Context Engineering Is the Secret to Powerful AI Agents

This article explains how AI agents work through perception, planning, and action, and describes the four systems that support them: memory, tools, safety, and evaluation. It then traces the evolution from prompt engineering to context engineering, showing how strategies such as selective saving, retrieval, compression, and modularization address the core challenge of managing large-scale context for reliable, efficient agent performance.


1. Three Core Actions of an Agent

To understand context engineering, we first need to know the basic workflow of an AI Agent, which can be summarized as three core actions: perception, planning, and action.

1.1 Perception

Perception is the first step and the one most often overlooked. Just as a driver must see the road, an Agent must accurately understand its current situation. Perception has two layers:

State perception : understanding the objective environment, e.g., the programming language and framework used, the existing code structure, and any dependencies or constraints.

Intent perception : grasping what the user actually wants, such as whether the goal is performance improvement or readability, any specific performance metrics, and whether backward compatibility is required.

Many Agent failures stem from poor perception; for example, if a user says “this is too slow” and the Agent only interprets it as “needs optimization” without probing where the slowness occurs, the subsequent optimization may miss the target.

1.2 Planning

With accurate perception, the Agent must devise an action plan, similar to deciding the cooking steps before preparing a dish. Planning ability largely depends on the underlying large language model, which exhibits different "personalities": Claude tends to reason deeply and consider many possibilities; GPT‑4 is balanced and stable for standard tasks; Gemini is bold and sometimes proposes innovative solutions.

From an architectural perspective, Gemini focuses on multimodal fusion and large‑scale context handling, while Claude emphasizes precise reasoning and complex task execution. However, even a strong model cannot produce a good plan without sufficient information; missing key details leads to poor plans.

1.3 Action

The final module, action, enables the Agent to actually "do something". The mainstream implementation is function calling: predefined tool functions such as web search, file read/write, and sending email are invoked by the model as needed. The challenge lies in tool selection: when many tools are available, the Agent can become confused, much like a person facing a toolbox full of unfamiliar tools.
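The function-calling loop described above can be sketched as follows. This is a minimal illustration, not a real provider API: the model's output is simulated as a JSON tool call, and names like `web_search` and `read_file` are hypothetical stubs.

```python
import json

TOOLS = {}

def tool(fn):
    """Register a function so the agent can invoke it by name."""
    TOOLS[fn.__name__] = fn
    return fn

@tool
def web_search(query: str) -> str:
    return f"results for {query!r}"  # stub in place of a real search call

@tool
def read_file(path: str) -> str:
    return f"contents of {path}"  # stub in place of real file I/O

def dispatch(model_output: str) -> str:
    """Parse the model's JSON tool call and execute the matching function."""
    call = json.loads(model_output)
    fn = TOOLS[call["name"]]
    return fn(**call["arguments"])

# In a real agent, this JSON string would come from the LLM's response.
result = dispatch('{"name": "web_search", "arguments": {"query": "MCP spec"}}')
print(result)  # → results for 'MCP spec'
```

The registry pattern is what makes tool selection hard at scale: every entry in `TOOLS` must also be described to the model, and the more descriptions the model sees, the easier it is for it to pick the wrong one.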

2. Four Supporting Systems of an Agent

Beyond the core perception‑plan‑action loop, modern Agents need four important supporting systems.

2.1 Memory System

Memory lets an Agent accumulate experience, similar to human memory. It includes three types:

Working memory : temporary information for the current task.

Situational memory : records of previous dialogues and tasks.

Semantic memory : domain knowledge and best practices.

A good memory system must also know how to forget. Storing irrelevant, outdated, or erroneous information can interfere with judgment, just as an uncleaned computer cache slows down performance.
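A toy memory store can illustrate the three memory types and the forgetting behavior described above. The eviction heuristic (drop the oldest, least-recalled items first) is an assumption for illustration, not a production design.

```python
import time

class MemoryStore:
    def __init__(self, capacity: int = 100):
        self.capacity = capacity
        self.items = []  # each item: kind, text, timestamp, recall count

    def remember(self, kind: str, text: str):
        assert kind in {"working", "situational", "semantic"}
        self.items.append({"kind": kind, "text": text,
                           "ts": time.time(), "hits": 0})
        self._forget()

    def recall(self, keyword: str):
        matches = [m for m in self.items if keyword in m["text"]]
        for m in matches:
            m["hits"] += 1  # recalled memories become more valuable
        return [m["text"] for m in matches]

    def _forget(self):
        # Evict the least valuable items: rarely recalled and oldest first.
        if len(self.items) > self.capacity:
            self.items.sort(key=lambda m: (m["hits"], m["ts"]))
            self.items = self.items[-self.capacity:]

store = MemoryStore()
store.remember("semantic", "prefer pytest over unittest")
store.remember("working", "current task: optimize query")
print(store.recall("pytest"))  # → ['prefer pytest over unittest']
```

A real system would score relevance semantically rather than by substring match, but the core idea stands: remembering is cheap, and deliberate forgetting is what keeps recall useful.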

2.2 Tool System

Tools allow an Agent to interact with the external world. Early Agents could only answer questions; modern Agents can search information, manipulate files, call APIs, and even control other software. Anthropic’s MCP (Model Context Protocol) attempts to standardize tool calling, similar to a USB standard.

2.3 Safety System

Giving an Agent execution ability is like handing a child scissors; safety measures are essential. Early versions of Manus suffered security issues where specially crafted prompts caused the Agent to package all code in the execution environment. Modern Agents typically employ sandbox isolation, dynamic permission management, and audit logs.

2.4 Evaluation System

Evaluating an Agent’s performance is complex because outputs often have multiple plausible "correct" answers. Evaluation considers task completion, efficiency and cost, user satisfaction, and safety/compliance.

3. From Prompt Engineering to Context Engineering

Understanding the Agent’s workflow leads to the evolution of engineering practices.

3.1 Limitations of Prompt Engineering

Last year many focused on crafting prompts such as "You are a professional X" or "Think step by step". However, static prompts struggle with complex tasks because they provide only fixed instructions while real tasks require dynamic information.

3.2 Rise of Context Engineering

Context engineering is a broader concept: it supplies all the information needed to make a task solvable for the LLM. Shopify CEO Tobi Lütke describes it as "the art of providing all the context for the task to be plausibly solvable by the LLM".

Context is not just a prompt; it is everything the Agent can see before generating a response:

Instruction context : system prompts, task descriptions, behavioral rules.

Dialogue context : current and historical conversation, including user intent and desired output format.

Knowledge context : relevant documents, database entries, search results.

Tool context : descriptions and usage of available functions.

State context : environment variables, user preferences, system status.

Context engineering is a dynamic system, not a static template. It selects relevant information based on task type, updates as the dialogue progresses, and balances completeness with brevity.
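The "dynamic system" framing above can be sketched as a context assembler: each category contributes a section, and a selector keeps only what the current task needs within a budget. The category names mirror the list above; the per-task selection rule and character budget are illustrative assumptions.

```python
def build_context(task_type: str, sources: dict[str, str],
                  budget_chars: int = 2000) -> str:
    # Instruction and dialogue context are always included;
    # the remaining categories are selected per task type.
    needed = {"instruction", "dialogue"}
    if task_type == "coding":
        needed |= {"tools", "state"}
    elif task_type == "research":
        needed |= {"knowledge"}

    sections = [f"## {name}\n{text}" for name, text in sources.items()
                if name in needed]
    context = "\n\n".join(sections)
    return context[:budget_chars]  # crude completeness/brevity trade-off

ctx = build_context("research", {
    "instruction": "You are a careful research assistant.",
    "dialogue": "User: summarize the retrieval paper.",
    "knowledge": "[retrieved abstract...]",
    "tools": "web_search(query)",
})
print("knowledge" in ctx, "tools" in ctx)  # → True False
```

The interesting design decision is in `needed`: a research task gets knowledge context but no tool descriptions, so irrelevant sections never compete for the model's attention.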

Google DeepMind engineer Phil Schmid illustrated seven components of context engineering (instruction/system prompt, user prompt, short‑term memory, long‑term memory, retrieval, available tools, structured output) that map onto the five context categories above.

4. Core Challenges of Context Engineering

Karpathy likens LLMs to a new operating system where the context window is RAM. The main challenges are:

Capacity limits : even models with hundreds of thousands of tokens struggle with long‑running Agents that generate massive interaction logs.

Attention drift : overly long context causes the model to recall the beginning and end well while losing the middle (the "lost in the middle" phenomenon), similar to forgetting the middle chapters of a thick book.

Performance degradation : long context increases cost and latency and can introduce context pollution, interference, and conflicts.

5. Four Core Strategies for Context Engineering

5.1 Selective Saving

Important information is stored outside the immediate context window and retrieved when needed. Two common patterns are:

Notebook mode : the Agent takes notes during execution, saving intermediate results and key findings.

Long‑term memory : persisting information across sessions, such as user preferences, domain knowledge, and successful cases (e.g., ChatGPT, Cursor).

The key is to save only information that truly adds value.
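Notebook mode can be sketched as a small persistent scratchpad: the Agent writes distilled findings to a file outside the context window and reloads only what it needs later. The file name and note schema here are illustrative assumptions.

```python
import json
import os
import tempfile

class Notebook:
    def __init__(self, path: str):
        self.path = path

    def jot(self, topic: str, finding: str):
        """Save a distilled finding, not the raw logs that produced it."""
        notes = self._load()
        notes[topic] = finding
        with open(self.path, "w") as f:
            json.dump(notes, f)

    def lookup(self, topic: str):
        """Retrieve a single finding without reloading the whole history."""
        return self._load().get(topic)

    def _load(self) -> dict:
        if not os.path.exists(self.path):
            return {}
        with open(self.path) as f:
            return json.load(f)

path = os.path.join(tempfile.gettempdir(), "agent_notebook.json")
nb = Notebook(path)
nb.jot("bottleneck", "90% of latency is in the N+1 query in orders.py")
print(nb.lookup("bottleneck"))
```

The `jot` step is where "save only what truly adds value" lives: the Agent records the one-line conclusion, not the thousands of profiler lines behind it.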

5.2 Retrieving the Right Information

When stored, the information must be fetched at the right moment. Techniques include:

Memory retrieval : semantic search rather than simple keyword matching to locate relevant past memories.

Tool filtering : presenting only the tools relevant to the current task, which reportedly can roughly triple tool-selection accuracy.

Knowledge recall (RAG) : combining vector search, keyword search, AST parsing, and knowledge graphs, then re‑ranking results.
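Tool filtering from the list above can be sketched as ranking tool descriptions against the task and exposing only the top-k. Word-overlap scoring stands in here for the semantic matching a real system would use; the tool names and descriptions are hypothetical.

```python
def filter_tools(task: str, tools: dict[str, str], k: int = 2) -> list[str]:
    """Return the names of the k tools whose descriptions best match the task."""
    task_words = set(task.lower().split())

    def score(desc: str) -> int:
        # Naive relevance: count shared words between task and description.
        return len(task_words & set(desc.lower().split()))

    ranked = sorted(tools, key=lambda name: score(tools[name]), reverse=True)
    return ranked[:k]

tools = {
    "send_email": "send an email message to a recipient",
    "search_code": "search the repository code for a symbol",
    "run_tests": "run the test suite and report failures",
    "book_meeting": "schedule a meeting on the calendar",
}
print(filter_tools("search the code for the failing symbol", tools))
```

Only the surviving tools' descriptions are placed in the model's context, so the Agent never has the chance to call `book_meeting` while debugging.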

5.3 Compression and Refinement

When context becomes too large, it must be compressed:

Trajectory summarization : Claude Code automatically summarizes the dialogue once token usage exceeds 95% of the context window.

Targeted compression : summarizing large search results immediately or compressing information passed between Agents.

Intelligent pruning : removing information based on time, frequency, or relevance.
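A compaction trigger in the spirit of the strategies above might look like this. The 95% threshold echoes the Claude Code behavior described earlier; the token estimate and the summarizer (a stub standing in for an LLM call) are assumptions.

```python
def estimate_tokens(text: str) -> int:
    return len(text) // 4  # rough heuristic: ~4 characters per token

def summarize(turns: list[str]) -> str:
    return f"[summary of {len(turns)} earlier turns]"  # stub for an LLM call

def maybe_compact(turns: list[str], window: int, threshold: float = 0.95):
    """Replace older turns with a summary once usage crosses the threshold."""
    used = sum(estimate_tokens(t) for t in turns)
    if used < threshold * window:
        return turns
    keep = turns[-2:]  # keep the most recent turns verbatim
    return [summarize(turns[:-2])] + keep

history = [f"turn {i}: " + "x" * 400 for i in range(10)]
compacted = maybe_compact(history, window=500)
print(len(compacted), compacted[0])  # → 3 [summary of 8 earlier turns]
```

Keeping the last few turns verbatim matters: the model's next step usually depends on the freshest details, which a summary would blur.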

5.4 Divide and Conquer

Separate different types of information to avoid interference:

Multi‑Agent architecture : specialized Agents handle sub‑tasks, each with its own context space.

Environment isolation : execute generated code in a sandbox and return only necessary results (e.g., HuggingFace approach).

State separation : design distinct fields for different information types and expose them to the model only when needed.
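State separation can be sketched with a structured state object whose fields are rendered into the model's context selectively. The field names (`plan`, `raw_tool_output`, `sandbox_result`) are illustrative assumptions, not a standard schema.

```python
from dataclasses import dataclass, field

@dataclass
class AgentState:
    plan: list = field(default_factory=list)
    raw_tool_output: str = ""            # bulky; rarely shown verbatim
    user_preferences: dict = field(default_factory=dict)
    sandbox_result: str = ""             # distilled result from isolation

    def render(self, fields: list) -> str:
        """Expose only the requested fields to the model."""
        return "\n".join(f"{name}: {getattr(self, name)}" for name in fields)

state = AgentState(
    plan=["profile query", "add index"],
    raw_tool_output="...10k lines of EXPLAIN output...",
    sandbox_result="query time: 40ms",
)
# A planning step sees the plan and the distilled result, not the raw dump.
print(state.render(["plan", "sandbox_result"]))
```

This is the same principle as sandbox isolation at a smaller scale: bulky raw output stays in a field the model never sees unless a step explicitly asks for it.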

6. Summary

Context engineering is becoming a core skill for AI engineers. As Agent capabilities grow, managing and optimizing context will be increasingly important. Future directions include adaptive context that learns what information matters, distributed context across multiple Agents, personalized context based on user traits, and real‑time optimization that adjusts context during execution.

In short, the focus has shifted from merely "prompting" a model to building a complete information system that enables an Agent to truly understand tasks, acquire necessary knowledge, and make correct decisions.

As Cognition puts it, "Context engineering is the top priority work for engineers building AI Agents."

Written by

Architecture and Beyond

Focused on AIGC SaaS technical architecture and tech team management, sharing insights on architecture, development efficiency, team leadership, startup technology choices, large‑scale website design, and high‑performance, highly‑available, scalable solutions.
