How Agents Work: Inside Their Perception, Planning, Action, and Memory

This article breaks down an AI agent's workflow—perception, planning, action, and memory—using a product‑launch example, explains reasoning methods like Chain‑of‑Thought and ReAct, details tool integration, memory types, common failure modes, and why planning and tool ecosystems are essential.

AI Illustrated Series

In the previous article we introduced what an Agent is. Many readers understood the concept but still found it abstract—terms like “planning”, “perception”, and “memory” sound textbook‑like. This article uses concrete examples to demystify the Agent’s work process.

First, a complete task

Scenario: You ask an Agent to help prepare a product launch event.

You: "Help me prepare next week's product launch, need to finalize venue, guests, agenda, and send a complete plan to my email."

The Agent completes this through four stages.

Perception: Understands the request and decomposes it into sub‑tasks such as "launch plan", "venue", "guests", "agenda", and "email".

Planning: Creates an execution sequence: confirm basic info, search candidate venues, contact guests, draft agenda, then compile the plan and email it.

Action: Calls various tools to search venues, send invitation emails, generate documents, and dispatch the email.

Memory: Records progress and results so that similar future tasks can be handled faster.
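The four stages above can be sketched as a tiny loop. This is a minimal illustration, not a real agent framework: the sub-task decomposition and the tool calls are hardcoded stand-ins.

```python
# Minimal sketch of the perceive -> plan -> act -> remember loop.
# All names and behaviors are illustrative.

def perceive(request: str) -> list[str]:
    # A real agent would use an LLM to decompose the request;
    # here the sub-tasks are hardcoded for illustration.
    return ["venue", "guests", "agenda", "email"]

def plan(subtasks: list[str]) -> list[str]:
    # Order sub-tasks into a sensible execution sequence.
    order = ["venue", "guests", "agenda", "email"]
    return [t for t in order if t in subtasks]

def act(step: str) -> str:
    # Stand-in for real tool calls (search, email, document generation).
    return f"{step}: done"

def run_agent(request: str) -> dict[str, str]:
    memory: dict[str, str] = {}          # records progress and results
    for step in plan(perceive(request)):
        memory[step] = act(step)         # store each tool result
    return memory
```

The point of the sketch is the shape of the loop: perception feeds planning, planning drives actions, and every action's result is written back to memory.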

Layer One – Perception: What the Agent "sees"

The first step is perceiving the input, which is more than just parsing text.

Understanding the user’s true intent: From "help me prepare a launch" the Agent must infer the product type, scale, special requirements, and budget. If information is missing, it proactively asks clarifying questions.

Perceiving contextual state: The Agent also observes the current project status—what information has already been gathered and which steps remain.

Perceiving tool returns: After invoking a tool, the Agent treats the tool’s output as new perception. For example, a venue search may return ten options with prices, and an invitation send may return success or failure status, which the Agent must interpret for the next step.

Layer Two – Planning: How the Agent thinks

Planning is the Agent’s core capability. While many view AI as simple input‑output, a truly effective Agent needs strong planning.

Why planning matters: Without a plan, the Agent might search for a venue, book it immediately, then start confirming guests, only to discover the venue is unsuitable, leading to chaos and conflicts. With a plan, the Agent follows a clear sequence with checkpoints, avoiding such pitfalls.
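The "checkpoint" idea can be made concrete with a toy example: verify a venue's suitability before booking, instead of booking first and discovering conflicts later. The venue and requirement fields below are made up for this sketch.

```python
# Toy plan checkpoint: validate before committing to an action.

def book_venue(venue: dict, requirements: dict) -> tuple[str, str]:
    # Checkpoint 1: capacity must cover the expected attendees.
    if venue["capacity"] < requirements["attendees"]:
        return ("rejected", "capacity too small")
    # Checkpoint 2: the venue must be free on the launch date.
    if venue["available_date"] != requirements["date"]:
        return ("rejected", "date unavailable")
    # Only commit once all checks pass.
    return ("booked", venue["name"])
```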

Method One: Chain‑of‑Thought (CoT)

The core idea is to make the Agent verbalize its reasoning.

Without CoT, asking "2x+5=15, solve for x" yields the answer "x=5" directly.

With CoT, the Agent says: "First, move the constant to the other side: 2x = 15‑5 = 10. Second, divide both sides by 2: x = 10 ÷ 2 = 5. Answer: x = 5." This makes the reasoning transparent and reduces guesswork.
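The same step-by-step trace can be mirrored in code: a function that solves a linear equation while recording each intermediate step, just as a CoT trace would. This is an illustration of the trace idea, not how a model internally reasons.

```python
def solve_linear(a: float, b: float, c: float) -> tuple[float, list[str]]:
    """Solve a*x + b = c, recording each step like a CoT trace."""
    steps = []
    rhs = c - b
    steps.append(f"Move the constant: {a}x = {c} - {b} = {rhs}")
    x = rhs / a
    steps.append(f"Divide both sides by {a}: x = {rhs} / {a} = {x}")
    return x, steps
```

For `solve_linear(2, 5, 15)` the trace reproduces the two steps in the example above and yields x = 5.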

Method Two: ReAct (Reasoning + Action)

ReAct extends CoT by interleaving thinking and acting: think → act → observe → think again, and repeat until the task is finished.

Task example: Gather the latest Tesla news and write a briefing.

Round 1: Think – identify needed information; Act – call the search tool with keyword "Tesla latest"; Observe – receive ten news items covering earnings, new models, and personnel changes.

Round 2: Think – earnings are most important; Act – search "Tesla 2026 earnings"; Observe – retrieve revenue growth of 20% and net profit exceeding expectations.

Round 3: Think – need market reaction and analyst views; Act – search "Tesla stock analyst"; Observe – obtain market analysis and ratings.

Round 4: Think – enough information to write the report; Act – invoke document‑generation tool; Observe – report generated successfully.

This illustrates that each step is driven by the observation from the previous tool call rather than by blind guessing.
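The four rounds above can be compressed into a small think-act-observe loop. The search tool here returns canned observations, and the "thinking" policy is a pair of string checks standing in for an LLM's judgment; both are assumptions made for the sketch.

```python
# Minimal ReAct-style loop: think -> act -> observe, repeated until done.

def search(query: str) -> str:
    # Canned observations standing in for a real search tool.
    canned = {
        "Tesla latest": "10 items: earnings, new models, personnel changes",
        "Tesla 2026 earnings": "revenue +20%, net profit beat expectations",
        "Tesla stock analyst": "analyst ratings and market view",
    }
    return canned.get(query, "no results")

def react_briefing(max_rounds: int = 4) -> list[tuple[str, str]]:
    trace = []
    query = "Tesla latest"              # Round 1: start broad
    for _ in range(max_rounds - 1):
        obs = search(query)             # Act + Observe
        trace.append((query, obs))
        # Think: choose the next query based on the last observation.
        if "earnings" in obs and "revenue" not in obs:
            query = "Tesla 2026 earnings"
        elif "revenue" in obs:
            query = "Tesla stock analyst"
        else:
            break                       # enough information gathered
    trace.append(("write_report", "report generated"))
    return trace
```

Note that each new query is chosen by inspecting the previous observation, which is exactly the point of ReAct: actions are grounded in observed results, not guessed up front.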

Layer Three – Action: How the Agent executes

After planning, the Agent executes by invoking tools. Common tool types include search engines for real‑time information, knowledge‑base queries for internal documents, code execution for calculations and charts, file operations for reading/writing documents, and communication tools for sending emails or messages.
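One common way to wire these tools up is a registry that maps a tool name to a callable, with the return value treated as the agent's next observation. The tool names and behaviors below are illustrative stand-ins, not a real framework API.

```python
# Sketch of tool dispatch via a registry.

TOOLS = {
    "search": lambda query: f"results for '{query}'",
    "send_email": lambda to, subject: f"email '{subject}' sent to {to}",
    "generate_doc": lambda title: f"document '{title}' created",
}

def call_tool(name: str, *args) -> str:
    # The tool's return value becomes the agent's next observation.
    if name not in TOOLS:
        raise ValueError(f"unknown tool: {name}")
    return TOOLS[name](*args)
```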

Key distinction: Large‑model knowledge is static (trained once), whereas an Agent obtains real‑time information via tools. For example, a model might say "Tesla stock rose 50% last year"—potentially outdated—while the Agent can fetch the latest price and include it in the report.

Layer Four – Memory: What the Agent remembers

Memory differentiates Agents from ordinary chatbots.

1. Sensory memory: The raw input of the current turn, like an image or file, is kept briefly (seconds) and then released.

2. Working memory: The information the Agent is currently processing—the task goal, completed steps, next actions, and recent tool results. Capacity is limited; overload can cause omissions.

3. Long‑term memory: Knowledge accumulated across sessions—user name, role, preferences, project background, useful tools, and past pitfalls. This enables the Agent to "understand you" without re‑explaining context each time.
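The three layers can be sketched as a small class. The working-memory capacity and the eviction policy (drop the oldest item when full) are illustrative assumptions; real systems use context-window management and external stores.

```python
from collections import deque

class AgentMemory:
    """Toy model of working memory (bounded) and long-term memory (persistent)."""

    def __init__(self, working_capacity: int = 5):
        # Working memory: limited capacity; oldest items are evicted
        # when full, which is how "omissions" can happen.
        self.working = deque(maxlen=working_capacity)
        # Long-term memory: persists across sessions (here, just a dict).
        self.long_term: dict[str, str] = {}

    def observe(self, item: str) -> None:
        self.working.append(item)        # recent tool results, current steps

    def remember(self, key: str, value: str) -> None:
        self.long_term[key] = value      # preferences, project background
```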

Why Agents can "go wrong"

Agents sometimes produce unexpected results for several reasons.

Planning errors: Poor task decomposition or wrong priority leads to a misguided execution path.

Tool‑call failures: API timeouts, insufficient permissions, or network issues cause empty or erroneous tool outputs.

Incorrect observation: Overly verbose or insufficient tool results can be misinterpreted, steering the Agent off course.

Memory confusion: Limited working‑memory capacity may cause the Agent to forget earlier steps or goals.
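Tool-call failures in particular can be handled defensively: retry transient errors and validate the observation before feeding it back into planning. The error types and retry policy below are illustrative, not prescriptive.

```python
# Sketch of defensive tool calling: retry transient failures and
# reject empty output before it pollutes the agent's observations.

def call_with_retry(tool, *args, retries: int = 2) -> str:
    last_err = None
    for _ in range(retries + 1):
        try:
            result = tool(*args)
            if not result:                          # empty output counts as failure
                raise ValueError("empty tool output")
            return result
        except (TimeoutError, ValueError) as err:   # treat as transient, retry
            last_err = err
    raise RuntimeError(f"tool failed after retries: {last_err}")
```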

My viewpoint

Agent operation essentially mirrors human thinking patterns: understand the task, decompose goals, execute step by step, check results, and adjust strategy. The Agent simply automates this loop.

However, an Agent's "intelligence" hinges on its planning ability and tool ecosystem. Weak planning makes the Agent act blindly; insufficient tools make it impossible to accomplish tasks, akin to "a good cook without ingredients".

Therefore, building an Agent product requires more than a large model—tool integration, planning algorithms, and a memory system are all indispensable.

Next time

This article covered the Agent’s workflow. In the next installment we will detail how Agents "summon" various tools— the principles of Function Calling—explaining how a large model moves from merely speaking to actually doing.

Tags: ReAct, Tool Integration, AI Agent, Memory, Perception, Planning
Written by AI Illustrated Series: illustrated hardcore tech (AI, agents, algorithms, databases), one picture worth a thousand words.
