Understanding AI Agents: Core Modules, Planning Strategies, and Evaluation
This article explains what an AI agent is and outlines its four core modules: perception, memory, planning, and action. It describes the role of large language models, compares three generations of software development, discusses memory implementations, covers planning methods such as ReAct and Plan‑and‑Solve, and closes with evaluation, cost analysis, and the differences between agents and workflows.
What is an Agent
An agent is a system that can perceive its environment, make decisions, and take actions to achieve a goal.
Imagine an agent as a person: when you see a car approaching, you instinctively avoid it. The eyes capture the scene, the brain recalls that not moving could cause a collision, and you step aside. This perception‑decision‑action loop illustrates the core of an agent.
Agent Core Modules
Although different teams draw the boundaries differently, an agent is commonly divided into four essential modules: perception, memory, planning, and action.
LLM (Large Language Model)
Large language models serve as the brain of an agent. They are built on neural networks, whose design loosely mimics how neurons in the human brain connect and transmit signals.
Software development has evolved through three generations:
Software 1.0 : Traditional hand‑written code by programmers.
Software 2.0 : Deep learning‑driven models where data trains neural network weights, reducing manual coding.
Software 3.0 : AI‑assisted development tools (e.g., Cursor, Trae, WinSelf) that let developers program with natural language, dramatically lowering the barrier to software creation.
Recommended reading: *The Deep Learning Revolution* and *A Gentle Introduction to Neural Networks and Deep Learning*.
Perception
Perception is how an agent senses its environment. Humans use eyes and ears; robots use radar and cameras. In software, perception comes from input data such as text, images, video, audio, or files.
Memory
Large models lack intrinsic memory; external components such as Retrieval‑Augmented Generation (RAG) pipelines or search engines like Elasticsearch simulate memory to improve task performance and interaction continuity.
Memory is divided into short‑term and long‑term types. Short‑term memory depends on the token limit of the model context, while long‑term memory relies on external storage (files, databases) and can include user profiles, situational memory, or factual knowledge.
Short‑term memory depends on token limits
Reference code for managing messages passed to the LLM:
```typescript
interface ChatMessage {
  role: 'user' | 'assistant' | 'system' | 'tool';
  content: string;
  tool_calls?: unknown;  // tool-call payload; the exact shape depends on the model API
  call_id?: string;
}

class Memory {
  private messages: ChatMessage[] = [];

  addMessage(message: ChatMessage) {
    this.messages.push(message);
  }

  // The returned history must fit within the model's context window; in
  // practice older messages are trimmed or summarized before each call.
  getMessages() {
    return this.messages;
  }
}
```

Long‑term memory relies on external components
Long‑term memory can be stored in files or databases, searchable and updatable. It can be further categorized into user‑profile memory, situational memory, and factual memory.
In a cake‑baking assistant case, real‑time dialogue is kept as working memory; once a threshold is reached, the LLM extracts key information to form short‑term memory. Long‑term memory stores user profiles and business insights, which are recalled and updated on each model call.
The system is organized into three layers: working memory, short‑term memory, and long‑term memory.
On each query, relevant short‑term memories are recalled via vector search and their recall counts are updated.
Short‑term memories are in turn distilled by the LLM into longer‑lived entries and assigned importance scores.
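The recall mechanics described above (vector search, recall counts, importance scores) can be sketched as follows. The entry shape and the importance‑weighted scoring are illustrative assumptions, and a toy cosine similarity stands in for a real vector store:

```typescript
// Hypothetical memory entry; field names are illustrative, not from any library.
interface MemoryEntry {
  text: string;
  embedding: number[]; // produced by an embedding model (assumed to exist upstream)
  importance: number;  // 0..1 score assigned by the LLM (assumed)
  recalls: number;     // how many times this entry has been recalled
}

// Cosine similarity between two vectors of equal length.
function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb) || 1);
}

// Recall the top-k entries, weighting similarity by importance, and bump
// each recalled entry's counter, as the layered-memory design describes.
function recall(query: number[], store: MemoryEntry[], k = 2): MemoryEntry[] {
  const top = [...store]
    .sort((x, y) =>
      cosine(query, y.embedding) * y.importance -
      cosine(query, x.embedding) * x.importance)
    .slice(0, k);
  top.forEach(e => e.recalls++);
  return top;
}
```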
Action
Action is the execution module: the LLM (the brain) needs tools (the hands) to act. An agent must describe each tool (name, description, parameters) so the LLM can decide when and how to call it.
Example: a product‑query tool that finds items in a merchant’s catalog based on a question or image.
Keyword search: extract key information from product images and titles using a multimodal model, then perform similarity search.
Image search: vectorize product images, extract the desired image link from the user query, and match it against the catalog.
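As a hedged illustration of the keyword path, the sketch below ranks products by simple keyword overlap; a production system would use embedding similarity and a multimodal keyword extractor, both assumed away here:

```typescript
// Illustrative product shape; the extracted keywords are assumed to already
// exist on each catalog item (produced offline by a multimodal model).
interface Product { title: string; keywords: string[]; imageUrl: string; }

// Jaccard overlap between query keywords and a product's keywords -- a
// stand-in for the embedding-based similarity search a real system would run.
function keywordScore(query: string[], product: Product): number {
  const q = new Set(query);
  const p = new Set(product.keywords);
  let inter = 0;
  for (const k of q) if (p.has(k)) inter++;
  const union = q.size + p.size - inter;
  return union === 0 ? 0 : inter / union;
}

// Return matching products, best score first; zero-score items are dropped.
function searchProducts(query: string[], catalog: Product[]): Product[] {
  return catalog
    .map(p => ({ p, s: keywordScore(query, p) }))
    .filter(x => x.s > 0)
    .sort((a, b) => b.s - a.s)
    .map(x => x.p);
}
```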
```typescript
const tools = [
  {
    "type": "function",
    "function": {
      "name": "query_goods",
      "description": "Query cake product information",
      "parameters": {
        "type": "object",
        "properties": {
          "keywords": {
            "type": "array",
            "items": { "type": "string", "description": "A key attribute of the cake product" },
            "description": "Key attributes of the cake product, such as occasion, target audience, style, flavor, theme, and color"
          },
          "image_url": { "type": "string", "description": "URL of the cake product image" }
        }
      }
    }
  },
  ...
];
```

Planning
Two common planning patterns are ReAct and Plan‑and‑Solve, with multi‑agent approaches also possible.
ReAct (Reason + Action + Feedback)
Agents iteratively reason, act, and receive feedback to achieve the user’s goal.
Cake‑baking assistant flow diagram (figure not reproduced here).
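A minimal sketch of this reason‑act‑feedback loop; `callLLM` and `runTool` are stubs standing in for a real model API and tool executor:

```typescript
// One step of the loop: the model either decides to call a tool or finishes.
type Step =
  | { kind: "act"; tool: string; input: string } // model wants a tool call
  | { kind: "finish"; answer: string };          // model has enough to answer

function reactLoop(
  goal: string,
  callLLM: (goal: string, observations: string[]) => Step,
  runTool: (tool: string, input: string) => string,
  maxSteps = 5,
): string {
  const observations: string[] = [];
  for (let i = 0; i < maxSteps; i++) {
    const step = callLLM(goal, observations);       // Reason
    if (step.kind === "finish") return step.answer; // goal reached
    const result = runTool(step.tool, step.input);  // Act
    observations.push(result);                      // Feedback for next round
  }
  return "max steps reached";
}
```

The `maxSteps` cap is a common safeguard so a confused model cannot loop forever.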
Plan‑and‑Solve
Tasks are decomposed into sub‑goals first, then executed in parallel. Unlike ReAct, which proceeds step‑by‑step, Plan‑and‑Solve separates planning from execution.
Example: a travel‑planning agent first generates a plan (weather, accommodation, etc.) and then solves each sub‑task, optionally reflecting on the plan.
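The plan‑then‑execute split can be sketched as follows; the planner and solver are stubs standing in for LLM calls:

```typescript
// Plan-and-Solve: one planning call decomposes the goal into sub-tasks,
// then all sub-tasks are solved in parallel -- unlike ReAct's step-by-step loop.
async function planAndSolve(
  goal: string,
  plan: (goal: string) => string[],          // planning phase (one LLM call)
  solve: (task: string) => Promise<string>,  // per-sub-task execution
): Promise<string[]> {
  const subTasks = plan(goal);
  return Promise.all(subTasks.map(solve));   // execution phase, in parallel
}
```

An optional reflection step would inspect the results and revise the plan before answering.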
Agent Evaluation Report
Evaluation is essential and should record response time, completeness, tool usage, and optimization opportunities to provide data‑driven improvements.
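A per‑run record along these lines could feed such a report; the field names are illustrative, not a fixed schema:

```typescript
// Hypothetical evaluation record covering the metrics the report tracks.
interface EvalRecord {
  query: string;
  responseMs: number;   // response time
  toolCalls: string[];  // which tools were used
  complete: boolean;    // did the answer fully address the query
  notes: string;        // observed optimization opportunities
}

// Aggregate a batch of records into report-level numbers.
function summarize(records: EvalRecord[]) {
  const n = records.length || 1;
  return {
    avgResponseMs: records.reduce((s, r) => s + r.responseMs, 0) / n,
    completionRate: records.filter(r => r.complete).length / n,
    avgToolCalls: records.reduce((s, r) => s + r.toolCalls.length, 0) / n,
  };
}
```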
Agent Cost Analysis
AI‑driven applications incur costs per model call, unlike traditional software. Analyzing these costs helps assess value and guide optimization.
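Model calls are typically priced per token, so cost can be tallied call by call. A minimal sketch, with placeholder prices that are not any provider's real rates:

```typescript
// Hypothetical per-1K-token prices (placeholders, not real provider rates).
const PRICE_PER_1K_INPUT = 0.001;
const PRICE_PER_1K_OUTPUT = 0.002;

// Cost of a single model call from its input and output token counts.
function callCost(inputTokens: number, outputTokens: number): number {
  return (inputTokens / 1000) * PRICE_PER_1K_INPUT +
         (outputTokens / 1000) * PRICE_PER_1K_OUTPUT;
}

// A conversation's cost is the sum over every model call it made --
// which is why multi-step agents cost more than single-shot completions.
function conversationCost(calls: Array<[number, number]>): number {
  return calls.reduce((s, [inp, out]) => s + callCost(inp, out), 0);
}
```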
Agent vs Workflow
Agents require only a goal description; workflows require a predefined execution path.
How to Choose
Workflow : suitable for data pipelines, document‑approval processes, and other fixed, repeatable tasks that need strict order and auditability.
Agent : ideal for intelligent assistants, personal AI, complex problem diagnosis, dynamic decision‑making, personalized context understanding, creative problem solving, and conversational experiences.
FM Agent vs RL Agent
FM (Foundation Model) agents are built on large pretrained models such as GPT‑4, Claude, or Gemini, leveraging LLM capabilities for perception, reasoning, planning, and generation.
RL (Reinforcement Learning) agents learn optimal policies through interaction with an environment and reward signals, exemplified by DeepMind’s AlphaGo Zero.
Conclusion
Agents, LLMs, and workflows are tools; their true value emerges when they are applied to real business scenarios, solve concrete problems, and create measurable impact.
Youzan Coder
Official Youzan tech channel, delivering technical insights and occasional daily updates from the Youzan tech team.