How to Understand Agents: From Resource‑Constrained Decisions to Contextual Cognition
This survey clarifies the essence of AI agents as resource‑limited sequential decision‑making and contextual‑cognition systems, introduces a formal definition, outlines a five‑layer evolution of large models, presents a four‑loop architecture, and illustrates the concepts with the OpenClaw agent case study.
What is an Agent?
An Agent is defined as a system that operates in a dynamic environment, pursues goal completion, and is constrained by a limited inference budget. At each timestep the current context C_t is used to generate an action, and the resulting trajectory is evaluated by a utility function. Formally:
τ* = arg max_τ E[U(τ)], subject to cost(τ) ≤ B

where a_t ~ π_LLM(·|C_t) generates each action, C_t is the context at step t, and B is the inference budget.

The definition highlights two core tasks: continuously absorbing external feedback to update internal state, and repeatedly evaluating and selecting the next step (sequential decision‑making and search optimization).
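A minimal sketch of this definition as a budget‑constrained decision loop. Everything here (ToyEnv, run_agent, the step‑count budget) is illustrative, not from the survey:

```python
class ToyEnv:
    """Toy environment: the goal counts as reached once three
    observations have accumulated in the context."""
    def initial_context(self):
        return []
    def goal_reached(self, context):
        return len(context) >= 3
    def step(self, action):
        return f"obs-for-{action}"

def run_agent(policy, env, budget):
    """Budget-constrained sequential decision loop: at each step the
    policy maps the current context C_t to an action, the environment
    returns an observation, and the context is updated until the
    budget B (here, a step count) is exhausted or the goal is reached."""
    context = env.initial_context()           # C_0
    trajectory, cost = [], 0
    while cost < budget and not env.goal_reached(context):
        action = policy(context)              # a_t ~ pi_LLM(.|C_t)
        obs = env.step(action)                # external feedback
        trajectory.append((action, obs))
        context = context + [obs]             # C_{t+1} = update(C_t, O_t)
        cost += 1                             # spend inference budget
    return trajectory

# With budget 10 the toy goal (3 observations) is reached in 3 steps;
# with budget 2 the loop stops early, limited by B rather than the goal.
traj = run_agent(lambda c: f"act{len(c)}", ToyEnv(), budget=10)
```

The two core tasks from the definition map directly onto the loop body: the context update absorbs feedback, and the policy call is the repeated evaluation and selection of the next step.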
Agentic AI
Agentic AI extends the agent concept to a system that continuously organizes perception, interaction, reasoning, and execution toward a goal. The focus shifts from a single model output to a multi‑step, environment‑aware problem‑solving process.
Five‑Layer Evolution Theory of Large Models
The evolution of large‑model systems follows a clear path:
Chatbot → Reasoner → Agent → Innovator → Organizer

Chatbot: text prompting and dialogue history; strong language understanding/generation.
Reasoner: external knowledge, working memory, long‑chain reasoning; improved problem‑solving.
Agent: ability to act in environments, receive feedback, invoke tools, and adjust strategies across multiple steps.
Innovator: proactive strategy construction and deep self‑correction.
Organizer: cross‑task and cross‑environment orchestration.
The driving force is deeper contextual processing: expanding sources of context, shortening feedback loops, and more coherent state updates, moving the system toward true agents.
Contextual Cognition Perspective
Agent capability is modeled as a dynamic integration of internal state I_t and external observation O_t:

C_t = I_t ⊕ O_t

The internal state holds working memory, sub‑goals, process logs, and reflection signals. External observation includes user commands, tool feedback, environmental changes, execution logs, and UI cues. Continuous alignment of these two components yields a usable context.
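The C_t = I_t ⊕ O_t integration can be sketched as a merge of two records into one flat context. The field names below are illustrative assumptions, not the survey's definitions:

```python
from dataclasses import dataclass, field

@dataclass
class InternalState:
    """I_t: what the agent carries forward between steps."""
    working_memory: list = field(default_factory=list)
    subgoals: list = field(default_factory=list)

@dataclass
class Observation:
    """O_t: what arrives from outside at this step."""
    user_command: str = ""
    tool_feedback: str = ""

def build_context(state: InternalState, obs: Observation) -> list:
    """C_t = I_t (+) O_t: merge internal state and the fresh
    observation into one flat context the policy conditions on."""
    context = list(state.working_memory)
    context += [f"subgoal: {g}" for g in state.subgoals]
    if obs.user_command:
        context.append(f"user: {obs.user_command}")
    if obs.tool_feedback:
        context.append(f"tool: {obs.tool_feedback}")
    return context

state = InternalState(working_memory=["step 1 done"], subgoals=["fetch data"])
obs = Observation(user_command="summarize results")
ctx = build_context(state, obs)
```

The point of the ⊕ operation is exactly this alignment: internal memory and external signals end up in one representation so that nothing the agent knows or has just observed is invisible to the next decision.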
Four‑Loop Closed‑Loop Framework
A unified framework consists of four stages that form a continuous loop:
Contextual Encoding : organize text, structured data, vector memories, and event records into a usable context representation.
Contextual Perception : detect key state changes in the environment and translate raw signals into semantic content.
Contextual Interaction : enable communication with users, tools, environments, or other agents, feeding feedback back into the system.
Contextual Reasoning : evaluate tasks, plan paths, select actions, and perform corrections based on the accumulated context.
When these stages form a seamless loop, the agent achieves stable task progression. Successful tool calls illustrate observation‑update‑decision cycles, while smooth long‑chain execution relies on encoding, interaction, and reasoning supporting each other on the same trajectory.
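The four stages above can be sketched as a single closed loop. This is a hypothetical skeleton under the survey's stage names; the function bodies are stubs of my own:

```python
def encode(raw_items):
    """Contextual Encoding: organize heterogeneous records into one
    usable context representation (here, tagged strings)."""
    return [f"{kind}: {content}" for kind, content in raw_items]

def perceive(signals):
    """Contextual Perception: keep only signals that mark a real
    state change, discarding raw noise."""
    return [s for s in signals if s.get("changed")]

def interact(action):
    """Contextual Interaction: execute the action against a tool,
    user, or environment and return the feedback (stubbed here)."""
    return {"changed": True, "result": f"ok({action})"}

def reason(context):
    """Contextual Reasoning: pick the next action from the context
    (stubbed as acting on the most recent item)."""
    return f"act-on[{context[-1]}]" if context else "explore"

def closed_loop(initial_items, steps):
    """Encoding -> Perception -> Interaction -> Reasoning, with each
    cycle's feedback re-encoded into the shared context."""
    context = encode(initial_items)
    for _ in range(steps):
        action = reason(context)
        feedback = interact(action)
        for event in perceive([feedback]):
            context += encode([("event", event["result"])])
    return context

ctx = closed_loop([("user", "find the bug")], steps=2)
```

Each cycle appends one perceived event to the context, which is the "seamless loop" property: the output of interaction becomes input to the next round of reasoning on the same trajectory.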
Representative Agent: OpenClaw
OpenClaw exemplifies the contextual cognition framework. It processes dynamic multi‑turn messages, user preferences, skill plugins, task context, and execution logs, continuously integrating them into an internal state. Rather than a single‑shot response, OpenClaw repeatedly performs contextual encoding, perception, interaction, and reasoning, updating its context with new feedback and adjusting subsequent actions. This demonstrates a practical personal‑assistant prototype that maintains a closed‑loop operation.
From Applications to Methods
Current agents are already deployed in deep research, coding, GUI, and scientific domains, handling complete workflows. The community’s focus is shifting to three challenges: building robust contexts, maintaining closed‑loop operation, and improving task success rates. The survey provides a systematic technical overview and a forward‑looking theoretical proposal for next‑generation intelligent systems that operate in open environments, solve long‑horizon tasks, and perform continuous inference.
Article Link
Preprint: https://www.preprints.org/manuscript/202604.0935