Designing Autonomous LLM Agents: Architecture, Memory, Planning, and Learning Strategies
This article surveys the design of autonomous large‑language‑model agents. It details a modular architecture comprising profiling, memory, planning, and action modules, and reviews common profiling methods, memory structures, planning techniques, and action strategies, along with learning approaches such as exemplar learning, environment feedback, and interactive human feedback.
LLM‑based autonomous agents aim to perform diverse tasks with human‑like abilities. Achieving this requires two key aspects: designing effective architectures and learning their parameters.
1. Agent Architecture Design
The proposed unified framework consists of four modules: a profiling module, a memory module, a planning module, and an action module.
The profiling module identifies the agent's role.
The memory and planning modules place the agent in a dynamic environment, enabling it to recall past behaviors and plan future actions.
The action module turns the agent's decisions into concrete outputs.
1.1 Profiling Module
The profiling module can be built via three strategies:
Manual creation (e.g., specifying personality traits).
LLM‑generated profiles (using few‑shot prompts to seed additional profiles).
Dataset‑aligned profiles (leveraging real‑world demographic data).
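The three strategies above can be sketched in code. This is a minimal illustration, not an implementation from the surveyed works: the `AgentProfile` fields, the `generate` callable standing in for an LLM call, and the naive profile parser are all assumptions made for the example.

```python
from dataclasses import dataclass

@dataclass
class AgentProfile:
    """Minimal agent profile; field names are illustrative."""
    name: str
    role: str
    traits: list

def parse_profile(line: str) -> AgentProfile:
    # Naive parser for lines like "- Name, role, traits: a, b" (illustrative only).
    body = line.lstrip("- ").strip()
    head, _, trait_str = body.partition("traits:")
    name, role = [s.strip() for s in head.rstrip(", ").split(",", 1)]
    return AgentProfile(name=name, role=role,
                        traits=[t.strip() for t in trait_str.split(",") if t.strip()])

def manual_profile() -> AgentProfile:
    # Strategy 1: handcrafted personality traits.
    return AgentProfile(name="Ada", role="research assistant",
                        traits=["curious", "precise"])

def llm_generated_profile(seed_profiles, generate) -> AgentProfile:
    # Strategy 2: few-shot prompt an LLM (here the stand-in callable `generate`)
    # with seed profiles and parse one new profile from its reply.
    prompt = "Here are example agent profiles:\n"
    for p in seed_profiles:
        prompt += f"- {p.name}, {p.role}, traits: {', '.join(p.traits)}\n"
    prompt += "Generate one more profile in the same format."
    return parse_profile(generate(prompt))

def dataset_aligned_profile(record: dict) -> AgentProfile:
    # Strategy 3: map a real-world demographic record onto profile fields.
    return AgentProfile(name=record["name"], role=record["occupation"],
                        traits=record.get("traits", []))
```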
1.2 Memory Module
Memory stores perceived information and supports future actions. It draws inspiration from human memory, featuring short‑term (context window) and long‑term (external vector store) components.
Two common structures:
Unified memory: a single store accessed for reading, writing, and reflection (e.g., Atlas, Augmented‑LLM, Voyager, ChatLog).
Hybrid memory: distinct short‑term and long‑term stores (e.g., works that separate recent experiences from consolidated knowledge).
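A hybrid memory can be sketched as a bounded short-term buffer (mirroring the context window) backed by an unbounded long-term store. The class below is a toy illustration under assumed names; a real system would consolidate and retrieve from long-term memory with embeddings rather than keyword matching.

```python
from collections import deque

class HybridMemory:
    """Sketch of a hybrid memory: a bounded short-term buffer plus an
    unbounded long-term store. All names are illustrative."""

    def __init__(self, short_term_size: int = 5):
        self.short_term = deque(maxlen=short_term_size)  # recent observations
        self.long_term = []                              # consolidated records

    def write(self, record: str):
        # New observations enter short-term memory; the oldest entry is
        # consolidated into long-term memory when the buffer is full.
        if len(self.short_term) == self.short_term.maxlen:
            self.long_term.append(self.short_term[0])
        self.short_term.append(record)

    def read(self, query: str) -> list:
        # Naive retrieval: substring match over both stores (embeddings
        # would replace this for the long-term store in practice).
        return [r for r in list(self.short_term) + self.long_term if query in r]
```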
Four typical storage formats are used:
Natural language
Embeddings
Databases
Structured lists
Memory operations include reading (scoring records by recency, relevance, and importance), writing (handling duplication and overflow), and reflection (self‑summarization, validation, correction, and empathy).
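The reading operation is often implemented as a weighted sum of the three scores mentioned above. The sketch below assumes a particular form for each term (exponential decay for recency, cosine similarity for relevance, a stored importance value); the weights and decay rate are arbitrary choices, not values from the survey.

```python
import math
import time

def retrieval_score(record, query_embedding, now,
                    w_recency=1.0, w_relevance=1.0, w_importance=1.0):
    """Score a memory record for reading as a weighted sum of recency,
    relevance, and importance (weights and decay rate are assumptions)."""
    # Recency: exponential decay with the record's age in hours.
    age_hours = (now - record["timestamp"]) / 3600.0
    recency = math.exp(-0.1 * age_hours)
    # Relevance: cosine similarity between record and query embeddings.
    dot = sum(a * b for a, b in zip(record["embedding"], query_embedding))
    norm = (math.sqrt(sum(a * a for a in record["embedding"]))
            * math.sqrt(sum(b * b for b in query_embedding)))
    relevance = dot / norm if norm else 0.0
    # Importance: a score assigned at write time (e.g., by asking the LLM).
    return (w_recency * recency + w_relevance * relevance
            + w_importance * record["importance"])

def read_memory(records, query_embedding, k=3, now=None):
    # Return the top-k records by retrieval score.
    now = now if now is not None else time.time()
    return sorted(records,
                  key=lambda r: retrieval_score(r, query_embedding, now),
                  reverse=True)[:k]
```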
1.3 Planning Module
Planning decomposes complex tasks into sub‑tasks. Approaches include:
No‑feedback planning: sub‑goal decomposition (Chain‑of‑Thought, Zero‑shot‑CoT), multi‑path thinking (CoT‑SC, ToT), and external planners (LLM+P, LLM‑DP, MRKL, CO‑LLM).
Feedback‑based planning: incorporating environment feedback (ReAct, Voyager, Ghost), human feedback (OpenAGI, interactive RL), and model feedback (Self‑Refine, Reflexion, RAP, REX, MAD).
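Feedback-based planning can be sketched as a loop in the ReAct style: the model interleaves a reasoning step ("thought") with an action, and the environment's observation is fed back into the next step. The `llm` and `env` callables and the "Action:"/"finish" conventions below are stand-ins for the example, not any specific system's API.

```python
def react_plan_loop(task, llm, env, max_steps=5):
    """Sketch of feedback-based planning in the ReAct style. `llm` maps the
    trajectory so far to a "Thought ...\nAction: ..." step; `env` executes
    an action and returns an observation. Both are illustrative stand-ins."""
    trajectory = f"Task: {task}\n"
    for _ in range(max_steps):
        step = llm(trajectory)                 # reasoning + proposed action
        trajectory += step + "\n"
        action = step.split("Action:")[-1].strip()
        if action.startswith("finish"):
            return action, trajectory          # the plan terminates itself
        observation = env(action)              # environment feedback closes the loop
        trajectory += f"Observation: {observation}\n"
    return None, trajectory                    # ran out of steps without finishing
```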
1.4 Action Module
Actions translate plans into outcomes and interact with the environment. Goals include task completion, dialogue interaction, and environment exploration. Strategies involve memory recall, multi‑turn interaction, feedback adjustment, and tool integration (APIs, knowledge bases, language models, vision models).
Action space is expanded by external tools (search engines, databases, APIs) and the agent’s own knowledge (language generation, memory‑driven decisions). Impacts cover environment changes, internal state updates, triggering new actions, and influencing human perception.
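The two halves of the action space described above can be illustrated with a small dispatcher: named actions route to external tools, anything else falls back to the agent's own language generation, and every action updates internal state. The tool registry and the memory format are assumptions made for this sketch.

```python
def act(action_name, argument, tools, memory):
    """Sketch of an action module: dispatch to an external tool if one is
    registered, otherwise fall back to the agent's own generation. The
    registry and memory format are illustrative assumptions."""
    if action_name in tools:
        result = tools[action_name](argument)      # external tool: API, DB, search
    else:
        result = f"[generated] {argument}"         # agent's own language generation
    memory.append((action_name, argument, result)) # internal state update
    return result

# A toy registry; real agents wire in search engines, databases, and APIs.
tools = {
    "calculate": lambda expr: str(eval(expr, {"__builtins__": {}})),
    "lookup": lambda key: {"capital_of_france": "Paris"}.get(key, "unknown"),
}
```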
2. Learning Strategies
Learning enhances agent capabilities. Major strategies are:
Exemplar learning: fine‑tuning on human‑annotated data (CoH, MIND2WEB) or LLM‑generated annotations (ToolFormer, ToolBench).
Environment feedback: agents explore and adapt via reinforcement signals (Voyager, LMA3, GITM, WebShop, embodiment simulators).
Interactive human feedback: iterative human‑in‑the‑loop refinement (e.g., collaborative chat‑based feedback loops).
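The environment-feedback strategy can be sketched, in the spirit of Voyager's skill library, as keeping programs that succeeded so they can be reused and dropping ones the environment rejects. The class and callables below are illustrative assumptions, not the actual Voyager implementation.

```python
class SkillLibrary:
    """Sketch of environment-feedback learning: retain programs that the
    environment confirms as successful. All names are illustrative."""

    def __init__(self):
        self.skills = {}  # task -> program that worked

    def attempt(self, task, propose, env):
        # `propose` drafts a program for the task (an LLM in practice);
        # `env` executes it and returns a success flag as feedback.
        program = self.skills.get(task) or propose(task)
        success = env(program)
        if success:
            self.skills[task] = program    # reinforce: remember what worked
        else:
            self.skills.pop(task, None)    # discard skills the environment rejects
        return success
```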
The article concludes with a table mapping prior works to the presented taxonomy and provides references.