A Comprehensive Survey of Self-Evolving Agents: From Model-Centric to Environment-Driven Co-Evolution
This survey systematically reviews self‑evolving agents, explains why autonomous agents are needed, proposes a unified taxonomy of three evolution paradigms, analyzes model‑centric, environment‑centric, and co‑evolution approaches, and outlines future challenges in designing adaptive environments.
1. Why Self‑Evolving Agents?
Traditional agent systems follow a two‑stage paradigm: Pre‑Training on large corpora to acquire general knowledge, then Post‑Training (SFT, RLHF, RLAIF) to learn specific agentic abilities. This pipeline has driven rapid progress in LLM agents, but it creates a bottleneck: as agents grow more complex, they depend increasingly on high‑quality human supervision, which is costly and hard to scale.
When agents evolve from passive recipients of human labels to active explorers that generate problems, seek feedback, and refine strategies, we must rethink what “self‑evolution” means.
The core motivation is to shift agents from passive, human‑supervised learning to autonomous problem construction, environment interaction, feedback generation, and continuous policy improvement.
2. Unified Taxonomy: Three Self‑Evolution Routes
The survey introduces a taxonomy that classifies self‑evolving agents based on where evolution occurs:
Model‑Centric Self‑Evolution: evolution happens inside the model.
Environment‑Centric Self‑Evolution: evolution is driven by interaction with external knowledge, tools, or experiences.
Model‑Environment Co‑Evolution: both model and environment evolve together, influencing each other.
This framework organizes the field by “where evolution occurs” rather than by task type.
3. Model‑Centric Self‑Evolution: Strengthening the Model Itself
Model‑centric methods assume the model already contains latent capabilities that can be unlocked.
3.1 Inference‑Based Evolution (Test‑time Computation)
Parallel Sampling: generate multiple reasoning paths and select the best via voting or ranking.
Sequential Self‑Correction: generate, reflect, and iteratively correct outputs.
Structured Reasoning: organize reasoning as trees or graphs.
These techniques trade additional test‑time compute for more reliable final outputs.
These improvements are temporary because model parameters remain unchanged after inference.
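The first two inference-time strategies above can be sketched in a few lines. This is a minimal illustration, not any specific method from the survey: the callables `sample`, `draft`, `critique`, and `revise` are hypothetical stand-ins for LLM calls, and the samples are drawn in a plain loop rather than truly in parallel.

```python
from collections import Counter
from typing import Callable, List, Optional


def majority_vote(answers: List[str]) -> str:
    """Return the most frequent answer; ties go to the answer seen first."""
    if not answers:
        raise ValueError("need at least one sampled answer")
    return Counter(answers).most_common(1)[0][0]


def parallel_sample(sample: Callable[[str], str], prompt: str, n: int) -> str:
    """Parallel sampling: draw n independent answers, then vote."""
    # `sample` is a hypothetical wrapper around one stochastic model call.
    return majority_vote([sample(prompt) for _ in range(n)])


def self_correct(
    draft: Callable[[str], str],
    critique: Callable[[str, str], Optional[str]],
    revise: Callable[[str, str, str], str],
    prompt: str,
    max_rounds: int = 3,
) -> str:
    """Sequential self-correction: generate, reflect, revise."""
    answer = draft(prompt)
    for _ in range(max_rounds):
        feedback = critique(prompt, answer)
        if feedback is None:  # the critic found nothing to fix
            break
        answer = revise(prompt, answer, feedback)
    return answer
```

Neither function touches model weights, which is why the gains disappear once the extra compute is no longer spent.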
3.2 Training‑Based Evolution (Parameter Updates)
Training‑based routes aim for lasting capability gains by generating data, filtering it, and updating the model via SFT or RL.
Synthesis‑Driven Offline Self‑Evolving: generate synthetic data offline and use it for training.
Exploration‑Driven Online Self‑Evolving: continuously explore, receive real‑time feedback, and update policies.
Offline synthesis is efficient but limited by the initial model’s ability; online exploration can discover new strategies but demands high‑quality feedback and stable training.
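The generate-then-filter half of the offline route can be sketched as a rejection-sampling loop. This is an assumption-laden toy: `generate` and `verify` are hypothetical callables (a model sampler and an automatic checker), and the actual SFT or RL update on the kept examples is left out.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class Example:
    prompt: str
    response: str


def synthesize_dataset(
    prompts: Iterable[str],
    generate: Callable[[str, int], str],
    verify: Callable[[str, str], bool],
    samples_per_prompt: int = 4,
) -> List[Example]:
    """Sample several responses per prompt and keep only verified ones."""
    kept: List[Example] = []
    for prompt in prompts:
        seen = set()
        for i in range(samples_per_prompt):
            response = generate(prompt, i)  # i distinguishes the samples
            if response in seen:  # drop exact duplicates
                continue
            seen.add(response)
            if verify(prompt, response):  # filter step
                kept.append(Example(prompt, response))
    return kept
```

Because every kept example was produced by the current model, the dataset cannot contain solutions the model never samples, which is one way to read the limitation noted above.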
Recent works such as R‑Zero, Absolute Zero, and Agent0 exemplify attempts to let models self‑play, receive environment feedback, and acquire new training signals.
4. Environment‑Centric Self‑Evolution: Leveraging External Sources
Environment‑centric methods emphasize that an agent’s evolution also depends on how it utilizes external knowledge, experience, tools, and multi‑agent structures.
Static Knowledge Evolution: agents actively query and retrieve information rather than passively receiving it.
Dynamic Experience Evolution: agents extract reusable “how‑to” knowledge from past trajectories, error recovery patterns, and workflow logs.
Modular Architecture Evolution: modules such as memory, tools, interfaces, protocols, and skill libraries evolve (e.g., memory that can decide to forget or merge).
Agentic Topology Evolution: the communication structure, role allocation, and team size of multi‑agent systems adapt automatically.
4.1 Static Knowledge Evolution
Beyond traditional RAG, Agentic RAG and Deep Research let agents identify knowledge gaps, generate queries, browse the web, collect evidence, and produce structured reports, making retrieval an active reasoning step.
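The control flow of such an active retrieval loop might look like the sketch below. It is only an illustration: `search` is a hypothetical function that returns a passage plus the follow-up topics it raises (the "knowledge gaps"), whereas a real system would have the model write queries and call a search or browsing tool.

```python
from typing import Callable, List, Optional, Tuple

# A hit is (passage, follow-up topics the passage leaves open).
Hit = Tuple[str, List[str]]


def agentic_retrieve(
    question: str,
    search: Callable[[str], Optional[Hit]],
    max_steps: int = 5,
) -> List[Tuple[str, str]]:
    """Follow knowledge gaps breadth-first until none remain or budget ends."""
    pending = [question]
    seen = {question}
    evidence: List[Tuple[str, str]] = []
    steps = 0
    while pending and steps < max_steps:
        query = pending.pop(0)
        steps += 1
        hit = search(query)
        if hit is None:  # nothing found for this query
            continue
        passage, follow_ups = hit
        evidence.append((query, passage))
        for topic in follow_ups:  # newly identified gaps become queries
            if topic not in seen:
                seen.add(topic)
                pending.append(topic)
    return evidence
```

The returned `(query, passage)` pairs are the raw material a report-writing step would then organize.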
4.2 Dynamic Experience Evolution
Agents must answer “how to do” questions by learning from successful tool‑call sequences, error‑recovery strategies, historical failures, and reusable workflows.
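One way to picture this is an experience library keyed by task type. The sketch below is hypothetical throughout: the class name, the choice to keep only the shortest successful tool-call sequence, and the free-text failure notes are illustrative assumptions, not a design taken from the survey.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Sequence, Tuple


@dataclass
class ExperienceLibrary:
    workflows: Dict[str, List[str]] = field(default_factory=dict)
    failures: Dict[str, List[str]] = field(default_factory=dict)

    def record(
        self,
        task_type: str,
        tool_calls: Sequence[str],
        success: bool,
        note: str = "",
    ) -> None:
        """Store a finished trajectory as reusable how-to knowledge."""
        if success:
            best = self.workflows.get(task_type)
            # Assumption: a shorter successful sequence is the better recipe.
            if best is None or len(tool_calls) < len(best):
                self.workflows[task_type] = list(tool_calls)
        else:
            self.failures.setdefault(task_type, []).append(note)

    def recall(self, task_type: str) -> Tuple[Optional[List[str]], List[str]]:
        """Return the best known workflow and past failure notes."""
        return self.workflows.get(task_type), self.failures.get(task_type, [])
```

Before attempting a new task, an agent could call `recall` and place the workflow and failure notes in its prompt.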
4.3 Modular Architecture Evolution
Memory modules can become selective databases that decide what to retain or discard; tool modules can be created or composed by the agent; interaction interfaces can be optimized for model comprehension.
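As a rough illustration of a memory that decides what to retain, merge, or discard, consider the toy store below. The importance scores, the merge-by-concatenation rule, and the eviction policy are all assumptions made for the sketch; real systems typically let a model make these decisions.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class MemoryItem:
    text: str
    importance: float
    hits: int = 0


class SelectiveMemory:
    def __init__(self, capacity: int):
        if capacity < 1:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self.items: Dict[str, MemoryItem] = {}

    def write(self, key: str, text: str, importance: float) -> None:
        existing = self.items.get(key)
        if existing is not None:
            # Merge: extend the entry instead of storing a second copy.
            if text not in existing.text:
                existing.text = f"{existing.text}; {text}"
            existing.importance = max(existing.importance, importance)
        else:
            self.items[key] = MemoryItem(text, importance)
        self._forget()

    def read(self, key: str) -> Optional[str]:
        item = self.items.get(key)
        if item is None:
            return None
        item.hits += 1  # usage counts toward retention
        return item.text

    def _forget(self) -> None:
        # Forget: evict the least important, least used entries first.
        while len(self.items) > self.capacity:
            victim = min(
                self.items,
                key=lambda k: (self.items[k].importance, self.items[k].hits),
            )
            del self.items[victim]
```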
4.4 Agentic Topology Evolution
Instead of fixed pipelines (planner → executor → critic), research explores automatically searching or adjusting multi‑agent communication graphs, role assignments, and collaboration topologies.
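A minimal sketch of what searching over communication graphs could mean is greedy hill climbing over directed edges between roles. Everything here is illustrative: `score` is a hypothetical evaluator (in practice it might combine task success with communication cost), and published methods use far richer search procedures.

```python
from itertools import permutations
from typing import Callable, FrozenSet, Sequence, Tuple

Edge = Tuple[str, str]  # (sender, receiver)
Topology = FrozenSet[Edge]


def search_topology(
    roles: Sequence[str],
    score: Callable[[Topology], float],
    max_iters: int = 20,
) -> Tuple[Topology, float]:
    """Greedily toggle one edge at a time while the score improves."""
    all_edges = sorted(permutations(roles, 2))
    current: Topology = frozenset()
    current_score = score(current)
    for _ in range(max_iters):
        best, best_score = current, current_score
        for edge in all_edges:
            neighbor = current ^ {edge}  # add the edge, or remove it if present
            neighbor_score = score(neighbor)
            if neighbor_score > best_score:
                best, best_score = neighbor, neighbor_score
        if best == current:  # local optimum reached
            break
        current, current_score = best, best_score
    return current, current_score
```

Starting from an empty graph, the search only adds a channel when the evaluator says it pays for itself, so a fixed planner, executor, critic pipeline becomes one candidate among many rather than a given.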
5. Model‑Environment Co‑Evolution: The Future Direction
Both model‑centric and environment‑centric routes have limitations: model‑centric approaches lack external validation and risk hallucination; environment‑centric methods often rely on static or single‑task environments.
The ideal is a dynamic environment that adapts its difficulty to the agent’s ability, providing targeted, verifiable feedback for open‑ended, long‑term exploration.
5.1 Multi‑Agent Policy Co‑Evolution
In multi‑agent settings, the environment can be composed of other agents that provide peer evaluation, collaborative reinforcement learning, and dynamic learning curricula.
5.2 Environment Training
An ideal trainable environment should:
Offer verifiable feedback.
Adjust difficulty automatically based on agent capability.
Generate diverse tasks.
Support long‑term, open‑ended exploration.
Projects such as Reasoning Gym, AgentGym, and Agent‑World are moving toward this vision.
The core challenge for future self‑evolving agents is not just training stronger models but designing environments that grow together with the agents.
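The four properties listed for an ideal trainable environment can be made concrete with a toy arithmetic environment. This is a sketch under stated assumptions, not how Reasoning Gym, AgentGym, or Agent-World are built: the class name, the sliding-window success rate, and the 0.8 and 0.2 thresholds are invented for illustration.

```python
import random
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    prompt: str
    answer: int
    difficulty: int


class AdaptiveArithmeticEnv:
    def __init__(self, seed: int = 0, window: int = 5, max_difficulty: int = 6):
        self.rng = random.Random(seed)
        self.difficulty = 1  # number of digits per operand
        self.max_difficulty = max_difficulty
        self.recent = deque(maxlen=window)  # rewards of the latest attempts

    def sample_task(self) -> Task:
        """Generate a fresh task at the current difficulty (diverse tasks)."""
        limit = 10 ** self.difficulty
        a = self.rng.randrange(limit)
        b = self.rng.randrange(limit)
        op = self.rng.choice("+-*")
        answer = {"+": a + b, "-": a - b, "*": a * b}[op]
        return Task(f"{a} {op} {b}", answer, self.difficulty)

    def step(self, task: Task, response: int) -> float:
        """Return a verifiable 0/1 reward and adapt difficulty."""
        reward = 1.0 if response == task.answer else 0.0
        self.recent.append(reward)
        if len(self.recent) == self.recent.maxlen:
            rate = sum(self.recent) / len(self.recent)
            if rate >= 0.8 and self.difficulty < self.max_difficulty:
                self.difficulty += 1  # agent is succeeding: make it harder
                self.recent.clear()
            elif rate <= 0.2 and self.difficulty > 1:
                self.difficulty -= 1  # agent is failing: make it easier
                self.recent.clear()
        return reward
```

Because tasks are generated on demand and checked exactly, the loop of `sample_task` and `step` can run indefinitely, which is the sense in which such an environment supports open-ended exploration.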
Survey: A Systematic Survey of Self-Evolving Agents: From Model-Centric to Environment-Driven Co-Evolution
https://www.techrxiv.org/doi/full/10.36227/techrxiv.177203250.05832634/v2
GitHub: https://github.com/XMUDeepLIT/Awesome-Self-Evolving-Agents
