Can LLM Agents Self‑Evolve Without Retraining? Inside Memento‑Skills

This article analyzes the Memento‑Skills framework, which treats external memory as executable skills so that frozen LLM agents can keep learning at deployment time. It covers the read‑write reflective loop, the skill‑as‑memory design, the behavior‑trained skill router, experimental validation on the GAIA and HLE benchmarks, and theoretical guarantees that hold without gradient updates.

PaperAgent

Self‑Evolving LLM Agents: Deployment‑time Learning

The core insight of Memento‑Skills is that when the model parameters \(\theta\) are fixed, all adaptation must come from the input: prompts, context, or an external memory. This mirrors how human experts accumulate reusable skill memories without rewiring their brains.
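This can be made concrete in a few lines. The sketch below is illustrative only: `call_llm` stands in for any frozen-model API, and the memory list is the single mutable component, so "learning" is just writing to it.

```python
# Minimal sketch of deployment-time adaptation with frozen parameters.
# `call_llm` is a placeholder for a frozen LLM whose weights never change;
# the external memory is the only thing that evolves between episodes.

def call_llm(prompt: str) -> str:
    # Stand-in for a frozen LLM; the output depends only on the input prompt.
    return f"answer conditioned on: {prompt[:40]}..."

def adapt_without_training(task: str, memory: list[str]) -> str:
    # All adaptation enters through the input: the prompt is the task
    # plus whatever the external memory currently contains.
    context = "\n".join(memory)
    return call_llm(f"{context}\n\nTask: {task}")

memory = ["skill: parse CSV files with pandas"]
first = adapt_without_training("summarize sales.csv", memory)
memory.append("skill: handle malformed CSV rows")  # learning = writing memory
second = adapt_without_training("summarize sales.csv", memory)
```

Because the context differs, the same task can yield different behavior on the second attempt even though no parameter was touched.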

Read‑Write Reflective Learning Loop

The system operates in a closed Read‑Write Reflective Learning cycle:

Read: Retrieve the most relevant skill from a skill library based on the current task state.

Act: The frozen LLM executes a multi‑step workflow using the retrieved skill.

Write: The system reflects on execution feedback and updates the skill library, optimizing existing skills or creating new ones.

This process is formalized as a Stateful Reflective Decision Process (SRDP), extending the standard MDP by adding an episodic memory that grows over time while preserving the Markov property.
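The cycle above can be sketched in a few lines. The word-overlap scorer and the stubbed execution step are toy stand-ins, not the paper's implementation:

```python
# Toy sketch of the Read-Act-Write cycle; the retrieval scorer and the
# stubbed execution are illustrative stand-ins for the paper's components.

def retrieve(library: dict, state: str) -> str:
    # Read: pick the skill whose name shares the most words with the task state.
    return max(library, key=lambda name: len(set(name.split()) & set(state.split())),
               default="")

def run_episode(library: dict, task: str) -> dict:
    skill = retrieve(library, task)                     # Read
    result = {"skill": skill, "success": bool(skill)}   # Act (execution stubbed out)
    if not result["success"]:                           # Write: reflect and update
        library[f"new skill for {task}"] = "generated from failure feedback"
    return result
```

Note that the library mutates across episodes while `retrieve` and the LLM stay fixed, which is exactly the SRDP picture: a growing episodic memory around a frozen policy.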

Skill‑as‑Memory: Executable Structured Knowledge

Each skill is stored as a structured folder containing:

SKILL.md: Declarative specification (purpose, usage conditions, parameter description).

Executable Code: The script that implements the skill.

Prompt Template: Guidance for the LLM on how to invoke the skill.
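The folder layout above can be written to disk directly. The file names follow the article; the file contents and the `run.py`/`prompt.txt` names are invented examples:

```python
# Sketch of the skill-as-memory folder layout. SKILL.md follows the article;
# run.py and prompt.txt are illustrative names for the other two components.
from pathlib import Path
import tempfile

def save_skill(root: Path, name: str, spec: str, code: str, prompt: str) -> Path:
    folder = root / name
    folder.mkdir(parents=True, exist_ok=True)
    (folder / "SKILL.md").write_text(spec)       # declarative specification
    (folder / "run.py").write_text(code)         # executable implementation
    (folder / "prompt.txt").write_text(prompt)   # invocation guidance for the LLM
    return folder

root = Path(tempfile.mkdtemp())
folder = save_skill(
    root, "csv_summary",
    spec="# CSV summary\nUse when the task mentions tabular files.",
    code="print('summary')",
    prompt="Call run.py with the file path as the first argument.",
)
```

Keeping the specification, code, and prompt side by side is what makes skills both executable and human-auditable.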

Key properties:

Skills are executable programs, not static text.

Automatic optimization via failure attribution rewrites faulty skills.

Skill discovery creates or refactors skills when utility scores fall below a threshold.
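The threshold rule for discovery can be sketched as follows; the utility scores and the 0.5 threshold are illustrative, not values from the paper:

```python
# Toy threshold rule for skill discovery: if no existing skill scores above
# the utility threshold for the current task, a new skill is created.

def route_or_discover(library: dict[str, float], task: str,
                      threshold: float = 0.5) -> str:
    # library maps skill name -> current utility score for this task.
    if not library or max(library.values()) < threshold:
        name = f"skill_for_{task}"
        library[name] = threshold  # seed the new skill's utility
        return name
    return max(library, key=library.get)
```
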

Behavior‑Trained Skill Router

Retrieval by semantic similarity alone does not guarantee execution utility. Memento‑Skills therefore trains a Behavior‑Trained Skill Router using offline single‑step reinforcement learning with an InfoNCE loss, optimizing retrieval for behavioral similarity rather than surface semantics.
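A minimal NumPy sketch of the InfoNCE objective named above. The embedding dimensions, the temperature, and the convention that row \(i\) of `skills` is the positive (behaviorally successful) match for query \(i\) are all illustrative assumptions:

```python
# InfoNCE: cross-entropy over similarity logits, where each query's positive
# is the skill that succeeded for it and the other skills in the batch act
# as in-batch negatives. Shapes and the temperature tau are illustrative.
import numpy as np

def info_nce(queries: np.ndarray, skills: np.ndarray, tau: float = 0.1) -> float:
    # queries, skills: (N, d); row i of skills is the positive for query i.
    q = queries / np.linalg.norm(queries, axis=1, keepdims=True)
    s = skills / np.linalg.norm(skills, axis=1, keepdims=True)
    logits = q @ s.T / tau                          # (N, N) cosine similarities
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_probs)))      # cross-entropy on the diagonal

rng = np.random.default_rng(0)
q = rng.normal(size=(4, 8))
loss_random = info_nce(q, rng.normal(size=(4, 8)))  # unrelated "positives"
loss_aligned = info_nce(q, q)                       # perfect positives: lower loss
```

Training the query and skill encoders to minimize this loss pulls queries toward skills that actually succeeded for them, which is the "behavioral similarity" objective.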

Experimental Validation

1. GAIA (General AI Assistant Benchmark)

Training success rate improved from 65.1% on the first attempt to 91.6% after three reflection rounds.

Test accuracy reached 66.0%, a 13.7‑point gain over the Read‑Write baseline (52.3%).

2. Humanity's Last Exam (HLE)

Training accuracy rose from 30.8% (round 0) to 54.5% (round 3).

Test accuracy achieved 38.7%, a 116.2% relative improvement over the baseline (17.9%).
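The reported deltas check out arithmetically (all values are those quoted above):

```python
# Quick arithmetic check of the reported gains (numbers taken from the article).
gaia_gain = 66.0 - 52.3                # absolute points over the 52.3% baseline
hle_gain = (38.7 - 17.9) / 17.9 * 100  # relative improvement over the HLE baseline
```
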

Analysis shows that cross‑task transfer is strongest in structured domains where the skill library aligns with the subject matter.

3. Skill Library Evolution Visualization

t‑SNE projections reveal that after GAIA training the library expands from 5 atomic skills to 41 clustered skills, while after HLE it grows to 235 skills covering diverse semantic neighborhoods (search/web, quantum physics, math, chemistry, etc.).

Theoretical Perspective: Three Independent Optimization Knobs

Stronger LLM: Reduces local approximation error.

More Experience Rounds: Expands memory coverage, lowering both generalization and retrieval errors.

Better Embedding/Retrieval: Directly reduces retrieval error.

These knobs allow modular upgrades without retraining the entire system.
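The three knobs read naturally as terms of an additive error bound. In illustrative notation (not the paper's exact statement), with \(\theta\) the frozen LLM, \(\mathcal{M}\) the skill memory, and \(\phi\) the retrieval embedding:

\[
\varepsilon_{\text{total}} \;\lesssim\; \underbrace{\varepsilon_{\text{approx}}(\theta)}_{\text{stronger LLM}} \;+\; \underbrace{\varepsilon_{\text{gen}}(\mathcal{M})}_{\text{more experience rounds}} \;+\; \underbrace{\varepsilon_{\text{retr}}(\phi)}_{\text{better embedding/retrieval}}
\]

Each term depends on a different component, so each can be driven down independently of the other two.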

Gradient‑Free Continual Learning

Zero‑training‑cost adaptation: No back‑propagation or parameter updates after deployment.

Explainable skill evolution: Skills remain human‑readable and auditable.

Theoretical convergence guarantees under the SRDP framework.

The overall picture suggests a shift from static parameter knowledge to procedural memory: executable, evolvable, and inheritable skills.

Paper: https://arxiv.org/pdf/2603.18743 (Memento‑Skills: Let Agents Design Agents)
Code: https://github.com/Memento-Teams/Memento-Skills
Figure 1: Overview of LLM adaptation paradigms
Tags: AI, LLM, Agent, Memory, Reinforcement Learning, Continual Learning, Skill Retrieval
Written by PaperAgent: daily updates analyzing cutting-edge AI research papers.