MetaAgent Auto‑Evolves SOTA Memory Modules Without Hyperparameter Tuning

The article explains how the ALMA system lets a meta‑agent automatically generate and evolve Python memory modules for agents, replacing brittle handcrafted heuristics with a four‑stage meta‑learning loop, and shows that the resulting designs outperform existing baselines while using far fewer tokens.

Machine Learning Algorithms & Natural Language Processing

Background

Memory is a persistent pain point in agent development because large language models are stateless across inference calls: nothing an agent experiences carries over to the next call unless it is placed back into the context. This limits the agent's ability to accumulate experience.

Current industry solutions such as Retrieval‑Augmented Generation (RAG) or sliding‑window summarization rely on handcrafted heuristics, which are brittle and often fail in long‑horizon planning tasks.

Prior Work

ADAS

ADAS establishes code as the search space for designing agent architectures (see "Automated Design of Agentic Systems", arXiv:2408.08435).

DGM

DGM introduces open‑ended evolution via an archive that stores design candidates and encourages novelty (see "Darwin Gödel Machine: Open‑Ended Evolution of Self‑Improving Agents", arXiv:2505.22954).
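To make the archive idea concrete, here is a minimal sketch of a candidate archive whose parent sampling favours strong designs but discounts ones that have already been expanded many times. The `Candidate` and `Archive` classes and the weighting formula are illustrative assumptions, not the selection rule used by DGM or ALMA.

```python
import random
from dataclasses import dataclass, field


@dataclass
class Candidate:
    """One stored design: its source code, measured score, and child count."""
    name: str
    code: str
    score: float
    children: int = 0


@dataclass
class Archive:
    """Keeps every evaluated candidate instead of only the current best."""
    candidates: list = field(default_factory=list)

    def add(self, candidate: Candidate) -> None:
        self.candidates.append(candidate)

    def sampling_weights(self) -> list:
        # Assumed heuristic: reward high scores, discount heavily explored
        # parents. The small constant keeps zero-score designs selectable.
        return [(c.score + 0.01) / (1 + c.children) for c in self.candidates]

    def sample_parent(self, rng: random.Random) -> Candidate:
        weights = self.sampling_weights()
        parent = rng.choices(self.candidates, weights=weights, k=1)[0]
        parent.children += 1
        return parent


archive = Archive()
archive.add(Candidate("baseline", "# design A", score=0.40, children=3))
archive.add(Candidate("variant", "# design B", score=0.35))
weights = archive.sampling_weights()
```

With these example values the less explored "variant" receives a larger weight than the slightly better but already well-explored "baseline", which is one simple way to keep the search from collapsing onto a single lineage.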


ALMA: Automated Meta‑Learning of Memory Designs

ALMA combines ADAS’s code‑generation paradigm with DGM’s evolutionary strategy to focus on the memory component of agents.

The meta‑agent does not solve the task directly. Instead, it writes Python code that defines a memory module as a subclass of Sub_memo_layer, and that module is then evaluated.

The meta‑learning loop consists of four stages:

Conceive: Analyze the archive of past memory designs and propose improvements.

Plan: Translate the idea into pseudo‑code.

Implement: Generate executable Python code that overrides the __init__, update, and retrieve methods of Sub_memo_layer.

Evaluate: Run the generated module in a sandbox environment, collect performance metrics, and feed them back to the archive.
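The four stages above can be sketched as a plain loop. This is a rough outline under stated assumptions: the function name `meta_learning_loop`, the archive record layout, and the `demo_*` stand-in stages are all hypothetical. In ALMA the first three stages would be driven by an LLM and the last by a sandboxed run.

```python
from typing import Callable, Dict, List

# Hypothetical archive record: {"idea", "plan", "code", "score"}.
Design = Dict[str, object]


def meta_learning_loop(
    conceive: Callable[[List[Design]], str],
    plan: Callable[[str], str],
    implement: Callable[[str], str],
    evaluate: Callable[[str], float],
    archive: List[Design],
    iterations: int,
) -> List[Design]:
    """Run the four stages in order and append each result to the archive."""
    for _ in range(iterations):
        idea = conceive(archive)        # Conceive: inspect past designs
        pseudo_code = plan(idea)        # Plan: idea -> pseudo-code
        code = implement(pseudo_code)   # Implement: pseudo-code -> Python source
        score = evaluate(code)          # Evaluate: run the module, get a metric
        archive.append(
            {"idea": idea, "plan": pseudo_code, "code": code, "score": score}
        )
    return archive


# Stand-in stages so the sketch runs without an LLM or a sandbox.
def demo_conceive(archive: List[Design]) -> str:
    return f"idea-{len(archive)}"


def demo_plan(idea: str) -> str:
    return f"plan for {idea}"


def demo_implement(pseudo_code: str) -> str:
    return f"# code from: {pseudo_code}"


def demo_evaluate(code: str) -> float:
    return 0.5  # placeholder metric, not a real benchmark result


history = meta_learning_loop(
    demo_conceive, demo_plan, demo_implement, demo_evaluate, [], iterations=3
)
```

Because each iteration appends to the same archive that the next Conceive step reads, later proposals can build on every earlier design rather than only the most recent one.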

The three core functions are:

__init__: Define data structures such as lists, dictionaries, or graphs.

update: Specify how information is compressed or discarded.

retrieve: Define the retrieval logic based on the current observation.
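As an illustration of how the three functions divide the work, the sketch below implements a toy memory module. The base class here is only a stand-in that reuses the name Sub_memo_layer; the real ALMA interface and method signatures are not given in this article and may differ. `RecentKeywordMemory` and its word-overlap scoring are hypothetical.

```python
from collections import deque


class Sub_memo_layer:
    """Stand-in base class; the real ALMA interface may differ."""

    def update(self, observation: str, outcome: str) -> None:
        raise NotImplementedError

    def retrieve(self, observation: str) -> list:
        raise NotImplementedError


class RecentKeywordMemory(Sub_memo_layer):
    """Toy memory: bounded buffer plus word-overlap retrieval."""

    def __init__(self, capacity: int = 50, top_k: int = 2):
        # __init__: choose the data structure (here a bounded deque).
        self.entries = deque(maxlen=capacity)
        self.top_k = top_k

    def update(self, observation: str, outcome: str) -> None:
        # update: decide what to keep; a full deque drops its oldest entry.
        self.entries.append((observation, outcome))

    def retrieve(self, observation: str) -> list:
        # retrieve: rank stored entries by words shared with the observation.
        query = set(observation.lower().split())
        scored = [
            (len(query & set(obs.lower().split())), obs, outcome)
            for obs, outcome in self.entries
        ]
        scored = [item for item in scored if item[0] > 0]
        scored.sort(key=lambda item: item[0], reverse=True)
        return [(obs, outcome) for _, obs, outcome in scored[: self.top_k]]


memory = RecentKeywordMemory(capacity=2)
memory.update("you see a locked door", "opening failed without key")
memory.update("you see a red key", "picked up key")
memory.update("you are in a kitchen", "nothing happened")
hits = memory.retrieve("a red key on the table")
```

With a capacity of two, the locked-door entry has already been discarded by the time `retrieve` runs, and the red-key entry ranks first because it shares the most words with the query.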


Evolved Memory Structures

ALMA discovers task‑specific memory architectures:

Risk and Interaction for MiniHack (dungeon exploration) – records damaging actions and monster aggressiveness with high retrieval priority.

Strategy Library for Baba Is AI (logic puzzles) – stores rule combinations needed to solve a level rather than step‑by‑step actions.

These results show that the meta‑agent can identify the salient characteristics of each task (survival in MiniHack, rule abstraction in Baba Is AI) and shape the memory design around them.
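One way to picture "high retrieval priority" for risk information is a store that ranks damage records above ordinary notes at retrieval time. The following is a hand-written toy, not the memory design ALMA actually evolved; the `RiskFirstMemory` class, its method names, and the priority values are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Note:
    text: str
    priority: int  # higher is surfaced first; values are illustrative


class RiskFirstMemory:
    """Toy store that ranks risk notes above ordinary notes on retrieval."""

    RISK_PRIORITY = 2
    DEFAULT_PRIORITY = 1

    def __init__(self):
        self.notes = []

    def record_damage(self, action: str, hp_lost: int) -> None:
        self.notes.append(Note(f"'{action}' cost {hp_lost} HP", self.RISK_PRIORITY))

    def record_observation(self, text: str) -> None:
        self.notes.append(Note(text, self.DEFAULT_PRIORITY))

    def retrieve(self, limit: int = 3) -> list:
        # sorted() is stable, so notes with equal priority keep insertion order.
        ranked = sorted(self.notes, key=lambda n: n.priority, reverse=True)
        return [n.text for n in ranked[:limit]]


risk_memory = RiskFirstMemory()
risk_memory.record_observation("corridor leads east")
risk_memory.record_damage("attack jackal bare-handed", 4)
risk_memory.record_observation("found a potion")
top_notes = risk_memory.retrieve(limit=2)
```

Even though the damage record was written second, it is returned first, so a small context budget is spent on survival-relevant information before anything else.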


Experimental Evaluation

Benchmarks were run on TextWorld, ALFWorld, MiniHack, and Baba Is AI, comparing ALMA against G‑Memory, ReasoningBank, and Trajectory Retrieval.

Performance

On the GPT‑5‑mini model, ALMA achieves an average success rate of 53.9 %, higher than G‑Memory (46.0 %) and Trajectory Retrieval (48.6 %). The advantage is especially pronounced on the long‑horizon planning tasks in ALFWorld.

Cross‑Model Transfer

Memory code evolved on the smaller GPT‑5‑nano model transfers directly to GPT‑5‑mini with retained performance gains, indicating that ALMA learns model‑agnostic memory logic.

Cost Efficiency

ALMA consumes on average 1,319 tokens per run, versus 9,149 tokens for Trajectory Retrieval and 6,055 tokens for G‑Memory, delivering comparable or better performance with roughly one‑seventh to one‑fifth the token budget.
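The ratios can be checked directly from the three token counts quoted above; the variable names below are arbitrary.

```python
# Average tokens per run, as reported in the text.
tokens = {"ALMA": 1319, "Trajectory Retrieval": 9149, "G-Memory": 6055}

# Fraction of each baseline's token budget that ALMA uses.
ratios = {
    name: tokens["ALMA"] / count
    for name, count in tokens.items()
    if name != "ALMA"
}

for name, ratio in ratios.items():
    print(f"ALMA uses {ratio:.1%} of the tokens of {name} (about 1/{1 / ratio:.1f})")
# ALMA uses 14.4% of the tokens of Trajectory Retrieval (about 1/6.9)
# ALMA uses 21.8% of the tokens of G-Memory (about 1/4.6)
```

So "one‑seventh" corresponds to the comparison with Trajectory Retrieval, while the comparison with G‑Memory comes out slightly above one‑fifth.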


Conclusion

ALMA demonstrates a transition from “Software 2.0” (neural networks) to “Software 3.0” (AI‑generated algorithms) by automatically designing high‑quality memory modules for agents, suggesting a path toward more general, self‑improving agents.
