Autogenesis: A Self‑Evolving Agent OS That Drives Near‑Perfect C++ LeetCode Scores

The paper introduces the Autogenesis Protocol (AGP), a two‑layer, resource‑governed framework that lets agents safely modify their own prompts, tools, memory and environment. It evaluates the protocol with the Autogenesis System (AGS), which reaches 93.33% GAIA validation accuracy and near‑perfect scores on C++ LeetCode problems.

Machine Heart

May 30, 2026

Background and Motivation

Large‑model agents have progressed from tool use to web browsing and multi‑agent collaboration, but they remain tightly coupled, lack lifecycle and version management, and rely on ad‑hoc, experience‑driven self‑improvement that is hard to audit or roll back.

Autogenesis Protocol (AGP)

AGP is a dual‑layer protocol that separates what can evolve from how it evolves:

Resource Substrate Protocol Layer (RSPL) defines which internal resources—Prompt, Agent, Tool/MCP/Skill, Environment, Memory—are eligible for evolution and provides registration, retrieval, versioning, rollback and audit interfaces.

Self‑Evolution Protocol Layer (SEPL) governs the safe evolution process, formalizing it as a closed‑loop workflow: Reflect → Select → Improve → Evaluate → Commit. All changes are applied via RSPL’s versioned interfaces rather than direct code patches.

RSPL treats resources as passive entities; any state change must pass through SEPL, preventing uncontrolled black‑box modifications.

Resource Substrate Protocol Layer (RSPL)

RSPL abstracts the five core agent components—Prompt, Agent, Tool/MCP/Skill, Environment, Memory—into protocol‑level resources. Each resource is registered with explicit state, lifecycle, version interfaces and an evolvable flag. The layer offers uniform context managers and service APIs for registration, invocation, version management, rollback, contract generation, and execution tracing.

Because resources are passive, all modifications are forced through SEPL, ensuring evolutions are auditable and reversible.
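The registration, versioning, rollback and audit interfaces described above can be illustrated with a small in-memory registry. This is a minimal sketch, not the AGS implementation: the `Resource` and `Registry` classes, their method names, and the audit-log tuple format are all hypothetical, chosen only to show how a passive resource with an evolvable flag and a version history might behave.

```python
from dataclasses import dataclass, field


@dataclass
class Resource:
    """A passive, versioned resource (hypothetical shape, not the AGS schema)."""
    name: str
    kind: str                      # e.g. "prompt", "tool", "memory"
    evolvable: bool = True
    versions: list = field(default_factory=list)
    head: int = 0                  # index of the currently active version


class Registry:
    """Toy registry: register, get, commit, rollback, plus an audit log."""

    def __init__(self):
        self._resources = {}
        self.audit_log = []        # (operation, resource name, version index)

    def register(self, name, kind, payload, evolvable=True):
        self._resources[name] = Resource(name, kind, evolvable, [payload], 0)
        self.audit_log.append(("register", name, 0))

    def get(self, name):
        res = self._resources[name]
        return res.versions[res.head]

    def commit(self, name, payload):
        # New content never overwrites old content; it becomes a new version.
        res = self._resources[name]
        if not res.evolvable:
            raise PermissionError(f"{name} is not marked evolvable")
        res.versions.append(payload)
        res.head = len(res.versions) - 1
        self.audit_log.append(("commit", name, res.head))
        return res.head

    def rollback(self, name, version):
        # Rolling back only moves the active pointer; history is kept.
        res = self._resources[name]
        if not 0 <= version < len(res.versions):
            raise IndexError(f"{name} has no version {version}")
        res.head = version
        self.audit_log.append(("rollback", name, version))


registry = Registry()
registry.register("system_prompt", "prompt", "Answer briefly.")
registry.commit("system_prompt", "Answer briefly and cite sources.")
registry.rollback("system_prompt", 0)
print(registry.get("system_prompt"))   # prints "Answer briefly."
print(registry.audit_log)
```

Because every change appends a version and writes a log entry, the history of a resource can be replayed or reverted without touching the code that uses it.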

Self‑Evolution Protocol Layer (SEPL)

SEPL defines the standards for safe evolution. The self‑evolution loop consists of five steps:

Reflect – analyze current performance and failures.

Select – choose candidate resources for improvement.

Improve – generate modifications using any optimization strategy (e.g., Reflection Optimizer, TextGrad, Reinforce++, GRPO).

Evaluate – test the modified resources against validation criteria.

Commit – apply changes through RSPL’s versioned interfaces.

Each iteration produces a traceable, versioned operation rather than an ad‑hoc patch.
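The five steps above can be sketched as a single function. This is an illustrative outline under stated assumptions, not the paper's algorithm: the `Store` type, the `sepl_iteration` name, the failure format `(resource_name, message)`, and the `improve` and `evaluate` callbacks are hypothetical. Selection by failure count and acceptance on a strictly higher score are simple stand-ins for whatever criteria a real system would use.

```python
from collections import Counter
from typing import Callable, Optional

# Resource name -> version history; the last entry is the active version.
Store = dict[str, list[str]]


def sepl_iteration(
    store: Store,
    failures: list[tuple[str, str]],
    improve: Callable[[str, list[str]], str],
    evaluate: Callable[[str, str], float],
) -> Optional[str]:
    """Run one Reflect -> Select -> Improve -> Evaluate -> Commit pass.

    Returns the name of the resource that received a new version,
    or None if nothing was committed.
    """
    # Reflect: count how many logged failures point at each known resource.
    blame = Counter(name for name, _ in failures if name in store)
    if not blame:
        return None

    # Select: pick the resource with the most failures attributed to it.
    target, _ = blame.most_common(1)[0]
    current = store[target][-1]
    notes = [msg for name, msg in failures if name == target]

    # Improve: any optimizer could sit behind this callback.
    candidate = improve(current, notes)

    # Evaluate: keep the candidate only if it scores strictly higher.
    if evaluate(target, candidate) <= evaluate(target, current):
        return None

    # Commit: append a new version instead of overwriting the old one.
    store[target].append(candidate)
    return target


store = {"system_prompt": ["Answer briefly."], "search_tool": ["v1"]}
failures = [
    ("system_prompt", "missed units"),
    ("system_prompt", "no citation"),
    ("search_tool", "timeout"),
]
changed = sepl_iteration(
    store,
    failures,
    improve=lambda text, notes: text + " Address: " + "; ".join(notes) + ".",
    evaluate=lambda name, text: float(len(text)),  # toy score for the demo
)
print(changed)                      # prints "system_prompt"
print(store["system_prompt"][-1])
```

A rejected candidate leaves the store untouched, so a failed improvement attempt costs an evaluation but never changes the running system.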

Autogenesis System (AGS)

Built on AGP, AGS is a multi‑agent platform where a Planning Agent and specialized sub‑agents (Deep Researcher, Browser‑use, Deep Analyzer, Vibe Coding) register as first‑class resources. The workflow has four stages:

Plan the task.

Execute sub‑agents in parallel while logging trajectories.

Detect failures and trigger the SEPL self‑evolution loop.

Upon successful evolution, register new capabilities in RSPL for immediate reuse.

This design enables dynamic registration, retrieval, modification, and reuse of internal resources during task execution.
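A rough sketch of this plan, execute, detect, and re-register cycle is shown below. It is not the AGS code: `run_task`, the trajectory dictionary keys, and the `plan` and `evolve` callbacks are hypothetical, and `evolve` merely stands in for a full SEPL loop. Sub-agents are modelled as plain functions and run in parallel with a thread pool from the Python standard library.

```python
from concurrent.futures import ThreadPoolExecutor


def run_task(task, plan, agents, evolve):
    """Plan, run sub-agents in parallel, then evolve the ones that failed.

    plan(task)            -> list of (agent_name, subtask) pairs
    agents                -> dict mapping agent_name to a callable(subtask)
    evolve(name, record)  -> replacement callable, or None if no fix was found
    """
    steps = plan(task)

    def run(step):
        name, subtask = step
        try:
            output = agents[name](subtask)
            return {"agent": name, "subtask": subtask, "ok": True, "output": output}
        except Exception as exc:  # a failed sub-agent is logged, not fatal
            return {"agent": name, "subtask": subtask, "ok": False, "output": repr(exc)}

    # Execute sub-agents in parallel; map() keeps results in plan order.
    with ThreadPoolExecutor() as pool:
        trajectories = list(pool.map(run, steps))

    # Detect failures and hand each one to the evolution hook.
    for record in trajectories:
        if not record["ok"]:
            replacement = evolve(record["agent"], record)
            if replacement is not None:
                agents[record["agent"]] = replacement  # available for reuse

    return trajectories


def broken_coder(subtask):
    raise RuntimeError("compile error")


agents = {"researcher": lambda s: "notes on " + s, "coder": broken_coder}
log = run_task(
    "sort a list",
    plan=lambda task: [("researcher", task), ("coder", task)],
    agents=agents,
    evolve=lambda name, record: (lambda s: "code for " + s),
)
print([r["ok"] for r in log])        # prints [True, False]
print(agents["coder"]("sort a list"))  # prints "code for sort a list"
```

In this toy version the repaired agent simply replaces the old callable; a governed system would instead commit it as a new version so the change can be audited and rolled back.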

Performance Evaluation

GAIA benchmark: AGS Agent‑Evo achieved 93.33% validation accuracy and 89.04% test accuracy. Against the vanilla baseline of 79.07%, the test result is a gain of 9.97 percentage points, or 12.61% in relative terms. Level‑3 task accuracy rose from 61.22% to 81.63%, a gain of 20.41 percentage points (+33.34% relative).

Humanity’s Last Exam (HLE) full‑scale test: AGS ranked second with a score of 59.6%.

Mathematics and science reasoning: AGS performed well on GPQA‑Diamond, AIME‑24 and AIME‑25, demonstrating the generality of self‑evolution across reasoning domains.

Code generation benchmark: A LeetCode‑based suite of 100 recent problems in Python, C++, Java, Go and Kotlin was used. C++ and Java agents approached a perfect 100‑problem pass rate. Self‑evolution reduced compilation errors, runtime errors, timeouts and wrong‑answer rates across all languages, yielding measurable pass‑rate improvements and runtime optimizations for compiled languages.
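The GAIA figures quoted above can be checked with a few lines of arithmetic. The sketch below computes both the absolute gain in percentage points and the relative gain in percent; the `gains` helper is hypothetical, and the calculation assumes the 79.07% baseline is compared with the 89.04% test accuracy. Under that assumption, the reported 12.61 and 33.34 correspond to the relative figures.

```python
def gains(baseline: float, evolved: float) -> tuple[float, float]:
    """Return (absolute gain in percentage points, relative gain in percent)."""
    absolute = evolved - baseline
    relative = 100.0 * absolute / baseline
    return round(absolute, 2), round(relative, 2)


overall = gains(79.07, 89.04)   # GAIA accuracy, baseline vs. evolved
level3 = gains(61.22, 81.63)    # GAIA Level-3 accuracy

print(overall)  # prints (9.97, 12.61)
print(level3)   # prints (20.41, 33.34)
```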

Implications

By treating agent components as managed resources, AGP enables auditable, reversible self‑evolution. The protocol‑level governance shifts agent design from “adding more tools” to “governed evolution,” providing a unified mechanism for safe, repeatable, and traceable self‑modification.

References

Paper: https://arxiv.org/abs/2604.15034

GitHub repository: https://github.com/DVampire/Autogenesis
