How MiniMax M2.7 Achieves SOTA Agent Performance Through Self‑Evolving Loops
MiniMax M2.7 is a self‑evolving LLM that combines a persistent Agent Harness, multi‑level memory, and autonomous improvement cycles. Together these yield SOTA benchmark scores, strong cost efficiency, and real‑world software‑engineering capability, and they illustrate the emerging skill economy of agent ecosystems.
Overview
MiniMax M2.7 is a large language model that incorporates a persistent‑state agent framework (Agent Harness) and executes an autonomous recursive self‑improvement (RSI) loop. During development it completed more than 100 refinement cycles without human intervention, achieving measurable gains on software‑engineering benchmarks.
Agent Harness Architecture
The Harness surrounds the model and provides all runtime services except raw token generation. Its main components are:
Tool Integration Layer: callable primitives for file I/O, code execution, database queries, API calls, and network access.
Memory and State Management: short‑term working context, session‑level persistent logs, and long‑term structured knowledge.
Context Engineering and Compression: selects which information to inject into each model call, compresses older history, and applies retrieval‑augmented generation (RAG) patterns.
Verification and Guardrails: runs unit‑style tests, validates outputs, and can require human review for sensitive actions.
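These responsibilities can be sketched as a thin orchestration loop around the model. All class, function, and tool names below are hypothetical; M2.7's internal interfaces are not public.

```python
# Minimal sketch of one agent-harness step: the harness, not the model,
# owns tools, memory, and verification; the model only proposes actions.

class Harness:
    def __init__(self, model, tools, verifier):
        self.model = model          # callable: prompt -> (tool_name, argument)
        self.tools = tools          # dict: name -> callable primitive
        self.verifier = verifier    # callable: result -> bool (guardrail)
        self.memory = []            # short-term working context

    def step(self, task):
        # Inject the working context into the model call.
        prompt = task + "\n" + "\n".join(self.memory)
        tool_name, arg = self.model(prompt)      # model picks a tool call
        result = self.tools[tool_name](arg)      # harness executes it
        if not self.verifier(result):            # guardrail check
            result = "REJECTED"
        self.memory.append(f"{tool_name}({arg}) -> {result}")
        return result


# Toy usage: a fake "model" that always reads from a file-like store.
store = {"a.txt": "hello"}
harness = Harness(
    model=lambda prompt: ("read", "a.txt"),
    tools={"read": lambda path: store.get(path, "")},
    verifier=lambda r: r != "",
)
print(harness.step("Summarise a.txt"))  # -> hello
```

The point of the sketch is the division of labour: every side effect passes through the harness, so guardrails and memory updates cannot be skipped by the model.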
┌─────────────────────────────────────────┐
│ Agent Harness │
│ ┌───────┐ ┌────────┐ ┌───────────┐ │
│ │ Tools │ │ Memory │ │ Verifier │ │
│ └───┬───┘ └───┬────┘ └─────┬─────┘ │
│ │ │ │ │
│ └──────────┼─────────────┘ │
│ │ │
│ ┌─────┴─────┐ │
│ │ Model │ │
│ └───────────┘ │
└─────────────────────────────────────────┘
Recursive Self‑Improvement Loop
The model instantiated an internal “research‑agent suite” that generated and refined its own training framework. Each iteration followed a deterministic pipeline:
Analyze failure trajectories
→ Plan modifications
→ Update framework code
→ Run evaluation
→ Compare results
→ Commit change if improvement
→ Otherwise revert

Over 100 autonomous cycles the system discovered optimisations such as systematic sampling‑parameter tuning (temperature, frequency penalty, presence penalty), workflow‑specific heuristics (e.g., automatically searching for similar bug patterns), and loop‑detection break‑points. These changes yielded roughly a 30% improvement on internal evaluation sets.
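The commit-or-revert cycle above can be sketched in a few lines. The names are hypothetical, and the real pipeline mutated training-framework code rather than a config dict; the toy example "tunes" a sampling temperature toward a target.

```python
# Sketch of the evaluate -> compare -> commit-or-revert loop described above.
import random

def improvement_cycle(config, evaluate, propose, iterations=100):
    """Keep a proposed change only if it improves the evaluation score."""
    best_score = evaluate(config)
    for _ in range(iterations):
        candidate = propose(config)        # plan + apply a modification
        score = evaluate(candidate)        # run evaluation
        if score > best_score:             # compare results
            config, best_score = candidate, score   # commit the change
        # otherwise: revert by simply discarding the candidate
    return config, best_score

# Toy example: random-walk the temperature toward an optimum of 0.7.
rng = random.Random(0)
evaluate = lambda c: -abs(c["temperature"] - 0.7)
propose = lambda c: {"temperature": c["temperature"] + rng.uniform(-0.1, 0.1)}
final, score = improvement_cycle({"temperature": 1.0}, evaluate, propose)
print(round(final["temperature"], 2))
```

Because a candidate is committed only when its score strictly improves, the loop is monotone: the evaluation score can never get worse, which is what makes 100 unattended cycles safe to run.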
Memory System
M2.7 maintains three hierarchical memory tiers, analogous to CPU cache levels:
Short‑term (working context): immediate prompt and recent dialogue.
Session state (contextual memory): a persistent markdown log per task; after each iteration the model writes a markdown file and performs a self‑critique.
Long‑term (persistent knowledge): cross‑task knowledge stored as structured JSON, enabling reliable retrieval across sessions.
JSON is preferred for state files because it resists accidental reformatting during model‑generated edits.
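A minimal sketch of the three tiers, assuming a file layout and field names that are illustrative rather than M2.7's actual on-disk format:

```python
# Three memory tiers: in-process working context, a per-task markdown log,
# and structured JSON for cross-session knowledge.
import json, tempfile, pathlib

class TieredMemory:
    def __init__(self, root):
        self.working = []                          # short-term: in-process list
        self.session_log = root / "session.md"     # session state: markdown log
        self.knowledge = root / "knowledge.json"   # long-term: structured JSON

    def end_iteration(self, note, critique):
        # Append the iteration note and a self-critique to the markdown log.
        with self.session_log.open("a") as f:
            f.write(f"## Iteration\n{note}\n> critique: {critique}\n")

    def remember(self, key, value):
        # JSON resists accidental reformatting by model-generated edits.
        data = json.loads(self.knowledge.read_text()) if self.knowledge.exists() else {}
        data[key] = value
        self.knowledge.write_text(json.dumps(data, indent=2))

    def recall(self, key):
        return json.loads(self.knowledge.read_text()).get(key)

root = pathlib.Path(tempfile.mkdtemp())
mem = TieredMemory(root)
mem.end_iteration("fixed flaky test", "should have checked CI first")
mem.remember("bug-pattern:flaky-test", "rerun before debugging")
print(mem.recall("bug-pattern:flaky-test"))  # -> rerun before debugging
```

The split mirrors the CPU-cache analogy: the cheap, volatile tier is consulted every call, while the expensive, durable tier is touched only at iteration boundaries.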
Benchmark Performance
On the SWE‑Pro benchmark (multi‑language software‑engineering tasks) M2.7 scores 56.22%, matching GPT‑5.3‑Codex. Additional results:
Terminal Bench 2: 57.0%
VIBE‑Pro (end‑to‑end project delivery): 55.6%, comparable to Opus 4.6.
In simulated production‑incident scenarios the model reduces mean‑time‑to‑recovery (MTTR) to under three minutes by automatically correlating metrics, performing causal inference, querying databases, and applying non‑blocking index fixes.
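The remediation flow just described (correlate metrics, infer a cause, apply a non-blocking fix) might look roughly like the following. Everything here is invented for illustration: the metric names, the naive spike detector, the slow-query table, and the `created_at` index column.

```python
# Hypothetical shape of automated incident remediation:
# 1) correlate metrics, 2) infer a likely cause, 3) propose a non-blocking fix.

def remediate(metrics, slow_queries):
    # 1. Correlate: flag any metric whose latest sample is far above baseline.
    spiking = [
        name for name, series in metrics.items()
        if series[-1] > 3 * (sum(series[:-1]) / len(series[:-1]))
    ]
    # 2. Causal inference (toy): blame the table behind the slowest query.
    table = max(slow_queries, key=slow_queries.get)
    # 3. Non-blocking fix: PostgreSQL-style CREATE INDEX CONCURRENTLY avoids
    #    taking a write lock on the table while the index builds.
    fix = f"CREATE INDEX CONCURRENTLY ON {table} (created_at);"
    return spiking, fix

metrics = {"db_latency_ms": [10, 11, 9, 120], "cpu_pct": [50, 51, 49, 52]}
slow = {"orders": 4.2, "users": 0.3}   # seconds per query, from pg_stat-style data
spiking, fix = remediate(metrics, slow)
print(spiking, fix)  # -> ['db_latency_ms'] CREATE INDEX CONCURRENTLY ON orders (created_at);
```

The sub-three-minute MTTR claim hinges on step 3 being non-blocking: a concurrent index build lets the fix land without pausing production writes.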
Multi‑Agent Coordination (Agent Teams)
M2.7 includes a native “Agent Team” capability where multiple agents retain stable role identities, can challenge each other’s reasoning, and make autonomous decisions within complex state machines. These behaviours are internalised as native abilities: role boundaries, adversarial reasoning, protocol compliance, and behaviour differentiation.
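A toy sketch of the author/reviewer pattern these abilities enable, with stable role identities and a reviewer that may reject the author's draft (all names and policies are hypothetical):

```python
# Two agents with fixed roles: an author drafts, a reviewer may challenge.

class Agent:
    def __init__(self, role, policy):
        self.role = role        # stable role identity
        self.policy = policy    # callable: message -> reply

    def act(self, message):
        return self.policy(message)

def run_team(task, author, reviewer, max_rounds=3):
    draft = author.act(task)
    for _ in range(max_rounds):
        objection = reviewer.act(draft)   # adversarial reasoning
        if objection is None:             # reviewer accepts the draft
            return draft
        draft = author.act(objection)     # author revises under protocol
    return draft

# Toy policies: the reviewer insists on tests; the author complies on demand.
author = Agent("author", lambda m: "draft v2 with tests" if "tests" in m else "draft v1")
reviewer = Agent("reviewer", lambda d: None if "tests" in d else "add tests")
print(run_team("resolve issue #42", author, reviewer))  # -> draft v2 with tests
```

The state machine is trivial here, but the structural point matches the text: role boundaries and the challenge protocol live in the coordination layer, not in any single agent's prompt.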
Skill Economy and Architecture Flow
The OpenClaw platform treats agent capabilities as “skills” – self‑contained definitions (~2,000 tokens each) that can be discovered, invoked, and composed. The runtime flow is:
User → Gateway (WebSocket) → Brain (model + framework) → Skills (callable abilities)

The gateway aggregates inputs from various channels (e.g., WhatsApp, Telegram, Slack, Discord, web) and routes them to the appropriate skill.
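The flow can be sketched end to end. The skill registry, the keyword-based routing, and the channel names below are illustrative; OpenClaw's real discovery and invocation APIs are not shown here.

```python
# Sketch of User -> Gateway -> Brain -> Skills routing.

SKILLS = {
    # Each "skill" is a self-contained callable ability.
    "summarise": lambda payload: f"summary of {payload}",
    "translate": lambda payload: f"translation of {payload}",
}

def brain(message):
    # The model + framework layer picks a skill; here: naive keyword routing.
    for name in SKILLS:
        if name in message:
            return name, message.split(name, 1)[1].strip()
    return "summarise", message   # fall back to a default skill

def gateway(channel, message):
    # Aggregates inputs from channels (WhatsApp, Telegram, Slack, Discord, web)
    # and dispatches to the skill chosen by the brain.
    skill, payload = brain(message)
    return {"channel": channel, "skill": skill, "result": SKILLS[skill](payload)}

print(gateway("telegram", "translate hello world"))
```

Keeping each skill self-contained (the ~2,000-token budget the text mentions) is what makes them independently discoverable and composable: the brain only needs to choose among opaque callables.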
Conclusions
MiniMax M2.7 demonstrates that a large language model equipped with a robust agent harness can autonomously iterate on its own training pipeline, achieve competitive benchmark performance, and perform real‑world incident remediation. The architecture shows a path toward higher‑level self‑improving systems where software engineers focus on designing the improvement loops rather than manually coding each iteration.
Code Mala Tang
Read source code together, write articles together, and enjoy spicy hot pot together.