How LoongFlow Empowers Expert‑Level AI Agents to Surpass Human Mathematicians

LoongFlow is an open‑source AI agent framework that combines a Plan‑Execute‑Summarize (PES) paradigm with a Hybrid Evolutionary Memory system so that agents can sustain long‑range, complex reasoning. It reports record‑breaking results on mathematical challenges and real‑world ML benchmarks, along with markedly higher evolutionary efficiency.

Baidu Intelligent Cloud Tech Hub

Background

LoongFlow is an open‑source framework for building expert‑level agents that combine long‑term population‑based optimization with deep causal reasoning.

Agent evolution stages

Stage 1 – Single‑step reasoning agents follow the ReAct loop (reason → act → observe) and suit well‑defined tasks.
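A minimal sketch of the reason → act → observe loop may make Stage 1 concrete. Everything here is illustrative: `react_loop`, `toy_policy` and the `add` tool are hypothetical names, and the policy function stands in for an LLM call rather than reproducing any LoongFlow API.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical stand-in for an LLM: given the observations so far, it returns
# ("act", tool_name, tool_input) or ("finish", answer, "").
Policy = Callable[[List[str]], Tuple[str, str, str]]


def react_loop(
    policy: Policy,
    tools: Dict[str, Callable[[str], str]],
    max_steps: int = 5,
) -> str:
    history: List[str] = []
    for _ in range(max_steps):
        kind, name, arg = policy(history)   # reason: decide the next move
        if kind == "finish":
            return name                     # the second field carries the answer
        observation = tools[name](arg)      # act: call the chosen tool
        history.append(observation)         # observe: feed the result back
    return "no answer within step budget"


def toy_policy(history: List[str]) -> Tuple[str, str, str]:
    # Assumed behaviour: call the tool once, then report what it returned.
    if not history:
        return ("act", "add", "2,3")
    return ("finish", history[-1], "")


tools = {"add": lambda s: str(sum(int(x) for x in s.split(",")))}
answer = react_loop(toy_policy, tools)
```

The loop keeps no state beyond the current episode's observations, which is the limitation the later stages address.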

Stage 2 – Evolutionary agents maintain a population of solutions and iteratively improve them via LLM‑driven evaluation, selection and mutation. Early versions treated the LLM as a random mutator, limiting efficiency.
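The evaluate, select and mutate loop of Stage 2 can be sketched on a toy problem. This is an assumption-laden illustration, not LoongFlow code: the fitness function, the Gaussian `mutate` and all parameter values are made up, and the random mutation deliberately mimics the "LLM as random mutator" pattern that the early systems used.

```python
import random
from typing import List, Tuple


def fitness(x: float) -> float:
    # Toy objective (assumed): the best solution is x = 3.
    return -(x - 3.0) ** 2


def mutate(x: float, rng: random.Random) -> float:
    # Blind random perturbation, standing in for an undirected LLM rewrite.
    return x + rng.gauss(0.0, 0.5)


def evolve(
    pop_size: int = 8, generations: int = 20, seed: int = 0
) -> Tuple[float, List[float]]:
    rng = random.Random(seed)
    population = [rng.uniform(-10.0, 10.0) for _ in range(pop_size)]
    history: List[float] = []
    for _ in range(generations):
        # Evaluate and select: keep the better half as parents.
        ranked = sorted(population, key=fitness, reverse=True)
        parents = ranked[: pop_size // 2]
        # Mutate: refill the population with perturbed copies of parents.
        children = [
            mutate(rng.choice(parents), rng)
            for _ in range(pop_size - len(parents))
        ]
        population = parents + children
        history.append(max(fitness(x) for x in population))
    return max(population, key=fitness), history


best, history = evolve()
```

Because the parents survive each generation, the best fitness never gets worse, but many children are wasted evaluations since nothing guides where mutations land.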

LoongFlow architecture

The core innovations are the Plan‑Execute‑Summarize (PES) cycle and a Hybrid Evolutionary Memory system.

PES cycle

Plan: the planner analyses current solutions, retrieves relevant experiences from memory, and generates a risk‑aware evolution blueprint.

Execute: an executor adapts tools to the task (e.g., logical verifier for proofs, interactive interpreter for code, query generator for data) and performs fast local validation.

Summarize: the summarizer compares the outcome with the plan, extracts causal insights and stores structured knowledge for future iterations.
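The three phases above can be sketched as a small loop in which each iteration's summary feeds the next plan. This is a toy under stated assumptions: the `Insight` record, the numeric objective and the "repeat what worked, otherwise reverse and halve" planning rule are all hypothetical, chosen only to show how stored outcomes steer later steps. In LoongFlow the planner, executor and summarizer are LLM-driven components.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Insight:
    step: float      # what the plan proposed
    improved: bool   # whether the outcome beat the current solution


def score(x: float) -> float:
    # Toy objective (assumed): the target value is 10.
    return -abs(x - 10.0)


def plan(memory: List[Insight]) -> float:
    # Plan: consult past insights before proposing the next change.
    if not memory:
        return 4.0
    last = memory[-1]
    return last.step if last.improved else -last.step / 2


def execute(x: float, step: float) -> float:
    # Execute: apply the planned change to obtain a candidate solution.
    return x + step


def summarize(old: float, new: float, step: float) -> Insight:
    # Summarize: compare the outcome with the starting point and record it.
    return Insight(step=step, improved=score(new) > score(old))


def pes(x: float = 0.0, iterations: int = 10) -> Tuple[float, List[Insight]]:
    memory: List[Insight] = []
    for _ in range(iterations):
        step = plan(memory)
        candidate = execute(x, step)
        insight = summarize(x, candidate, step)
        memory.append(insight)
        if insight.improved:
            x = candidate
    return x, memory


final_x, memory = pes()
```

Failed steps are not discarded: they are recorded and change the next plan, which is the contrast with blind mutation.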

Hybrid Evolutionary Memory

Memory is organized into multiple “islands” that explore in parallel. Solutions are archived with MAP‑Elites based on diverse characteristics, and an adaptive Boltzmann selector dynamically balances exploration versus exploitation.
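As a rough illustration of the two mechanisms named here, the sketch below keeps one elite per behaviour cell (the MAP-Elites rule) and samples parents with Boltzmann (softmax) probabilities. The `Island` class, the two-dimensional cell key, the sample solutions and the temperature values are assumptions for the example. How LoongFlow adapts the temperature and migrates solutions between islands is not specified in this article and is not modelled.

```python
import math
import random
from typing import Dict, List, Tuple

Cell = Tuple[int, int]  # assumed: a discretized pair of solution characteristics


class Island:
    def __init__(self) -> None:
        self.archive: Dict[Cell, Tuple[float, str]] = {}

    def add(self, cell: Cell, fitness: float, solution: str) -> bool:
        """MAP-Elites rule: a cell keeps only its best solution so far."""
        current = self.archive.get(cell)
        if current is None or fitness > current[0]:
            self.archive[cell] = (fitness, solution)
            return True
        return False


def boltzmann_probs(fitnesses: List[float], temperature: float) -> List[float]:
    # Subtract the maximum before exponentiating for numerical stability.
    top = max(fitnesses)
    weights = [math.exp((f - top) / temperature) for f in fitnesses]
    total = sum(weights)
    return [w / total for w in weights]


def select(island: Island, temperature: float, rng: random.Random) -> str:
    elites = list(island.archive.values())
    probs = boltzmann_probs([f for f, _ in elites], temperature)
    return rng.choices(elites, weights=probs, k=1)[0][1]


island = Island()
kept_first = island.add((0, 0), 1.0, "solution A")
kept_worse = island.add((0, 0), 0.5, "solution B")  # same cell, lower fitness
kept_other = island.add((0, 1), 2.0, "solution C")  # different cell

cold = boltzmann_probs([1.0, 2.0], temperature=0.1)    # near-greedy: exploit
hot = boltzmann_probs([1.0, 2.0], temperature=100.0)   # near-uniform: explore
parent = select(island, temperature=1.0, rng=random.Random(0))
```

A low temperature concentrates selection on the fittest elite, while a high temperature spreads it across cells. An adaptive selector would move the temperature between these regimes as the search progresses.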

Performance results

On 11 benchmark mathematical problems LoongFlow surpassed the best known human results.

On 7 problems it outperformed Google AlphaEvolve, establishing a new state‑of‑the‑art.

In the MLE‑bench simulated Kaggle competition the LoongFlow‑driven ML agent earned 23 gold medals across tasks such as pathology slide cancer detection and volcano eruption prediction.

Compared with OpenEvolve and ShinkaEvolve, LoongFlow achieved more than 60% higher evolutionary efficiency and a 100% iteration success rate, cutting ineffective exploration by over 60%.

Repository and technical report

Source code and documentation are available at https://github.com/baidu-baige/LoongFlow. A detailed technical report is hosted on arXiv: https://arxiv.org/abs/2512.24077.
