How LoongFlow Empowers Expert‑Level AI Agents to Surpass Human Mathematicians
LoongFlow is an open‑source AI agent framework that combines a Plan‑Execute‑Summarize (PES) paradigm with a Hybrid Evolutionary Memory system. Together these let agents carry out long‑range, complex reasoning. The project reports record‑breaking results on mathematical challenges and real‑world ML benchmarks, along with markedly better efficiency.
Background
LoongFlow is an open‑source framework for building expert‑level agents that combine long‑term population‑based optimization with deep causal reasoning.
Agent evolution stages
Stage 1 – Single‑step reasoning agents follow the ReAct loop (reason → act → observe), which suits well‑defined tasks.
Stage 2 – Evolutionary agents maintain a population of solutions and iteratively improve them via LLM‑driven evaluation, selection and mutation. Early versions treated the LLM as a random mutator, limiting efficiency.
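The Stage 2 loop can be illustrated with a toy example. This is only a sketch under stated assumptions, not LoongFlow code: the names evolve, random_mutation and toy_fitness are hypothetical, candidates are plain floats, and a blind Gaussian perturbation stands in for the LLM, which mirrors the "random mutator" limitation described above.

```python
import random
from typing import Callable, List


def evolve(
    population: List[float],
    fitness: Callable[[float], float],
    mutate: Callable[[float, random.Random], float],
    generations: int = 50,
    keep: int = 4,
    seed: int = 0,
) -> List[float]:
    """Toy evaluate -> select -> mutate loop (higher fitness is better)."""
    rng = random.Random(seed)
    size = len(population)
    for _ in range(generations):
        # Evaluation and selection: keep the `keep` fittest candidates.
        survivors = sorted(population, key=fitness, reverse=True)[:keep]
        # Mutation: refill the population with perturbed copies of survivors.
        children = [
            mutate(rng.choice(survivors), rng)
            for _ in range(size - len(survivors))
        ]
        population = survivors + children
    return sorted(population, key=fitness, reverse=True)


def random_mutation(x: float, rng: random.Random) -> float:
    # Stand-in for the LLM (assumption): an undirected random change.
    return x + rng.gauss(0.0, 0.5)


def toy_fitness(x: float) -> float:
    # Hypothetical objective with its maximum at x = 3.
    return -(x - 3.0) ** 2


final_population = evolve([0.0] * 8, toy_fitness, random_mutation)
best = final_population[0]
print(f"best candidate: {best:.3f}, fitness: {toy_fitness(best):.4f}")
```

Because the survivors are carried over unchanged, the best fitness never decreases, but every child is a blind guess. Replacing random_mutation with a step that reasons about why earlier candidates succeeded or failed is the gap the PES cycle is meant to close.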
LoongFlow architecture
The core innovations are the Plan‑Execute‑Summarize (PES) cycle and a Hybrid Evolutionary Memory system.
PES cycle
Plan: the planner analyses current solutions, retrieves relevant experiences from memory, and generates a risk‑aware evolution blueprint.
Execute: an executor adapts tools to the task (e.g., logical verifier for proofs, interactive interpreter for code, query generator for data) and performs fast local validation.
Summarize: the summarizer compares the outcome with the plan, extracts causal insights and stores structured knowledge for future iterations.
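The three phases can be sketched as a small control loop. Everything named here (PESAgent, Insight, propose, evaluate) is hypothetical and is not taken from the LoongFlow codebase; the planner is a simple numeric heuristic standing in for an LLM, and the "causal insight" is reduced to a record of whether a plan improved the score.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class Insight:
    """Structured record written by the Summarize phase (hypothetical schema)."""
    plan: str
    score: float
    improved: bool


def propose(current: float, memory: List[Insight]) -> Tuple[str, float]:
    """Plan: read past insights, then choose a direction and step size."""
    failures = sum(1 for insight in memory if not insight.improved)
    step = 0.5 ** failures                          # each failure halves the step
    direction = 1.0 if failures % 2 == 0 else -1.0  # each failure flips direction
    delta = direction * step
    return f"move by {delta:+.4f}", current + delta


@dataclass
class PESAgent:
    evaluate: Callable[[float], float]
    memory: List[Insight] = field(default_factory=list)

    def step(self, current: float) -> float:
        # Plan: build a candidate from the current solution and stored insights.
        plan, candidate = propose(current, self.memory)
        # Execute: run the candidate through a fast local check.
        new_score = self.evaluate(candidate)
        improved = new_score > self.evaluate(current)
        # Summarize: compare outcome with the plan and store the result.
        self.memory.append(Insight(plan, new_score, improved))
        return candidate if improved else current


agent = PESAgent(evaluate=lambda x: -(x - 3.0) ** 2)
solution = 0.0
for _ in range(10):
    solution = agent.step(solution)
print(solution, len(agent.memory))
```

In this toy run the agent walks from 0.0 to 3.0 in three successful steps, then uses the recorded failures to shrink and reverse its proposals instead of repeating them. The point of the sketch is the data flow: each plan depends on what earlier summaries recorded, which is what separates this loop from blind mutation.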
Hybrid Evolutionary Memory
Memory is organized into multiple “islands” that explore in parallel. Solutions are archived with MAP‑Elites based on diverse characteristics, and an adaptive Boltzmann selector dynamically balances exploration versus exploitation.
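A minimal sketch of these memory components follows. It assumes a MAP‑Elites archive that keeps one best solution per feature cell and a softmax‑style Boltzmann selector; the class and function names are hypothetical, and the temperature update rule in adapt_temperature is an illustrative assumption, since the article does not say how the selector adapts.

```python
import math
import random
from typing import Dict, List, Tuple

Cell = Tuple[int, ...]  # discretised feature coordinates of a solution


class MapElitesArchive:
    """Keeps only the best-scoring solution seen in each feature cell."""

    def __init__(self) -> None:
        self.cells: Dict[Cell, Tuple[float, str]] = {}

    def add(self, cell: Cell, score: float, solution: str) -> bool:
        best = self.cells.get(cell)
        if best is None or score > best[0]:
            self.cells[cell] = (score, solution)
            return True
        return False


def boltzmann_probs(scores: List[float], temperature: float) -> List[float]:
    """Selection probabilities proportional to exp(score / temperature)."""
    top = max(scores)  # subtract the max for numerical stability
    weights = [math.exp((s - top) / temperature) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]


def select_parent(archive: MapElitesArchive, temperature: float,
                  rng: random.Random) -> str:
    entries = list(archive.cells.values())
    probs = boltzmann_probs([score for score, _ in entries], temperature)
    return rng.choices(entries, weights=probs, k=1)[0][1]


def adapt_temperature(temperature: float, improved: bool,
                      low: float = 0.05, high: float = 5.0) -> float:
    # Assumed rule: cool down (exploit) after an improvement,
    # heat up (explore) after a failed iteration.
    factor = 0.9 if improved else 1.1
    return min(high, max(low, temperature * factor))


# Several islands explore in parallel, each with its own archive.
islands = [MapElitesArchive() for _ in range(3)]
islands[0].add((0, 0), 1.0, "solution-a")
islands[0].add((0, 1), 2.0, "solution-b")
parent = select_parent(islands[0], temperature=0.5, rng=random.Random(0))
print(parent, boltzmann_probs([1.0, 2.0], 0.5))
```

A low temperature concentrates selection on the highest‑scoring cells, while a high temperature spreads it almost uniformly across the archive, so a single parameter controls the exploration versus exploitation balance.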
Performance results
On 11 benchmark mathematical problems LoongFlow surpassed the best known human results.
On 7 problems it outperformed Google AlphaEvolve, establishing a new state‑of‑the‑art.
In the MLE‑bench simulated Kaggle competition the LoongFlow‑driven ML agent earned 23 gold medals across tasks such as pathology slide cancer detection and volcano eruption prediction.
Compared with OpenEvolve and ShinkaEvolve, LoongFlow achieved more than 60% higher evolutionary efficiency and a 100% iteration success rate, and it reduced ineffective exploration by over 60%.
Repository and technical report
Source code and documentation are available at https://github.com/baidu-baige/LoongFlow. A detailed technical report is hosted on arXiv: https://arxiv.org/abs/2512.24077.
Baidu Intelligent Cloud Tech Hub
