How LoongFlow Enables Expert‑Level AI Agents to Outperform Human Mathematicians

LoongFlow is an open‑source AI agent framework that combines a Plan‑Execute‑Summarize (PES) paradigm with a Hybrid Evolutionary Memory system. Together these let agents solve problems through directed, iterative search rather than random mutation, producing state‑of‑the‑art results on mathematical challenges, Kaggle‑style benchmarks, and real‑world tasks while cutting wasted evaluations by roughly 60%.

Baidu Intelligent Cloud Tech Hub

Overview

LoongFlow is an open‑source framework for building AI agents that perform expert‑level reasoning. It introduces a systematic “Plan‑Execute‑Summarize” (PES) cycle and a Hybrid Evolutionary Memory to guide directed evolution of solution populations.

Key Components

PES paradigm. Each iteration consists of three stages:

Plan: Analyze the current solution pool, retrieve relevant experience from a strategic knowledge base, and generate a concrete evolution plan.

Execute: Dynamically select tools (logical verifier, code interpreter, data‑query generator, etc.) and perform fast local validation of candidate solutions.

Summarize: Compare execution outcomes with the plan, extract causal insights, and store them back into the memory.
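The three stages above can be sketched as a small loop. This is a minimal illustration under stated assumptions, not LoongFlow's actual API: the names `Memory`, `plan`, `execute`, `summarize`, and `run_pes` are hypothetical, the objective is a toy one‑dimensional function, and the "experience" is reduced to a record of which steps improved the score.

```python
from dataclasses import dataclass, field


@dataclass
class Memory:
    """Hypothetical stand-in for the experience store: a list of lessons."""
    lessons: list = field(default_factory=list)


def score(x):
    # Toy objective (an assumption for illustration): maximum at x = 3.
    return -(x - 3.0) ** 2


def plan(pool, memory):
    """Plan: pick the best solution so far and derive a step from past lessons."""
    parent = max(pool, key=score)
    step = 1.0
    for lesson in memory.lessons:
        # Each recorded failure reverses the direction and halves the step.
        if not lesson["improved"]:
            step = -step / 2
    return {"parent": parent, "step": step}


def execute(current_plan):
    """Execute: build a candidate and evaluate it locally."""
    candidate = current_plan["parent"] + current_plan["step"]
    return candidate, score(candidate)


def summarize(current_plan, candidate, candidate_score, pool, memory):
    """Summarize: compare the outcome with the plan and store the lesson."""
    improved = candidate_score > score(current_plan["parent"])
    memory.lessons.append({"step": current_plan["step"], "improved": improved})
    if improved:
        pool.append(candidate)
    return improved


def run_pes(iterations=8, start=0.0):
    pool = [start]
    memory = Memory()
    for _ in range(iterations):
        current_plan = plan(pool, memory)
        candidate, candidate_score = execute(current_plan)
        summarize(current_plan, candidate, candidate_score, pool, memory)
    return max(pool, key=score), memory
```

Calling `run_pes()` walks from 0.0 to the optimum at 3.0 in three steps, then records failed probes around it. The point of the sketch is the feedback path: the plan for each iteration depends on what earlier summaries stored, which is what separates directed search from blind mutation.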

Hybrid Evolutionary Memory. A multi‑island experience repository that archives solutions with rich metadata, supports MAP‑Elites archiving, and uses adaptive Boltzmann selection to balance exploration and exploitation.
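To make the two named techniques concrete, here is a minimal sketch of a MAP‑Elites archive and Boltzmann (softmax) selection. It is not taken from the LoongFlow code base: the class and function names are hypothetical, the multi‑island structure is omitted, and the temperature update in `adapt_temperature` is an assumed rule chosen only to show what "adaptive" could mean.

```python
import math
import random


class MapElitesArchive:
    """Keeps only the best-scoring solution for each behavior cell."""

    def __init__(self):
        self.cells = {}  # cell key -> (solution, score)

    def add(self, cell, solution, score):
        current = self.cells.get(cell)
        if current is None or score > current[1]:
            self.cells[cell] = (solution, score)
            return True   # the cell was empty or the newcomer is better
        return False


def boltzmann_weights(scores, temperature):
    """Softmax of score / temperature, shifted by the max for stability."""
    top = max(scores)
    exps = [math.exp((s - top) / temperature) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def boltzmann_select(solutions, scores, temperature, rng):
    """Sample one parent; a low temperature favors the best scores."""
    weights = boltzmann_weights(scores, temperature)
    return rng.choices(solutions, weights=weights, k=1)[0]


def adapt_temperature(temperature, improved, lo=0.05, hi=5.0):
    # Assumed rule: cool down after progress (exploit more),
    # heat up when stalled (explore more), clamped to [lo, hi].
    factor = 0.9 if improved else 1.1
    return min(hi, max(lo, temperature * factor))


archive = MapElitesArchive()
archive.add(("short", "fast"), "solution-a", 1.0)
archive.add(("long", "slow"), "solution-b", 3.0)
solutions = [entry[0] for entry in archive.cells.values()]
scores = [entry[1] for entry in archive.cells.values()]
parent = boltzmann_select(solutions, scores, 1.0, random.Random(0))
```

The archive preserves diversity by keeping one elite per cell, while the temperature controls how strongly selection prefers high scores: near zero it is almost greedy, and at high values it approaches uniform sampling.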

Efficiency Mechanisms

The structured PES cycle replaces random search with directed exploration. This reduces wasted evaluations by roughly 60% and brings the iteration success rate close to 100%, so that nearly every iteration yields a usable candidate.

Benchmark Results

Mathematical challenges: On 11 problems from the Terence Tao (Tao Zhexuan)/AlphaEvolve benchmark, LoongFlow agents surpassed the best known human results; on 7 problems they outperformed Google AlphaEvolve, setting a new state of the art.

MLE‑bench (Kaggle‑style): A machine‑learning agent built with LoongFlow earned 23 gold medals across tasks such as pathology cancer detection and volcanic eruption prediction.

Evolution efficiency: Compared with OpenEvolve and ShinkaEvolve, LoongFlow improved efficiency by more than 60% while maintaining 100% iteration success.

Example Application

In the “circle‑packing” problem (arranging non‑overlapping circles to maximize coverage within a shape), LoongFlow discovered arrangements more compact than both the best ones human mathematicians had found after years of research and those produced by the AlphaEvolve system.
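The kind of fast local validation such a problem needs can be sketched in a few lines. The example below assumes the container is the unit square and reports two candidate objectives (sum of radii and covered area fraction); the article does not specify the exact shape or objective, so both are assumptions, and the function names are hypothetical.

```python
import math
from itertools import combinations


def is_valid_packing(circles, tol=1e-9):
    """circles is a list of (x, y, r) tuples.

    A packing is valid if every circle lies inside the unit square
    and no two circles overlap (touching is allowed).
    """
    for x, y, r in circles:
        if r <= 0:
            return False
        if x - r < -tol or x + r > 1 + tol:
            return False
        if y - r < -tol or y + r > 1 + tol:
            return False
    for (x1, y1, r1), (x2, y2, r2) in combinations(circles, 2):
        # Centers closer than the sum of the radii means overlap.
        if math.hypot(x1 - x2, y1 - y2) < r1 + r2 - tol:
            return False
    return True


def sum_of_radii(circles):
    return sum(r for _, _, r in circles)


def covered_fraction(circles):
    # The unit square has area 1, so total circle area is the fraction covered.
    return sum(math.pi * r * r for _, _, r in circles)


# Four equal circles in a 2 x 2 grid, each touching its neighbors and the walls.
grid = [(0.25, 0.25, 0.25), (0.75, 0.25, 0.25),
        (0.25, 0.75, 0.25), (0.75, 0.75, 0.25)]
```

A checker like this is cheap enough to run on every candidate, so an evolutionary loop can reject invalid arrangements before spending effort comparing scores.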

Open‑Source Release

The source code, documentation, and example agents are available at https://github.com/baidu-baige/LoongFlow. A detailed technical report describing the design can be accessed at https://arxiv.org/abs/2512.24077.
