How a 4B Model Beats 30B Giants: Inside AgentCPM-Explore’s SOTA Performance

AgentCPM-Explore, a 4-billion-parameter open-source model, achieves state-of-the-art results on long-horizon exploration tasks, matching or surpassing 8B and even 30B models across eight agent-centric benchmarks. The gains come from a full-stack infrastructure and a set of training techniques tailored to small models.

PaperAgent

While the industry debates whether 30B-parameter models can challenge trillion-parameter systems, a collaboration between Tsinghua NLP Lab, Renmin University, Mianbi AI, and the OpenBMB community offers a more aggressive answer: AgentCPM-Explore, an agent model with only 4B parameters.

AgentCPM‑Explore Core Highlights

Breaks the parameter barrier: The first 4B model to support eight long-horizon agent benchmarks (GAIA, Xbench, BrowseComp, etc.), redefining the performance ceiling of small models.

Deep long-horizon exploration: Sustains more than 100 rounds of stable interaction without repeating itself, and keeps exploring until the task succeeds.

Full-process open-source stack: Includes AgentDock (tool sandbox), AgentRL (asynchronous RL framework), and AgentToLeaP (one-click evaluation platform) for reproducible research and custom extensions.

Benchmark Performance

AgentCPM-Explore attains SOTA results on eight mainstream agent benchmarks (including GAIA, HLE, BrowseComp, WebWalker, FRAMES, Xbench-DeepResearch, and Seal-0). It not only matches the best models of the same size but also surpasses 8B models and rivals some 30B closed-source systems, demonstrating exceptional parameter efficiency.

On the Xbench-DeepResearch dataset, AgentCPM-Explore outperforms OpenAI-o3 and Claude-4.5-Sonnet, breaking the performance trend line of larger models. The detailed charts (see Fig. 1) report Avg@8, the score averaged over eight runs, and show that the model is stable, with variance kept under 2%.
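
As a rough illustration of how an Avg@8-style figure could be computed, the sketch below averages the scores of eight independent runs and reports their spread. The function name `avg_at_k` and the run scores are hypothetical, and the article does not say how its "variance" figure is measured, so the population standard deviation used here is only one plausible choice.

```python
from statistics import mean, pstdev


def avg_at_k(run_scores: list[float]) -> tuple[float, float]:
    """Return (mean score, population standard deviation) over k runs."""
    if not run_scores:
        raise ValueError("need at least one run score")
    return mean(run_scores), pstdev(run_scores)


# Hypothetical accuracies (in percent) from 8 evaluation runs.
# These are placeholder numbers, not results from the article.
runs = [70.0, 71.0, 69.0, 70.5, 70.0, 69.5, 71.0, 69.0]
avg, spread = avg_at_k(runs)
print(f"Avg@{len(runs)} = {avg:.2f}, std = {spread:.2f}")
```

Reporting the spread alongside the mean is what lets a reader judge whether a gap between two models is larger than run-to-run noise.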

Full‑stack Open‑source Infrastructure

The project releases not only the model weights but also three key components:

AgentDock: A unified tool sandbox that manages 16 MCP services and hundreds of tools, supports more than 100 QPS of concurrent calls, and provides robust fault tolerance and dynamic routing.

AgentRL: A minimalist asynchronous reinforcement-learning framework. It requires only a standard ChatCompletions API for training, contains fewer than 7 source files (about 1k lines of code), and enables fully asynchronous training-inference pipelines on a single GPU.

AgentToLeaP: A one-click evaluation platform covering eight leaderboards (GAIA, HLE, etc.). Users can launch a full evaluation with a single command and extend the suite with custom test sets.
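
To make the fault tolerance and dynamic routing mentioned for AgentDock more concrete, here is a minimal sketch of a tool router that falls back to the next registered backend when one fails. This is not AgentDock's actual API: `ToolRouter`, `register`, `call`, and the two search backends are all hypothetical names invented for illustration.

```python
from typing import Callable

Backend = Callable[[str], str]


class ToolRouter:
    """Toy registry that routes a tool call to the first backend that succeeds."""

    def __init__(self) -> None:
        self._backends: dict[str, list[Backend]] = {}

    def register(self, tool: str, backend: Backend) -> None:
        self._backends.setdefault(tool, []).append(backend)

    def call(self, tool: str, query: str) -> str:
        errors: list[Exception] = []
        for backend in self._backends.get(tool, []):
            try:
                return backend(query)
            except Exception as exc:  # fall through to the next backend
                errors.append(exc)
        raise RuntimeError(f"all backends failed for {tool!r}: {errors}")


def flaky_search(query: str) -> str:
    # Hypothetical primary backend that always times out.
    raise TimeoutError("primary search timed out")


def backup_search(query: str) -> str:
    # Hypothetical secondary backend that succeeds.
    return f"results for {query}"


router = ToolRouter()
router.register("search", flaky_search)
router.register("search", backup_search)
answer = router.call("search", "AgentCPM")
print(answer)
```

A production sandbox would also need timeouts, health checks, and concurrency limits; the sketch only shows the failover idea.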

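The asynchronous training-inference pipeline described for AgentRL can be pictured as a producer-consumer loop: rollout workers generate trajectories while a trainer consumes them in batches, so neither side waits for the other to finish. The sketch below uses only Python's `asyncio`; it is not AgentRL's code, and `rollout_worker`, `trainer`, and the trajectory fields are hypothetical stand-ins.

```python
import asyncio


async def rollout_worker(worker_id: int, num_episodes: int, queue: asyncio.Queue) -> None:
    """Produce trajectories; the sleep stands in for an awaited chat-completion call."""
    for episode in range(num_episodes):
        await asyncio.sleep(0)
        await queue.put({"worker": worker_id, "episode": episode, "reward": 1.0})


async def trainer(queue: asyncio.Queue, batch_size: int, total: int) -> list[list[dict]]:
    """Consume trajectories and group them into batches as they arrive."""
    batches: list[list[dict]] = []
    batch: list[dict] = []
    for _ in range(total):
        batch.append(await queue.get())
        if len(batch) == batch_size:
            batches.append(batch)  # a real trainer would run a policy update here
            batch = []
    if batch:
        batches.append(batch)
    return batches


async def main() -> list[list[dict]]:
    queue: asyncio.Queue = asyncio.Queue(maxsize=8)  # bounded queue gives backpressure
    workers = [asyncio.create_task(rollout_worker(i, 4, queue)) for i in range(3)]
    batches = await trainer(queue, batch_size=4, total=12)
    await asyncio.gather(*workers)
    return batches


batches = asyncio.run(main())
print(len(batches), "batches collected")
```

The bounded queue is a deliberate choice in this sketch: it stops rollouts from running arbitrarily far ahead of the policy being trained.
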
Key Challenges for 4B Models and Solutions

1. Model fusion to avoid SFT over-fitting: Small models tend to memorize task-specific patterns during supervised fine-tuning. Linearly blending the fine-tuned “specialized” checkpoint with the pre-training “general” checkpoint preserves general-purpose parameters while amplifying specialized abilities, yielding gains of about 7% on agent tasks.

2. Reward-signal denoising for RL: Long-horizon tasks produce noisy penalties that can punish correct intermediate reasoning. The team filters trajectories and applies penalties only to steps that truly affect policy improvement, so that valid reasoning steps are not wrongly penalized.

3. Information refinement to combat noisy web context: Excessive web text degrades small-model reasoning. An auxiliary summarization module preprocesses web content and passes only the distilled information to the 4B model, improving GAIA performance by up to 10% compared with raw context.
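
The linear checkpoint blending described in item 1 can be sketched as a per-parameter interpolation, merged = (1 - alpha) * general + alpha * specialized. The example below uses plain Python lists in place of tensors so it runs without any framework. The function name `blend_checkpoints`, the parameter names, and the value of `alpha` are assumptions; the article does not state the blend ratio or the exact merging recipe.

```python
Checkpoint = dict[str, list[float]]


def blend_checkpoints(general: Checkpoint, specialized: Checkpoint, alpha: float) -> Checkpoint:
    """Interpolate two checkpoints: (1 - alpha) * general + alpha * specialized."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    if general.keys() != specialized.keys():
        raise ValueError("checkpoints must have the same parameter names")
    merged: Checkpoint = {}
    for name, g_values in general.items():
        s_values = specialized[name]
        if len(g_values) != len(s_values):
            raise ValueError(f"shape mismatch for {name!r}")
        merged[name] = [(1.0 - alpha) * g + alpha * s for g, s in zip(g_values, s_values)]
    return merged


# Toy checkpoints with hypothetical parameter names and values.
general = {"layer.weight": [0.0, 2.0], "layer.bias": [1.0]}
specialized = {"layer.weight": [1.0, 4.0], "layer.bias": [3.0]}
merged = blend_checkpoints(general, specialized, alpha=0.5)
print(merged)
```

With `alpha=0` the result is the general checkpoint and with `alpha=1` it is the fine-tuned one, so the single knob trades generality against specialization.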

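The reward denoising in item 2 can be illustrated with a step-level mask: when a trajectory fails, only the steps judged responsible receive the penalty, and the rest receive none. The article does not specify how responsible steps are identified, so the `is_faulty` flag, the `Step` type, and the reward values below are purely hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Step:
    action: str
    is_faulty: bool  # hypothetical label, e.g. produced by a rule or a verifier


def step_rewards(steps: list[Step], success: bool,
                 penalty: float = -1.0, bonus: float = 1.0) -> list[float]:
    """Assign per-step rewards, penalizing only faulty steps of failed trajectories."""
    if success:
        return [bonus] * len(steps)
    return [penalty if step.is_faulty else 0.0 for step in steps]


# A failed trajectory in which only the last step is flagged as faulty.
trajectory = [
    Step("search docs", is_faulty=False),
    Step("open page", is_faulty=False),
    Step("call tool with bad arguments", is_faulty=True),
]
rewards = step_rewards(trajectory, success=False)
print(rewards)
```

Compared with spreading one negative reward over the whole trajectory, this keeps the two valid early steps from being discouraged.
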
Community Invitation

The open‑source platform invites researchers to validate new ideas, engineers to optimize training/inference efficiency, and evaluators to design challenging test cases, fostering a collaborative ecosystem for next‑generation edge agents.

AgentCPM-Explore core highlights diagram
Performance comparison chart

For code and model downloads see:

GitHub: https://github.com/OpenBMB/AgentCPM

HuggingFace: https://huggingface.co/openbmb/AgentCPM-Explore

ModelScope: https://modelscope.cn/models/OpenBMB/AgentCPM-Explore

GitCode: https://gitcode.com/OpenBMB/AgentCPM

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.
