How a 4B Model Beats 30B Giants: Inside AgentCPM-Explore’s SOTA Performance
AgentCPM-Explore, a 4‑billion‑parameter open‑source model, achieves state‑of‑the‑art results on long‑range exploration tasks, matching or surpassing larger 8B and even 30B models, thanks to a full‑stack infrastructure, novel training tricks, and extensive benchmark evaluations across eight agent‑centric datasets.
When the industry debates whether 30B‑parameter models can challenge trillion‑parameter systems, the collaborative effort of Tsinghua NLP Lab, Renmin University, Mianbi AI, and the OpenBMB community presents a more aggressive answer: a 4B‑parameter agent model, AgentCPM‑Explore.
AgentCPM‑Explore Core Highlights
Breaks the parameter barrier: The first 4B model to support eight long‑horizon agent tasks (GAIA, Xbench, BrowseComp, etc.), redefining the performance ceiling of small models.
Deep long‑horizon exploration: Sustains more than 100 stable interaction rounds without repeating itself, and keeps exploring until the task succeeds.
Full‑process open‑source stack: Includes AgentDock (tool sandbox), AgentRL (asynchronous RL framework), and AgentToLeaP (one‑click evaluation platform) for reproducible research and custom extensions.
Benchmark Performance
AgentCPM‑Explore attains SOTA results among same‑size models on eight mainstream agent benchmarks, including GAIA, HLE, BrowseComp, WebWalker, FRAMES, Xbench‑DeepResearch, and Seal‑0. It not only matches same‑size SOTA models but also surpasses 8B models and rivals some 30B and closed‑source systems, demonstrating exceptional parameter efficiency.
On the Xbench‑DeepResearch dataset, AgentCPM‑Explore outperforms OpenAI‑o3 and Claude‑4.5‑Sonnet, landing above the size‑versus‑performance trend line set by larger models. Detailed charts (see Fig. 1) show the model’s stability: on the Avg@8 metric, variance stays under 2%.
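The article does not define Avg@8. A common reading is the mean score over eight independent runs of the same benchmark, with the spread between runs as the stability measure. The sketch below follows that reading; the function name `avg_at_k` and the sample scores are hypothetical and are not the project's reported numbers.

```python
from statistics import mean, pstdev

def avg_at_k(run_scores):
    """Return (mean, population standard deviation) over k independent runs."""
    if not run_scores:
        raise ValueError("need at least one run")
    return mean(run_scores), pstdev(run_scores)

# Hypothetical per-run accuracies (in percent) for 8 runs; NOT real results.
runs = [60.0, 62.0, 61.0, 59.0, 60.0, 61.0, 62.0, 63.0]
avg, spread = avg_at_k(runs)
print(f"Avg@8 = {avg:.1f}, spread = {spread:.2f}")
```

Reporting the spread next to the mean is what lets a reader judge whether a single lucky run could explain a leaderboard position.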
Full‑stack Open‑source Infrastructure
The project releases not only the model weights but also three key components:
AgentDock: A unified tool sandbox managing 16 MCP services and hundreds of tools, supporting >100 QPS high‑concurrency calls, robust fault tolerance, and dynamic routing.
AgentRL: A minimalist asynchronous reinforcement‑learning framework. It requires only a standard ChatCompletions API for training, contains fewer than 7 source files (~1k lines of code), and enables fully asynchronous training‑inference pipelines on a single GPU.
AgentToLeaP: A one‑click evaluation platform covering eight leaderboards (GAIA, HLE, etc.). Users can launch full evaluations with a single command and extend the suite with custom test sets.
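To make the "fully asynchronous training‑inference" idea concrete, here is a minimal sketch of the general pattern: rollout workers call a chat endpoint and push trajectories into a queue, while a trainer consumes batches without waiting for all rollouts to finish. This is not AgentRL's code. Every name here (`fake_chat_completion`, `rollout_worker`, `trainer`) is hypothetical, the endpoint is a local stub, and the reward is a placeholder.

```python
import asyncio

async def fake_chat_completion(prompt: str) -> str:
    """Hypothetical stand-in for a ChatCompletions-style endpoint."""
    await asyncio.sleep(0.01)  # simulate network / inference latency
    return f"answer to {prompt}"

async def rollout_worker(tasks, queue):
    """Generate one trajectory per task and hand it to the trainer."""
    for task in tasks:
        reply = await fake_chat_completion(task)
        # Placeholder reward; a real system would score the trajectory.
        await queue.put({"task": task, "reply": reply, "reward": 0.0})

async def trainer(queue, batch_size, total):
    """Consume trajectories as they arrive and group them into batches."""
    batches, batch = [], []
    for _ in range(total):
        batch.append(await queue.get())
        if len(batch) == batch_size:
            batches.append(batch)  # a real trainer would take a gradient step here
            batch = []
    if batch:
        batches.append(batch)
    return batches

async def main():
    queue = asyncio.Queue(maxsize=8)
    shards = [[f"task-{i}" for i in range(4)], [f"task-{i}" for i in range(4, 8)]]
    workers = [asyncio.create_task(rollout_worker(s, queue)) for s in shards]
    batches = await trainer(queue, batch_size=4, total=8)
    await asyncio.gather(*workers)
    return batches

batches = asyncio.run(main())
print(len(batches), "batches consumed")
```

The bounded queue is the only coupling between the two sides, so slow rollouts never block a training step that already has a full batch.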
Key Challenges for 4B Models and Solutions
1. Model fusion to avoid SFT over‑fitting: Small models tend to memorize task‑specific patterns during supervised fine‑tuning. Linearly blending the fine‑tuned “specialized” checkpoint with the pre‑trained “general” checkpoint preserves general‑purpose parameters while retaining the specialized abilities, yielding ~7% performance gains on agent tasks.
2. Reward‑signal denoising for RL: In long‑horizon tasks, a penalty for a failed trajectory is noisy and can punish intermediate reasoning steps that were actually correct. The team filters trajectories and applies penalties only to the steps that truly matter for policy improvement, so valid reasoning steps are not wrongly penalized.
3. Information refinement to combat noisy web context: Excessive web text degrades small‑model reasoning. An auxiliary summarization module preprocesses web content and feeds only the distilled information to the 4B model, which improves GAIA performance by up to 10% compared with raw context.
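The linear blending in challenge 1 can be sketched as a per‑parameter interpolation, merged = (1 − α) · general + α · specialized. The article does not give the blending weight or say whether it is applied uniformly, so `alpha` and the helper `merge_checkpoints` are assumptions, and plain Python lists stand in for real weight tensors.

```python
def merge_checkpoints(general, specialized, alpha):
    """Blend two checkpoints: (1 - alpha) * general + alpha * specialized."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    if general.keys() != specialized.keys():
        raise ValueError("checkpoints must share the same parameter names")
    merged = {}
    for name, g in general.items():
        s = specialized[name]
        if len(g) != len(s):
            raise ValueError(f"shape mismatch for {name}")
        merged[name] = [(1 - alpha) * gv + alpha * sv for gv, sv in zip(g, s)]
    return merged

# Toy weights; a real checkpoint would hold tensors, not short lists.
general = {"w": [0.0, 2.0]}
specialized = {"w": [1.0, 4.0]}
merged = merge_checkpoints(general, specialized, alpha=0.5)
print(merged)
```

With `alpha=0` the result is the general checkpoint and with `alpha=1` it is the fine‑tuned one, so the weight acts as a dial between generality and specialization.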
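One way to picture the reward denoising in challenge 2 is a per‑step mask: a positive trajectory reward is credited to every step, while a negative one is applied only to steps flagged as faulty. The article does not say how the team decides which steps to penalize, so the `step_is_faulty` flags and the function `step_advantages` below are illustrative assumptions, not the project's method.

```python
def step_advantages(trajectory_reward, step_is_faulty):
    """Spread a trajectory-level reward over steps, masking undeserved penalties."""
    advantages = []
    for faulty in step_is_faulty:
        if trajectory_reward >= 0 or faulty:
            # Credit success to all steps; blame failure only on flagged steps.
            advantages.append(trajectory_reward)
        else:
            # Step looked valid, so it contributes no gradient signal.
            advantages.append(0.0)
    return advantages

# A failed trajectory in which only the second step (say, a bad tool call) is flagged.
print(step_advantages(-1.0, [False, True, False]))
```

Zeroing the signal for unflagged steps means a correct intermediate reasoning step is not pushed down just because the trajectory failed later.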
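The information refinement in challenge 3 places a filter between the web page and the agent. The article describes an auxiliary summarization module; the sketch below swaps in a much simpler stand‑in, keyword‑overlap sentence selection, purely to show where such a step sits in the pipeline. The function `distill` and its `max_sentences` budget are hypothetical.

```python
import re

def distill(page_text, query, max_sentences=3):
    """Keep only the sentences that share the most words with the query."""
    query_terms = set(re.findall(r"\w+", query.lower()))
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", page_text) if s.strip()]
    scored = []
    for idx, sentence in enumerate(sentences):
        overlap = len(query_terms & set(re.findall(r"\w+", sentence.lower())))
        if overlap:
            scored.append((overlap, idx, sentence))
    # Highest overlap first, then restore original page order for readability.
    top = sorted(scored, key=lambda t: (-t[0], t[1]))[:max_sentences]
    return " ".join(s for _, _, s in sorted(top, key=lambda t: t[1]))

page = ("Subscribe to our newsletter. The Eiffel Tower is 330 metres tall. "
        "Cookies help us improve. The tower opened in 1889.")
query = "How tall is the Eiffel Tower?"
print(distill(page, query))
```

Only the distilled string would be appended to the agent's context, which keeps navigation text and cookie banners away from the small model.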
Community Invitation
The open‑source platform invites researchers to validate new ideas, engineers to optimize training/inference efficiency, and evaluators to design challenging test cases, fostering a collaborative ecosystem for next‑generation edge agents.
For code and model downloads see:
GitHub: https://github.com/OpenBMB/AgentCPM
HuggingFace: https://huggingface.co/openbmb/AgentCPM-Explore
ModelScope: https://modelscope.cn/models/OpenBMB/AgentCPM-Explore
GitCode: https://gitcode.com/OpenBMB/AgentCPM
