China’s Mysterious AI Security Team “MopMonk” Shocks the Industry with a 73% Success Rate

A previously unknown Chinese AI security group called MopMonk, which has no website or corporate backing, published a report on GitHub showing that its agent achieved a 73.1% vulnerability-exploitation success rate on the CyberGym benchmark run by UC Berkeley, ranking seventh globally. The report also demonstrates novel memory-based multi-agent techniques that signal China's rising AI security prowess.

Black & White Path

CyberGym benchmark

CyberGym is an AI security benchmark created by a UC Berkeley team and accepted as an ICLR 2026 paper. It contains 1,507 real vulnerabilities drawn from OSS-Fuzz. Agents must work offline, reason over millions of lines of code in large open-source projects, and produce a proof-of-concept (PoC) input that triggers the bug on the vulnerable commit but does not trigger it on the patched version.

Why the benchmark is hard

Agents need to locate entry points, generate valid inputs, and verify that the PoC works only on the vulnerable version, all without network access.
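The success criterion is differential: a PoC counts only if it crashes the vulnerable build and leaves the patched build intact. A minimal Python sketch of that check follows. It assumes each build is a fuzz-harness binary that takes the PoC file path as its argument, and that a crash shows up as a non-zero exit code plus a sanitizer report on stderr. `sanitizer_crash_runner` and `is_valid_poc` are hypothetical names, not part of CyberGym's tooling.

```python
import subprocess
from typing import Callable

# Hypothetical runner type: takes a PoC file path, returns True if the target crashed.
Runner = Callable[[str], bool]


def sanitizer_crash_runner(binary: str, timeout_s: int = 60) -> Runner:
    """Build a runner that executes a fuzz-harness binary on a PoC file.

    Assumption: a crash appears as a non-zero exit code together with a
    sanitizer report (e.g. AddressSanitizer) on stderr.
    """
    def run(poc_path: str) -> bool:
        try:
            proc = subprocess.run(
                [binary, poc_path],
                capture_output=True,
                text=True,
                errors="replace",  # harness output may not be valid UTF-8
                timeout=timeout_s,
            )
        except subprocess.TimeoutExpired:
            return False  # this sketch treats a hang as "no crash"
        return proc.returncode != 0 and "Sanitizer" in proc.stderr
    return run


def is_valid_poc(poc_path: str, run_vulnerable: Runner, run_patched: Runner) -> bool:
    """A PoC counts only if it crashes the vulnerable build and NOT the patched one."""
    return run_vulnerable(poc_path) and not run_patched(poc_path)
```

For example, `is_valid_poc("poc.bin", sanitizer_crash_runner("./vulnerable/fuzzer"), sanitizer_crash_runner("./patched/fuzzer"))` returns True only for a differential crash (the paths are placeholders). Passing the runners in as callables keeps the check testable without real binaries.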

MopMonk architecture

1. MiniMax M3, a domestic (Chinese) base model

Ultra-long context window: about one million tokens, enabling ingestion of an entire codebase.

Mixture-of-Experts sparse attention: maintains performance while reducing compute cost.

Strong programming ability: scores 59.0% on SWE-Bench Pro, 66.0% on Terminal-Bench 2.1, and 74.2% on MCP Atlas.
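Mixture-of-Experts models save compute by sending each input to only a few "expert" sub-networks. The toy Python sketch below shows generic top-k routing with scalar inputs and hand-picked gate scores. It illustrates the general MoE idea only and is an assumption on our part, not MiniMax M3's actual architecture or its sparse-attention mechanism.

```python
import math
from typing import Callable, List, Sequence

Expert = Callable[[float], float]


def softmax(xs: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def moe_forward(x: float, gate_logits: Sequence[float],
                experts: Sequence[Expert], k: int = 2) -> float:
    """Route x to the top-k experts only, weighting their outputs by renormalized gate scores."""
    # Pick the k experts with the highest gate logits.
    top = sorted(range(len(experts)), key=lambda i: gate_logits[i], reverse=True)[:k]
    # Renormalize the gate over the selected experts only.
    weights = softmax([gate_logits[i] for i in top])
    # Only the k selected experts are evaluated; the others cost no compute.
    return sum(w * experts[i](x) for w, i in zip(weights, top))
```

The saving comes from the last line: with, say, 64 experts and k = 2, only two expert functions run per input, while the model as a whole keeps the capacity of all 64.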

2. Structured “Vulnerability Memory” system

The system stores key information in seven structured memory types, turning repeated trial‑and‑error into evidence‑driven convergence.

Vulnerability target memory: target vulnerability, success conditions, verification criteria.

Code-path memory: confirmed entry points, harnesses, parsing chains, suspicious functions.

Input format memory: file format, field relationships, length constraints, boundary conditions.

Candidate PoC memory: candidate inputs, generation rationale, triggered behavior, mutation directions.

Negative evidence memory: attempts that did not trigger the bug, unreachable paths, build failures, format errors.

Verification state memory: whether the PoC triggers a crash and, if not, why it failed.

Next-step constraint memory: concrete constraints that the next attempt must satisfy.
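One way to picture the seven memory types is as a single structured record that agents update after each attempt. The Python sketch below is hypothetical: the class names, field names, and the `record_failure` helper are illustrative and are not taken from the MopMonk repository.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class CandidatePoC:
    input_bytes: bytes
    rationale: str                  # why this input was generated
    observed_behavior: str = ""     # e.g. "reached parser, no crash"
    mutation_ideas: List[str] = field(default_factory=list)


@dataclass
class VulnerabilityMemory:
    # 1. Vulnerability target: what to trigger and how success is judged
    target: Dict[str, str] = field(default_factory=dict)
    # 2. Code paths: entry points, harnesses, parsing chains, suspicious functions
    code_paths: List[str] = field(default_factory=list)
    # 3. Input format: file format, field relationships, length and boundary constraints
    input_format: Dict[str, str] = field(default_factory=dict)
    # 4. Candidate PoCs with rationale, observed behavior, and mutation directions
    candidates: List[CandidatePoC] = field(default_factory=list)
    # 5. Negative evidence: non-triggering attempts, unreachable paths, build/format errors
    negative_evidence: List[str] = field(default_factory=list)
    # 6. Verification state: did the latest PoC crash, and if not, why
    last_crashed: Optional[bool] = None
    last_failure_reason: str = ""
    # 7. Constraints the next attempt must satisfy
    next_constraints: List[str] = field(default_factory=list)

    def record_failure(self, attempt: str, reason: str, constraint: str) -> None:
        """Turn a failed attempt into negative evidence plus a constraint instead of discarding it."""
        self.negative_evidence.append(attempt)
        self.last_crashed = False
        self.last_failure_reason = reason
        self.next_constraints.append(constraint)
```

The `record_failure` helper shows "evidence-driven convergence" in miniature: a failed attempt leaves behind both negative evidence and a concrete constraint for the next try.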

3. Multi‑agent collaborative exploration

Multiple agents share the same memory store. Each agent reads current evidence, tests hypotheses, and writes back new constraints or results. This design yields three direct effects:

Reduced duplicate work: failed paths are recorded and not retried.

Preserved negative evidence: non-trigger attempts become constraints rather than being discarded.

Higher effective experiment density: more directions are explored within a limited budget.
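The shared-memory pattern can be sketched with threads standing in for agents. In this hypothetical Python example, `SharedMemory`, `agent`, and `explore` are illustrative names, and the "test" is any callable that reports whether a hypothesis triggered the bug. A lock makes claiming a hypothesis atomic, so no two agents test the same one, and every failure is kept as negative evidence so it is never retried. It is not MopMonk's implementation.

```python
import threading
from typing import Callable, List, Optional, Set


class SharedMemory:
    """Hypothetical shared store: every agent reads and writes the same evidence."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self.failed: Set[str] = set()        # negative evidence, never retried
        self.in_progress: Set[str] = set()   # hypotheses another agent is testing
        self.successes: List[str] = []

    def claim(self, candidates: List[str]) -> Optional[str]:
        """Atomically pick a hypothesis that nobody has tried or is currently testing."""
        with self._lock:
            for c in candidates:
                if c not in self.failed and c not in self.in_progress and c not in self.successes:
                    self.in_progress.add(c)
                    return c
            return None

    def report(self, hypothesis: str, triggered: bool) -> None:
        """Write the result back so other agents can see it."""
        with self._lock:
            self.in_progress.discard(hypothesis)
            if triggered:
                self.successes.append(hypothesis)
            else:
                self.failed.add(hypothesis)


def agent(memory: SharedMemory, candidates: List[str], test: Callable[[str], bool]) -> None:
    # Keep claiming untried hypotheses until none remain.
    while (h := memory.claim(candidates)) is not None:
        memory.report(h, test(h))


def explore(candidates: List[str], test: Callable[[str], bool], n_agents: int = 4) -> SharedMemory:
    memory = SharedMemory()
    threads = [threading.Thread(target=agent, args=(memory, candidates, test))
               for _ in range(n_agents)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return memory
```

In this sketch the agents keep exploring after a success; a real system might instead stop early or switch to refining the successful PoC.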

Ranking results (CyberGym Level 1, 4-hour timeout)

Rank – agent (model) – success rate – date

1 – Crystalline (Claude Opus 4.6) – 89.6% – 2026-06-08

2 – MDASH (Multi-model) – 88.4% – 2026-05-12

3 – OpenAI Agent (GPT-5.5-Cyber) – 85.6% – 2026-06-22

4 – Anthropic Agent (Claude Mythos Preview) – 83.1% – 2026-04-07

5 – OpenAI Agent (GPT-5.5) – 81.8% – 2026-04-23

6 – OpenAI Agent (GPT-5.4) – 79.0% – 2026-04-23

7 – MopMonk Agent (MiniMax M3) – 73.1% – 2026-06-29

Task-time distribution and resource usage

Share of tasks by time spent per task:

<10 minutes: 39.95 %

10‑30 minutes: 23.95 %

30‑60 minutes: 7.76 %

1‑2 hours: 10.82 %

2‑3 hours: 0.86 %

3‑4 hours: 16.66 %
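Reading the buckets cumulatively makes the distribution easier to interpret: 63.90% of tasks fall within 30 minutes and 83.34% within 3 hours, with the remaining 16.66% in the final bucket, which ends at the 4-hour timeout. The short Python snippet below reproduces those running totals from the listed percentages; the bucket labels are abbreviations of the list above.

```python
from itertools import accumulate

# Time buckets and their share of tasks, copied from the list above (percent).
BUCKETS = ["<10 min", "10-30 min", "30-60 min", "1-2 h", "2-3 h", "3-4 h"]
SHARES = [39.95, 23.95, 7.76, 10.82, 0.86, 16.66]


def cumulative_shares(shares):
    """Running total of the bucket shares, rounded to two decimals to hide float noise."""
    return [round(total, 2) for total in accumulate(shares)]


for label, total in zip(BUCKETS, cumulative_shares(SHARES)):
    print(f"{label:>9}: {total:6.2f}% cumulative")
```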

Total token consumption, including cached tokens, reached 99,944,644,535 tokens (about 100 billion), of which 2,091,474,371 were non-cached. The agent made 1,582,007 LLM requests.
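A few ratios derived from those totals put the figures in perspective. The Python snippet below performs only arithmetic on the three reported numbers: about 97.9% of all tokens were cached, and each request averaged roughly 63,000 tokens of context but only about 1,300 non-cached tokens.

```python
# Reported totals for the full CyberGym run.
total_tokens = 99_944_644_535   # including cached tokens
non_cached = 2_091_474_371
requests = 1_582_007

cached_share = 1 - non_cached / total_tokens
tokens_per_request = total_tokens / requests
fresh_per_request = non_cached / requests

print(f"cached share:           {cached_share:.2%}")          # 97.91%
print(f"tokens per request:     {tokens_per_request:,.0f}")   # 63,176
print(f"non-cached per request: {fresh_per_request:,.0f}")    # 1,322
```

The high cache share is what one would expect when many requests resend the same large code context with only a small amount of new text appended.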

Technical repository

https://github.com/MopMonkAI/MopMonkAgent
Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.

Written by Black & White Path
