How JoyCode Agent Achieved 74.6% Pass@1 on SWE‑bench Verified and Ranked Top‑3 Globally
JoyCode Agent, an AI‑driven multi‑agent system, secured a 74.6% pass@1 rate on the SWE‑bench Verified benchmark, placing it in the global Top‑3 while cutting computational resource usage by 30‑50% through a novel patch‑test co‑generation and iterative verification pipeline.
Overview
JoyCode Agent is an AI‑driven system that tackles the SWE‑bench Verified benchmark for automated software repair. It achieved a 74.6% pass@1 rate, placing it in the global Top‑3 and reducing computational resource consumption by 30‑50% compared with leading baselines.
Benchmark Background
SWE‑bench Verified, developed by Princeton and collaborators, evaluates AI systems on real‑world GitHub issues from projects such as scikit‑learn, matplotlib, and requests. Success requires generating patches that pass a suite of automatically created tests (FAIL‑TO‑PASS, PASS‑TO‑PASS) in a single attempt.
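The pass criterion can be modeled as a simple predicate (a minimal sketch, not the actual evaluation harness): a patch resolves an issue only if every FAIL‑TO‑PASS test now passes and every PASS‑TO‑PASS regression test still passes.

```python
def patch_resolves(results: dict[str, bool],
                   fail_to_pass: list[str],
                   pass_to_pass: list[str]) -> bool:
    """results maps test id -> pass/fail after applying the patch."""
    f2p_ok = all(results.get(t, False) for t in fail_to_pass)   # bug is fixed
    p2p_ok = all(results.get(t, False) for t in pass_to_pass)   # nothing broke
    return f2p_ok and p2p_ok

print(patch_resolves({"test_fix": True, "test_old": True},
                     ["test_fix"], ["test_old"]))  # True
```

This all‑or‑nothing criterion is what makes single‑attempt (pass@1) evaluation so demanding: one regression failure invalidates an otherwise correct fix.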
Challenges
Understanding entire codebases and performing cross‑file reasoning.
Managing a massive search space of candidate patches.
Lack of diverse reasoning trajectories, leading to convergence on similar solutions.
Automated verification and feedback loops are still immature.
High token consumption and diminishing cost‑benefit ratio.
Error accumulation across multi‑round agent interactions.
Proposed Solution
The core idea is “patch‑test co‑generation and iterative verification”. The workflow consists of four agents:
Testing Agent
Generates three types of tests for each issue: FAIL‑TO‑PASS, PASS‑TO‑PASS (regression), and edge‑case PASS‑TO‑PASS. Tests are pre‑validated on the buggy code before being used to evaluate patches.
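The pre‑validation step can be sketched as a filter over generated tests (names and structure here are illustrative, not the published implementation): a FAIL‑TO‑PASS test is kept only if it actually fails on the unpatched code, and a PASS‑TO‑PASS test only if it already passes.

```python
def prevalidate(tests: list[dict], run_on_buggy_code) -> list[dict]:
    """Keep only generated tests whose behaviour on the buggy code
    matches their declared kind."""
    kept = []
    for test in tests:
        passed = run_on_buggy_code(test["body"])
        if test["kind"] == "fail_to_pass" and not passed:
            kept.append(test)   # must reproduce the reported bug
        elif test["kind"] == "pass_to_pass" and passed:
            kept.append(test)   # must not flag healthy behaviour
    return kept
```

Pre‑validation weeds out hallucinated or miscalibrated tests before they are allowed to judge candidate patches, which keeps the downstream feedback signal trustworthy.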
Patch Agent
Operates in an observe‑think‑act loop inside a Docker‑isolated environment. It parses the issue, explores the repository, formulates a plan, edits code with a precise code‑editing tool, and runs the generated tests.
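The observe‑think‑act loop can be sketched as follows, assuming hypothetical `observe`, `think`, and `act` callables; in the real system each iteration runs inside the Docker‑isolated repository checkout.

```python
def repair_loop(observe, think, act, max_steps: int = 10) -> bool:
    """Drive the agent until the generated tests pass or the step
    budget is exhausted."""
    for _ in range(max_steps):
        observation = observe()        # e.g. test output, file contents
        action = think(observation)    # plan the next edit or command
        done = act(action)             # apply the edit / run the tests
        if done:
            return True                # all generated tests pass
    return False
```

Bounding the loop with `max_steps` is one way to cap token consumption: a run that is not converging is cut off rather than allowed to accumulate cost.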
CSR Agent
When a patch fails, the CSR Agent compresses the execution trajectory, performs root‑cause attribution (test vs. patch), retrieves similar successful trajectories from a compressed‑trajectory pool, and supplies this experience to the Patch Agent for a guided retry.
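One illustrative way to realize compression and retrieval (not the actual CSR Agent implementation) is to strip verbose tool output from a trajectory and rank pool entries by Jaccard similarity over the remaining event tokens:

```python
def compress(trajectory: list[str]) -> set[str]:
    # Keep only decision-relevant events; drop verbose log output.
    return {step for step in trajectory if not step.startswith("log:")}

def retrieve(failed: list[str], pool: list[list[str]]) -> list[str]:
    """Return the pool trajectory most similar to the failed one."""
    target = compress(failed)
    def jaccard(t: list[str]) -> float:
        s = compress(t)
        union = target | s
        return len(target & s) / len(union) if union else 0.0
    return max(pool, key=jaccard)
```

The retrieved trajectory serves as a worked example for the retry: the Patch Agent sees which exploration and editing steps led to success on a structurally similar failure.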
Decision Agent
Acts as an arbiter, voting between the initial patch and the experience‑driven retry patch based on code quality, correctness, minimality, and risk.
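The arbitration step can be sketched as a weighted score over the four criteria named above, where risk counts against a patch (the scoring scheme here is illustrative, not the published one):

```python
def vote(initial: dict[str, float], retry: dict[str, float]) -> str:
    """Pick the better of two candidate patches from per-criterion
    scores in [0, 1]; `risk` penalizes, the rest reward."""
    def score(p: dict[str, float]) -> float:
        return (p["code_quality"] + p["correctness"]
                + p["minimality"] - p["risk"])
    return "retry" if score(retry) > score(initial) else "initial"
```

Note the tie‑breaking choice: on equal scores the initial patch wins, which biases the system toward the solution produced without extra retry cost.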
Results
On the SWE‑bench Verified Pass@1 official evaluation, JoyCode Agent reached 74.6% success, outperforming baselines while cutting token usage by 30‑50%. The system produces a high‑quality patch pool that is reproducible and extensible for further research.
Open‑Source
Source code is available on GitHub: https://github.com/jd-opensource/joycode-agent and Gitee: https://gitee.com/JD-opensource/joycode-agent.
JD Tech
Official JD technology sharing platform. All the cutting‑edge JD tech, innovative insights, and open‑source solutions you’re looking for, all in one place.