How JoyCode Agent Achieved 74.6% Pass@1 on SWE‑bench Verified and Ranked Top‑3 Globally
JoyCode Agent, an AI‑driven multi‑agent system, secured a 74.6% pass@1 rate on the SWE‑bench Verified benchmark, placing it in the global Top‑3 while cutting computational resource usage by 30‑50% through a novel patch‑test co‑generation and iterative verification pipeline.
Overview
JoyCode Agent is an AI‑driven system that tackles the SWE‑bench Verified benchmark for automated software repair. It achieved a 74.6% pass@1 rate, placing it in the global Top‑3 and reducing computational resource consumption by 30‑50% compared with leading baselines.
Benchmark Background
SWE‑bench Verified, developed by Princeton and collaborators, evaluates AI systems on real‑world GitHub issues from projects such as scikit‑learn, matplotlib, and requests. Success requires generating patches that pass a suite of automatically created tests (FAIL‑TO‑PASS, PASS‑TO‑PASS) in a single attempt.
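The pass criterion can be modeled as a simple predicate (a minimal sketch, not the actual evaluation harness): a patch resolves an issue only if every FAIL‑TO‑PASS test now passes and every PASS‑TO‑PASS regression test still passes.

```python
def patch_resolves(results: dict[str, bool],
                   fail_to_pass: list[str],
                   pass_to_pass: list[str]) -> bool:
    """results maps test id -> pass/fail after applying the patch."""
    f2p_ok = all(results.get(t, False) for t in fail_to_pass)   # bug is fixed
    p2p_ok = all(results.get(t, False) for t in pass_to_pass)   # nothing broke
    return f2p_ok and p2p_ok

print(patch_resolves({"test_fix": True, "test_old": True},
                     ["test_fix"], ["test_old"]))  # True
```

This all‑or‑nothing criterion is what makes single‑attempt (pass@1) evaluation so demanding: one regression failure invalidates an otherwise correct fix.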
Challenges
Understanding entire codebases and performing cross‑file reasoning.
Managing a massive search space of candidate patches.
Lack of diverse reasoning trajectories, leading to convergence on similar solutions.
Automated verification and feedback loops are still immature.
High token consumption and diminishing cost‑benefit ratio.
Error accumulation across multi‑round agent interactions.
Proposed Solution
The core idea is “patch‑test co‑generation and iterative verification”. The workflow consists of four agents:
Testing Agent
Generates three types of tests for each issue: FAIL‑TO‑PASS, PASS‑TO‑PASS (regression), and edge‑case PASS‑TO‑PASS. Tests are pre‑validated on the buggy code before being used to evaluate patches.
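The pre‑validation step can be sketched as a filter over generated tests (names and structure here are illustrative, not the published implementation): a FAIL‑TO‑PASS test is kept only if it actually fails on the unpatched code, and a PASS‑TO‑PASS test only if it already passes.

```python
def prevalidate(tests: list[dict], run_on_buggy_code) -> list[dict]:
    """Keep only generated tests whose behaviour on the buggy code
    matches their declared kind."""
    kept = []
    for test in tests:
        passed = run_on_buggy_code(test["body"])
        if test["kind"] == "fail_to_pass" and not passed:
            kept.append(test)   # must reproduce the reported bug
        elif test["kind"] == "pass_to_pass" and passed:
            kept.append(test)   # must not flag healthy behaviour
    return kept
```

Pre‑validation weeds out hallucinated or miscalibrated tests before they are allowed to judge candidate patches, which keeps the downstream feedback signal trustworthy.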
Patch Agent
Operates in an observe‑think‑act loop inside a Docker‑isolated environment. It parses the issue, explores the repository, formulates a plan, edits code with a precise code‑editing tool, and runs the generated tests.
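The observe‑think‑act loop can be sketched as follows, assuming hypothetical `observe`, `think`, and `act` callables; in the real system each iteration runs inside the Docker‑isolated repository checkout.

```python
def repair_loop(observe, think, act, max_steps: int = 10) -> bool:
    """Drive the agent until the generated tests pass or the step
    budget is exhausted."""
    for _ in range(max_steps):
        observation = observe()        # e.g. test output, file contents
        action = think(observation)    # plan the next edit or command
        done = act(action)             # apply the edit / run the tests
        if done:
            return True                # all generated tests pass
    return False
```

Bounding the loop with `max_steps` is one way to cap token consumption: a run that is not converging is cut off rather than allowed to accumulate cost.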
CSR Agent
When a patch fails, the CSR Agent compresses the execution trajectory, performs root‑cause attribution (test vs. patch), retrieves similar successful trajectories from a compressed‑trajectory pool, and supplies this experience to the Patch Agent for a guided retry.
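One illustrative way to realize compression and retrieval (not the actual CSR Agent implementation) is to strip verbose tool output from a trajectory and rank pool entries by Jaccard similarity over the remaining event tokens:

```python
def compress(trajectory: list[str]) -> set[str]:
    # Keep only decision-relevant events; drop verbose log output.
    return {step for step in trajectory if not step.startswith("log:")}

def retrieve(failed: list[str], pool: list[list[str]]) -> list[str]:
    """Return the pool trajectory most similar to the failed one."""
    target = compress(failed)
    def jaccard(t: list[str]) -> float:
        s = compress(t)
        union = target | s
        return len(target & s) / len(union) if union else 0.0
    return max(pool, key=jaccard)
```

The retrieved trajectory serves as a worked example for the retry: the Patch Agent sees which exploration and editing steps led to success on a structurally similar failure.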
Decision Agent
Acts as an arbiter, voting between the initial patch and the experience‑driven retry patch based on code quality, correctness, minimality, and risk.
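The arbitration step can be sketched as a weighted score over the four criteria named above, where risk counts against a patch (the scoring scheme here is illustrative, not the published one):

```python
def vote(initial: dict[str, float], retry: dict[str, float]) -> str:
    """Pick the better of two candidate patches from per-criterion
    scores in [0, 1]; `risk` penalizes, the rest reward."""
    def score(p: dict[str, float]) -> float:
        return (p["code_quality"] + p["correctness"]
                + p["minimality"] - p["risk"])
    return "retry" if score(retry) > score(initial) else "initial"
```

Note the tie‑breaking choice: on equal scores the initial patch wins, which biases the system toward the solution produced without extra retry cost.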
Results
On the SWE‑bench Verified Pass@1 official evaluation, JoyCode Agent reached 74.6% success, outperforming baselines while cutting token usage by 30‑50%. The system produces a high‑quality patch pool that is reproducible and extensible for further research.
Open‑Source
Source code is available on GitHub: https://github.com/jd-opensource/joycode-agent and Gitee: https://gitee.com/JD-opensource/joycode-agent.
JD Tech
Official JD technology sharing platform. All the cutting‑edge JD tech, innovative insights, and open‑source solutions you’re looking for, all in one place.