How JoyCode Agent Scored 74.6% Pass@1 on SWE‑bench Verified: Inside the Patch‑Test Co‑generation Pipeline
JoyCode Agent uses a multi‑agent, patch‑and‑test co‑generation framework with iterative validation, fine‑grained failure attribution, and experience‑driven retries to reach 74.6% Pass@1 on SWE‑bench Verified, cutting computational resource consumption by 30‑50% relative to baselines while delivering high‑quality code patches.
Project Background and Goals
SWE‑bench Verified is a human‑validated subset of Princeton University's SWE‑bench benchmark for evaluating AI systems on real software‑engineering tasks: an agent must understand the issue, analyze the codebase, and generate a patch that passes the full test suite.
JoyCode Agent addresses challenges such as repository‑level understanding, large candidate spaces, limited reasoning diversity, and costly token consumption.
Industry Landscape and Optimization Ideas
Existing approaches run into four recurring problems:
One‑shot prompt engineering fails on repository‑scale tasks.
Failures often lack attribution, leading to blind retries.
Experience reuse is limited, causing repeated exploration of similar failures.
Token consumption explodes due to uncontrolled sampling.
JoyCode's optimizations target each of these problems (a minimal attribution sketch follows):
Couple patch generation with FAIL‑TO‑PASS and PASS‑TO‑PASS tests.
Implement a closed‑loop iterative process of generate, validate, and retry.
Add fine‑grained failure attribution so retries are targeted rather than blind.
Compress trajectories and apply CSR (case‑based similarity retrieval) for experience‑driven retries.
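To make the closed loop concrete, here is a minimal Python sketch of the failure‑attribution step. The `FailureCause` taxonomy, `TestReport` shape, and `attribute_failure` helper are illustrative assumptions, not JoyCode's actual code; they only show how a test report can be mapped to a cause so a retry is targeted rather than blind.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import Optional

class FailureCause(Enum):
    """Illustrative attribution buckets (assumed, not JoyCode's exact taxonomy)."""
    INCOMPLETE_FIX = auto()  # FAIL-TO-PASS test still fails: issue not resolved
    REGRESSION = auto()      # PASS-TO-PASS test broke: fix damaged existing behavior
    EDGE_FAILURE = auto()    # EDGE-CASE test failed: boundary conditions mishandled

@dataclass
class TestReport:
    fail_to_pass_ok: bool  # does the issue-reproducing test now pass?
    pass_to_pass_ok: bool  # do the pre-existing-behavior tests still pass?
    edge_case_ok: bool     # does the boundary-condition test pass?

def attribute_failure(report: TestReport) -> Optional[FailureCause]:
    """Map a test report to a single cause, checked in order of severity."""
    if not report.fail_to_pass_ok:
        return FailureCause.INCOMPLETE_FIX
    if not report.pass_to_pass_ok:
        return FailureCause.REGRESSION
    if not report.edge_case_ok:
        return FailureCause.EDGE_FAILURE
    return None  # all three suites pass: accept the patch
```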
Overall System Architecture
The pipeline consists of four core agents (a sketch of their interfaces follows the list):
Testing Agent: Generates three test cases (FAIL‑TO‑PASS, PASS‑TO‑PASS, EDGE‑CASE) and validates them on the original code.
Patch Agent: Observes the issue, plans a fix, executes code edits in a Docker‑isolated environment, and iteratively validates patches.
CSR Agent: Compresses execution trajectories, performs root‑cause analysis, retrieves similar successful cases, and provides experience for retries.
Decision Agent: Votes between initial and retried patches to select the optimal solution.
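The open‑source repository defines its own interfaces; purely as an orientation aid, here is one way the four roles could be typed in Python. Every name below (`TestSuite`, `PatchResult`, and the four `Protocol` classes) is a hypothetical sketch, not the project's actual API.

```python
from dataclasses import dataclass
from typing import Optional, Protocol

@dataclass
class TestSuite:
    fail_to_pass: str  # test that reproduces the issue (fails on unpatched code)
    pass_to_pass: str  # test guarding existing behavior
    edge_case: str     # boundary-condition test

@dataclass
class PatchResult:
    diff: str           # unified diff applied to the repository
    trajectory: str     # execution trace of the attempt, for later compression
    tests_passed: bool  # outcome of running the TestSuite in Docker

class TestingAgent(Protocol):
    def generate_and_prevalidate(self, issue: str) -> TestSuite: ...

class PatchAgent(Protocol):
    def produce_patch(self, issue: str, tests: TestSuite,
                      experience: Optional[str] = None) -> PatchResult: ...

class CSRAgent(Protocol):
    def analyze(self, trajectory: str) -> str:
        """Compress the trajectory, attribute the failure, return retrieved experience."""
        ...

class DecisionAgent(Protocol):
    def vote(self, first: PatchResult, retry: PatchResult) -> PatchResult: ...
```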
The workflow proceeds as follows (an orchestration sketch follows the steps):
Testing Agent creates and pre‑validates test cases.
Patch Agent generates an initial patch and runs the tests in Docker.
If the patch fails, CSR Agent compresses the trajectory, attributes the failure, and retrieves a similar successful case.
Patch Agent performs an experience‑driven retry using the retrieved knowledge.
Decision Agent compares the original and retried patches and selects the best one.
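Using the hypothetical interfaces sketched above, the five steps compose into one short orchestration function. Again, this is a sketch of the control flow under those assumptions, not the repository's actual entry point.

```python
def run_pipeline(issue: str,
                 testing: TestingAgent,
                 patcher: PatchAgent,
                 csr: CSRAgent,
                 decider: DecisionAgent) -> PatchResult:
    """One end-to-end pass over a single SWE-bench issue."""
    tests = testing.generate_and_prevalidate(issue)  # 1. tests validated on original code
    first = patcher.produce_patch(issue, tests)      # 2. initial patch, run in Docker
    if first.tests_passed:
        return first                                 # fast path: no retry needed
    experience = csr.analyze(first.trajectory)       # 3. compress, attribute, retrieve
    retry = patcher.produce_patch(issue, tests,      # 4. experience-driven retry
                                  experience=experience)
    return decider.vote(first, retry)                # 5. vote for the better patch
```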
Results
JoyCode Agent achieved a 74.6% Pass@1 rate on SWE‑bench Verified, outperforming baseline methods while reducing computational resource consumption by 30‑50%.
Open‑Source Links
GitHub: https://github.com/jd-opensource/joycode-agent
Gitee: https://gitee.com/JD-opensource/joycode-agent
JD Cloud Developers
JD Cloud Developers is JD Technology Group's platform for technical sharing and communication among developers working in AI, cloud computing, IoT, and related fields. It publishes technical information on JD products, industry content, and news about tech events. Its motto: embrace technology and partner with developers to envision the future.