How JoyCode Agent Scored 74.6% Pass@1 on SWE‑bench Verified: Inside the Patch‑Test Co‑generation Pipeline
JoyCode Agent uses a multi‑agent, patch‑and‑test co‑generation framework with iterative validation, fine‑grained failure attribution, and experience‑driven retries to reach 74.6% Pass@1 on SWE‑bench Verified, cutting computational resource consumption by 30‑50% relative to baselines while delivering high‑quality code patches.
Project Background and Goals
SWE‑bench Verified is a human‑validated subset of Princeton University's SWE‑bench benchmark for evaluating AI systems on real software‑engineering tasks: an agent must understand the issue, analyze the codebase, and generate a patch that passes the full test suite.
JoyCode Agent addresses challenges such as repository‑level understanding, large candidate spaces, limited reasoning diversity, and costly token consumption.
Industry Landscape and Optimization Ideas
Existing approaches run into four recurring problems:
One‑shot prompt engineering fails on repository‑scale tasks.
Failures often lack attribution, leading to blind retries.
Experience reuse is limited, causing repeated exploration of similar failures.
Token consumption explodes due to uncontrolled sampling.
JoyCode's optimizations target each of these problems (a minimal attribution sketch follows):
Couple patch generation with FAIL‑TO‑PASS and PASS‑TO‑PASS tests.
Implement a closed‑loop iterative process of generate, validate, and retry.
Add fine‑grained failure attribution so retries are targeted rather than blind.
Compress trajectories and apply CSR (case‑based similarity retrieval) for experience‑driven retries.
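To make the closed loop concrete, here is a minimal Python sketch of the failure‑attribution step. The `FailureCause` taxonomy, `TestReport` shape, and `attribute_failure` helper are illustrative assumptions, not JoyCode's actual code; they only show how a test report can be mapped to a cause so a retry is targeted rather than blind.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import Optional

class FailureCause(Enum):
    """Illustrative attribution buckets (assumed, not JoyCode's exact taxonomy)."""
    INCOMPLETE_FIX = auto()  # FAIL-TO-PASS test still fails: issue not resolved
    REGRESSION = auto()      # PASS-TO-PASS test broke: fix damaged existing behavior
    EDGE_FAILURE = auto()    # EDGE-CASE test failed: boundary conditions mishandled

@dataclass
class TestReport:
    fail_to_pass_ok: bool  # does the issue-reproducing test now pass?
    pass_to_pass_ok: bool  # do the pre-existing-behavior tests still pass?
    edge_case_ok: bool     # does the boundary-condition test pass?

def attribute_failure(report: TestReport) -> Optional[FailureCause]:
    """Map a test report to a single cause, checked in order of severity."""
    if not report.fail_to_pass_ok:
        return FailureCause.INCOMPLETE_FIX
    if not report.pass_to_pass_ok:
        return FailureCause.REGRESSION
    if not report.edge_case_ok:
        return FailureCause.EDGE_FAILURE
    return None  # all three suites pass: accept the patch
```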
Overall System Architecture
The pipeline consists of four core agents (a sketch of their interfaces follows the list):
Testing Agent: Generates three test cases (FAIL‑TO‑PASS, PASS‑TO‑PASS, EDGE‑CASE) and validates them on the original code.
Patch Agent: Observes the issue, plans a fix, executes code edits in a Docker‑isolated environment, and iteratively validates patches.
CSR Agent: Compresses execution trajectories, performs root‑cause analysis, retrieves similar successful cases, and provides experience for retries.
Decision Agent: Votes between initial and retried patches to select the optimal solution.
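The open‑source repository defines its own interfaces; purely as an orientation aid, here is one way the four roles could be typed in Python. Every name below (`TestSuite`, `PatchResult`, and the four `Protocol` classes) is a hypothetical sketch, not the project's actual API.

```python
from dataclasses import dataclass
from typing import Optional, Protocol

@dataclass
class TestSuite:
    fail_to_pass: str  # test that reproduces the issue (fails on unpatched code)
    pass_to_pass: str  # test guarding existing behavior
    edge_case: str     # boundary-condition test

@dataclass
class PatchResult:
    diff: str           # unified diff applied to the repository
    trajectory: str     # execution trace of the attempt, for later compression
    tests_passed: bool  # outcome of running the TestSuite in Docker

class TestingAgent(Protocol):
    def generate_and_prevalidate(self, issue: str) -> TestSuite: ...

class PatchAgent(Protocol):
    def produce_patch(self, issue: str, tests: TestSuite,
                      experience: Optional[str] = None) -> PatchResult: ...

class CSRAgent(Protocol):
    def analyze(self, trajectory: str) -> str:
        """Compress the trajectory, attribute the failure, return retrieved experience."""
        ...

class DecisionAgent(Protocol):
    def vote(self, first: PatchResult, retry: PatchResult) -> PatchResult: ...
```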
The workflow proceeds as follows (an orchestration sketch follows the steps):
Testing Agent creates and pre‑validates test cases.
Patch Agent generates an initial patch and runs the tests in Docker.
If the patch fails, CSR Agent compresses the trajectory, attributes the failure, and retrieves a similar successful case.
Patch Agent performs an experience‑driven retry using the retrieved knowledge.
Decision Agent compares the original and retried patches and selects the best one.
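Using the hypothetical interfaces sketched above, the five steps compose into one short orchestration function. Again, this is a sketch of the control flow under those assumptions, not the repository's actual entry point.

```python
def run_pipeline(issue: str,
                 testing: TestingAgent,
                 patcher: PatchAgent,
                 csr: CSRAgent,
                 decider: DecisionAgent) -> PatchResult:
    """One end-to-end pass over a single SWE-bench issue."""
    tests = testing.generate_and_prevalidate(issue)  # 1. tests validated on original code
    first = patcher.produce_patch(issue, tests)      # 2. initial patch, run in Docker
    if first.tests_passed:
        return first                                 # fast path: no retry needed
    experience = csr.analyze(first.trajectory)       # 3. compress, attribute, retrieve
    retry = patcher.produce_patch(issue, tests,      # 4. experience-driven retry
                                  experience=experience)
    return decider.vote(first, retry)                # 5. vote for the better patch
```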
Results
JoyCode Agent achieved a 74.6% Pass@1 rate on SWE‑bench Verified, outperforming baseline methods while reducing computational resource consumption by 30‑50%.
Open‑Source Links
GitHub: https://github.com/jd-opensource/joycode-agent
Gitee: https://gitee.com/JD-opensource/joycode-agent
JD Cloud Developers
JD Cloud Developers is JD Technology Group's platform for technical sharing and communication among developers working in AI, cloud computing, IoT, and related fields. It publishes technical information on JD products, industry content, and news about tech events. Its motto: embrace technology and partner with developers to envision the future.