How SWE‑agent Enables GPT‑4 to Fix GitHub Bugs Faster Than Devin

The Princeton team’s open‑source SWE‑agent equips GPT‑4 with a specialized agent‑computer interface that lets it understand, edit, test, and repair code in real GitHub repositories, achieving state‑of‑the‑art bug‑fixing performance with low latency and cost.

SWE‑agent, a new AI programmer released by a Princeton University team, transforms large language models such as GPT‑4 into software‑engineering agents capable of autonomously fixing bugs in real GitHub repositories.

Benchmark results show that SWE‑agent approaches Devin’s accuracy on GitHub issue resolution, fixing 12.29% of problems (versus Devin’s 13.84%) while completing each bug fix in an average of just 93 seconds, delivering near state‑of‑the‑art accuracy at far lower latency.

Core Features

The agent includes an open‑source Agent‑Computer Interface (ACI) that supports code editing and execution. Designed specifically for language models, the interface provides commands for navigating repositories, searching files, viewing and editing specific lines, and running code, enabling seamless interaction between the GPT‑4‑driven agent and the codebase.

Research indicates that a generic bash terminal does not yield optimal results; the tailored ACI improves the model’s comprehension and performance, ensuring accurate and efficient software‑engineering problem solving.
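The commands exposed by such an interface can be pictured as thin wrappers around ordinary file operations. The sketch below is a minimal illustration only, not SWE‑agent’s actual implementation; the names `search_dir` and `edit` echo the ACI’s command vocabulary, but the bodies and signatures are assumptions.

```python
import pathlib

# Illustrative ACI-style commands (a sketch; not SWE-agent's real code).

def search_dir(term: str, root: str = ".") -> list[str]:
    """Return Python files under `root` whose text contains `term`."""
    hits = []
    for path in pathlib.Path(root).rglob("*.py"):
        try:
            if term in path.read_text(encoding="utf-8"):
                hits.append(str(path))
        except OSError:
            pass  # skip unreadable files
    return hits

def edit(path: str, start: int, end: int, replacement: str) -> None:
    """Replace lines start..end (1-indexed, inclusive) with `replacement`."""
    lines = pathlib.Path(path).read_text(encoding="utf-8").splitlines(keepends=True)
    new = replacement if replacement.endswith("\n") else replacement + "\n"
    lines[start - 1:end] = [new]
    pathlib.Path(path).write_text("".join(lines), encoding="utf-8")
```

The key design idea is that each command returns a compact, structured observation the model can act on, rather than the raw, noisy output of a generic shell.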

Workflow

Understanding the Issue: SWE‑agent uses NLP to parse the problem description in a GitHub issue, leveraging GPT‑4’s ability to comprehend human‑written reports.

Agent‑Computer Interface (ACI): Through the ACI, the agent can browse, search, view, edit, and execute code within the repository.

Code Analysis and Repair: After grasping the issue, the agent analyzes relevant code, locates bugs or vulnerabilities, and generates fixes, which may involve modifying existing code, adding missing parts, or refactoring.

Automated Testing: SWE‑agent automatically writes and runs test cases to verify that the changes resolve the original problem without introducing new errors.

Performance Feedback: Each step produces feedback that is evaluated against the SWE‑bench benchmark to assess whether the generated pull request truly solves the issue.

Iteration and Optimization: Continuous feedback and performance data allow the team to refine the ACI and improve the agent’s accuracy and efficiency.
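The steps above amount to an observe‑act loop: the model reads the issue, issues a command, observes the result, and repeats until it submits a fix. The self‑contained sketch below illustrates that loop; `model`, `execute`, and `run_tests` are hypothetical stand‑ins for the GPT‑4 call, the sandboxed command runner, and the test harness, not SWE‑agent’s actual API.

```python
# A minimal sketch of the observe-act loop described above. The callables
# passed in are illustrative stand-ins: in SWE-agent, the next command
# comes from GPT-4 and execution happens in a repository sandbox.

def agent_loop(issue_text, model, execute, run_tests, max_turns=20):
    """Feed the issue and each observation back to the model until it submits."""
    history = [f"ISSUE:\n{issue_text}"]
    for _ in range(max_turns):
        command = model(history)       # model chooses the next ACI command
        if command == "submit":
            return run_tests()         # verify the fix before opening a PR
        observation = execute(command) # run the command against the repo
        history.append(f"$ {command}\n{observation}")
    return False                       # gave up without submitting
```

A scripted "model" makes the control flow easy to follow: feed it a fixed sequence of commands ending in `submit`, and the loop executes each one, accumulates observations, and finally gates success on the tests.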

Architecture

The system employs a dedicated terminal that enables the agent to open, scroll, and edit files precisely, write and execute tests, and thereby enhance code quality and efficiency. This specialized terminal is crucial for the agent’s high performance on software‑engineering tasks.

During development, the team discovered that limiting the AI’s view to 100 lines of code at a time, rather than the entire file, improves planning and execution, simplifying the agent’s reasoning and boosting overall performance.
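A windowed file viewer of this kind can be sketched in a few lines. The function name and output format below are illustrative assumptions, not SWE‑agent’s code; the point is that the model sees a bounded, line‑numbered slice instead of the whole file.

```python
def view_window(path: str, center: int, window: int = 100) -> str:
    """Show at most `window` lines around line `center` (1-indexed),
    mimicking the limited file view described in the article."""
    with open(path, encoding="utf-8") as f:
        lines = f.read().splitlines()
    half = window // 2
    start = max(0, center - 1 - half)          # clamp window to file bounds
    end = min(len(lines), start + window)
    header = f"[File: {path} ({len(lines)} lines total)]"
    body = [f"{i + 1}:{lines[i]}" for i in range(start, end)]
    return "\n".join([header] + body)
```

Numbering each visible line lets the model name an exact edit range (e.g. "edit lines 120:124") without ever holding the full file in context.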

Evaluation and Community Reception

NVIDIA AI researcher Jim Fan praised SWE‑agent, noting that careful design of GPT‑4’s command‑line tools enabled a 12.3% success rate on SWE‑bench without requiring major model breakthroughs. He anticipates that future models such as GPT‑5 will further improve instruction following and context handling.

Conclusion

SWE‑agent’s open‑source release provides a clear, practical approach to AI‑assisted software engineering, emphasizing cost‑effective solutions (targeting under $4 per task) and laying groundwork for future advancements in AI programming.

Tags: software engineering, GPT‑4, AI programming, SWE‑agent
Written by

21CTO

21CTO (21CTO.com) offers developers community, training, and services, making it your go‑to learning and service platform.
