How mini‑SWE‑agent Solves 65% of SWE‑bench Bugs with Only 100 Lines of Code

The mini‑SWE‑agent, a lightweight open‑source software‑engineering AI built by the original SWE‑bench team, achieves about 65% bug‑fix success on the SWE‑bench benchmark using roughly 100 lines of Python, thanks to its minimal dependencies, shell‑based execution, linear history, and support for various container environments, offering a fast, extensible alternative to the full‑featured SWE‑agent.

Data Party THU
Data Party THU
Data Party THU
How mini‑SWE‑agent Solves 65% of SWE‑bench Bugs with Only 100 Lines of Code

Overview

mini‑SWE‑agent is an open‑source software‑engineering agent that implements the same bug‑fix task as SWE‑agent but with a minimal code base (~100 lines of core Python, ~200 lines including setup). It executes shell commands directly without a tool‑call interface.

Performance

On the SWE‑bench validation set the agent solves approximately 65 % of the problems , comparable to the original SWE‑agent while being far lighter.

Key design features

Minimal code and dependencies : ~100 lines of Python, no heavy third‑party libraries.

Direct shell execution : each model output is a complete command executed by Python; no separate tool‑call protocol.

Linear history : steps are appended to the message stream, avoiding complex state management.

Independent step execution : commands run in isolated subprocesses, simplifying sandboxing and extension.

Simplified configuration : built‑in code templates replace YAML configuration; CLI commands mini (run) and mini‑v (visual UI) start the agent.

Broad environment support : works in local shells and inside Docker, Podman, Singularity, Apptainer, etc., without code changes.

Retained tooling : batch inference, trajectory browsing, and a visual UI are still provided for large‑scale evaluation.

Installation and usage

Clone the repository and install the minimal requirements:

git clone https://github.com/SWE-agent/mini-swe-agent.git
cd mini-swe-agent
pip install -r requirements.txt   # typically only standard libraries

Run the agent from the command line:

mini   # start the agent in terminal mode
mini-v # launch the optional visual interface

The agent reads a problem description (e.g., a GitHub issue) from stdin or a file, generates a shell command, executes it, and appends the result to the conversation.

Recommended scenarios

Rapid local experimentation, fine‑tuning (FT) or reinforcement‑learning (RL) loops where a lightweight control flow is desired.

Environments where installing large frameworks is impractical.

Evaluations that require a stable, reproducible sandbox.

For use cases that need extensive toolchains, configurable YAML pipelines, or complex multi‑tool state, the full‑featured SWE‑agent is more appropriate.

Background

SWE‑bench is a benchmark built from real GitHub issues and pull requests to assess whether large language models can understand bug reports and automatically fix code. SWE‑agent was released in 2024 by researchers from Princeton University and OpenAI to achieve high bug‑fix rates on this benchmark. mini‑SWE‑agent was created to provide a ~100‑fold reduction in code size while preserving performance.

References

Project repository: https://github.com/SWE-agent/mini-swe-agent

Readme and additional documentation: https://github.com/SWE-agent/mini-swe-agent?tab=readme-ov-file

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Sign in to view source
Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactadmin@besthub.devand we will review it promptly.

LLMSoftware Engineeringopen sourceAI AgentSWE-bench
Data Party THU
Written by

Data Party THU

Official platform of Tsinghua Big Data Research Center, sharing the team's latest research, teaching updates, and big data news.

0 followers
Reader feedback

How this landed with the community

Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.