PaperAgent
Mar 6, 2026 · Artificial Intelligence
BeyondSWE: Rethinking Code Agent Benchmarks with Real‑World Multi‑Repo Challenges
BeyondSWE expands code‑agent evaluation beyond single‑repo bug fixing by introducing four realistic scenarios and scaling to 246 repositories and 500 samples. It reveals a sharp performance drop for top models and highlights the nuanced impact of search‑augmented agents like SearchSWE.
AI evaluation · BeyondSWE · SearchSWE
6 min read
