SuanNi
Jun 17, 2026 · Artificial Intelligence
How Harness Design Alters Coding Agent Scores: Insights from the First Independent Claw‑SWE‑Bench
The Claw‑SWE‑Bench benchmark isolates the model, harness, and task variables. It shows that swapping only the harness, with the model and tasks held fixed, can shift Pass@1 by up to 27 points and change cost dramatically. It also offers a lightweight 80‑question Lite version for rapid, low‑cost evaluation.
AI coding agents · Claw-SWE-Bench · benchmark
11 min read
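Pass@1 is conventionally the share of tasks an agent resolves on its first attempt, so a "27-point" harness effect is a difference in percentage points between two runs. The sketch below is a minimal illustration, assuming per-task pass/fail outcomes are available for each harness. The data layout, the function names (`pass_at_k`, `benchmark_pass_at_1`, `harness_delta`), and the sample values are hypothetical and are not the benchmark's own tooling. `pass_at_k` uses the standard unbiased estimator, which reduces to c/n when k = 1.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator for one task: n attempts, c of them correct."""
    if n - c < k:
        # Every size-k subset of attempts contains at least one success.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_pass_at_1(results: dict[str, list[bool]]) -> float:
    """Mean Pass@1 over tasks, in percent.

    `results` maps a (hypothetical) task id to the list of attempt outcomes.
    """
    scores = [pass_at_k(len(r), sum(r), 1) for r in results.values()]
    return 100.0 * sum(scores) / len(scores)


def harness_delta(a: dict[str, list[bool]], b: dict[str, list[bool]]) -> float:
    """Pass@1 gap in percentage points between two harnesses.

    Both runs must cover the same task set (and, by assumption, use the same
    model), so the gap can be attributed to the harness alone.
    """
    if a.keys() != b.keys():
        raise ValueError("harness runs must cover the same tasks")
    return benchmark_pass_at_1(a) - benchmark_pass_at_1(b)


# Hypothetical single-attempt outcomes for one model under two harnesses.
harness_a = {"t1": [True], "t2": [True], "t3": [False], "t4": [True]}
harness_b = {"t1": [True], "t2": [False], "t3": [False], "t4": [False]}

print(benchmark_pass_at_1(harness_a))          # 75.0
print(benchmark_pass_at_1(harness_b))          # 25.0
print(harness_delta(harness_a, harness_b))     # 50.0
```

Keeping the task set identical is the point of the check in `harness_delta`: if the two runs covered different tasks, the gap would mix harness effects with task difficulty.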
