How Harness Design Alters Coding Agent Scores: Insights from the First Independent Claw‑SWE‑Bench

Claw‑SWE‑Bench isolates the model, harness, and task variables. It shows that changing only the harness can shift Pass@1 scores by up to 27 points and substantially change cost, and it provides a lightweight 80‑question Lite version for rapid, low‑cost evaluation.

AI coding agents, Claw-SWE-Bench, benchmark

