Machine Heart
May 20, 2026 · Artificial Intelligence
Self‑Evolving Harness Engineering Propels GPT‑5.4 to a 7‑Point Gain, Securing a Global Top‑3 Spot
The paper introduces Agentic Harness Engineering (AHE), an observability‑driven framework that automatically evolves coding‑agent harnesses, boosting GPT‑5.4's pass@1 score on Terminal‑Bench 2 from 69.7% to 77.0% (+7.3 points), achieving a worldwide top‑three ranking and demonstrating strong cross‑task and cross‑model generalization.
Agentic Harness EngineeringCross-Model GeneralizationGPT-5.4
0 likes · 14 min read
