Why Top LLMs Score 0% on the New ProgramBench: Engineering Intelligence’s Next Battleground
The newly released ProgramBench benchmark requires leading LLMs to rebuild complete software projects from their usage documentation alone. Claude Opus, GPT-5, Gemini and the other models tested all reach a 0% full-completion rate, exposing the gap between local code generation and true engineering intelligence.
