What Happens When a Code Agent Faces 1,000+ Files? CoDA‑Bench Exposes the Real Bottleneck
CoDA‑Bench, a new benchmark from RUC, places code agents in a sandbox containing more than a thousand heterogeneous data files. Agents must locate the correct dataset, write analysis code, and produce answers. The results show that current agents reach only about 61% accuracy overall, and that their main difficulty is data discovery rather than code generation.
