ChatGPT’s Bug‑Fixing Ability Reaches State‑of‑the‑Art on the QuixBugs Benchmark
Researchers from Germany and the UK evaluated ChatGPT and three other automated bug‑fixers on the QuixBugs benchmark and found that, once given additional information, ChatGPT correctly fixed 31 of 40 bugs, outperforming CodeX, CoCoNut, and Standard APR. The result sparked mixed reactions about its impact on software engineering, alongside renewed attention to OpenAI’s broader strategy.
Researchers from Germany and the United Kingdom set up a benchmark arena to evaluate how well ChatGPT can fix bugs.
Using the standard QuixBugs benchmark of 40 buggy programs, they compared ChatGPT with three other “bug‑fixers”: CodeX, CoCoNut and Standard APR (conventional automated program repair methods).
In the first round, ChatGPT correctly repaired 19 bugs, while CodeX fixed 21, CoCoNut 19 and Standard APR only 7. The answers from ChatGPT were the most similar to those of CodeX, reflecting their shared language‑model lineage.
“What is wrong with this code?”
When provided with additional information on the problematic cases, ChatGPT’s performance improved dramatically, ultimately fixing 31 out of the 40 bugs – the best result among the four models and a new state‑of‑the‑art (SOTA) achievement.
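To put the reported counts on a common scale, the short Python sketch below converts them into fix rates out of 40. The counts are the ones quoted in this article; the dictionary labels and the fix_rate helper are illustrative names, not something taken from the study.

```python
# Fix counts as reported in the article, out of 40 QuixBugs programs.
TOTAL_BUGS = 40
fixed = {
    "ChatGPT (first round)": 19,
    "CodeX": 21,
    "CoCoNut": 19,
    "Standard APR": 7,
    "ChatGPT (with extra information)": 31,
}


def fix_rate(count, total=TOTAL_BUGS):
    """Return the share of bugs fixed, as a percentage."""
    return 100.0 * count / total


rates = {name: fix_rate(count) for name, count in fixed.items()}

# Print the approaches from highest to lowest fix rate.
for name, rate in sorted(rates.items(), key=lambda item: item[1], reverse=True):
    print(f"{name:<34} {fixed[name]:>2}/{TOTAL_BUGS}  {rate:5.1f}%")
```

On these figures, ChatGPT moves from 47.5% in the first round to 77.5% with extra information, against 52.5% for CodeX.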
Reddit users reacted with a mixture of caution and excitement, posting titles such as “Watch out” and “Be careful”. Some argued that the tool will make programmers’ work easier, while others warned that increased automation could reduce the need for human labor.
Beyond the bug‑fixing study, OpenAI is reportedly hiring around 1,000 outsourced workers in Latin America and Eastern Europe to label data and train ChatGPT to write code. About 40% of them are programmers who document their reasoning steps.
OpenAI’s business model currently relies on API and token fees, software licences, and a paid “ChatGPT Pro” tier priced at $42 per month. Recent investments include a multi‑billion‑dollar injection from Microsoft and talks of a $300 million‑plus funding round.
In a follow‑up experiment, ChatGPT initially failed the “bitcount” problem in QuixBugs but succeeded after being asked the same question again with more context, suggesting that iterative prompting can further boost its bug‑fixing ability.
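For readers who want a feel for the kind of defect involved, here is a minimal Python sketch of a bit counter with a one-line bug and its fix. It assumes the "bitcount" program is the lowest-set-bit counting routine from the public QuixBugs repository and that the defect is an XOR used where an AND belongs; this article does not show the code, so treat both the function body and the bug location as assumptions.

```python
def bitcount(n: int) -> int:
    """Count the 1-bits in a non-negative integer."""
    count = 0
    while n:
        # Assumed buggy line in the benchmark version:  n ^= n - 1
        # n ^ (n - 1) is never zero for n >= 1, so that loop cannot end.
        n &= n - 1  # AND with n - 1 clears the lowest set bit on each pass
        count += 1
    return count


print(bitcount(127))  # 127 is 0b1111111
print(bitcount(128))  # 128 is 0b10000000
```

A fix like this changes a single operator, which is why a short hint about the expected behavior can be enough to steer a model toward it.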
References: arXiv:2301.08653, PCMag article, Reddit discussion, University page.
IT Services Circle