Do AI Coding Assistants Boost Productivity? A Randomized Study Says No

A recent randomized controlled trial by the nonprofit AI research group METR found that experienced open‑source developers actually spent 19% more time on tasks when using AI coding tools, contradicting the common belief that such tools increase development speed by around 20%.

Data Party THU

Motivation

Benchmark suites for AI‑assisted coding often rely on synthetic tasks that ignore prior context and human interaction, which can overestimate AI capabilities. METR therefore performed a randomized controlled trial to assess how AI programming tools affect the productivity of experienced open‑source developers.

Methodology

Sixteen developers (an average of 5 years of experience, contributors to large open‑source repositories with >22k stars and millions of lines of code) supplied 246 real‑world issues (bug fixes, feature additions, refactorings) from their own projects. Each issue was randomly assigned to one of two conditions. In the AI‑allowed condition, participants could use any AI tool (primarily Cursor Pro with Claude 3.5/3.7 Sonnet). In the AI‑blocked condition, no generative assistance was permitted. Developers recorded their screens, reported the total time spent on each issue, and were paid $150 per hour.
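The per‑issue randomization can be sketched as follows. This is a minimal illustration, assuming each issue receives an independent fair coin flip; the study's exact assignment procedure may differ, and the function name `assign_conditions` and the issue identifiers are hypothetical.

```python
import random
from collections import Counter

def assign_conditions(issue_ids, seed=0):
    """Randomly assign each issue to an AI-allowed or AI-blocked condition.

    Assumption (not from the study): every issue gets an independent
    fair coin flip. The real procedure may have been different.
    """
    rng = random.Random(seed)  # seeded so the assignment is reproducible
    return {
        issue: "ai_allowed" if rng.random() < 0.5 else "ai_blocked"
        for issue in issue_ids
    }

if __name__ == "__main__":
    # Hypothetical issue identifiers; 246 matches the study's issue count.
    issues = [f"issue-{i}" for i in range(246)]
    assignment = assign_conditions(issues, seed=42)
    print(Counter(assignment.values()))
```

Randomizing at the level of issues rather than developers means each developer works under both conditions, so differences between individuals are less likely to masquerade as an effect of the tools.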

Results

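With AI allowed, tasks took on average 19% longer to complete than with AI blocked. This slowdown is statistically significant and contradicts the 20–24% speedup that developers expected: before the study they forecast a 24% reduction in completion time, and afterward they still estimated that AI had saved them about 20%.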
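To make the "19% longer" figure concrete, the sketch below computes a percentage change in completion time as a ratio of geometric means, i.e. by comparing average log times across the two conditions. This is a simplified illustration only, not the paper's estimator, which is more involved; the function name and the task times are hypothetical.

```python
import math
from statistics import fmean

def percent_time_change(ai_minutes, no_ai_minutes):
    """Percent change in geometric-mean completion time (AI-allowed vs AI-blocked).

    Positive values mean tasks took longer with AI. Working on a log
    scale keeps a few very long tasks from dominating the comparison.
    """
    mean_log_ai = fmean(math.log(t) for t in ai_minutes)
    mean_log_no_ai = fmean(math.log(t) for t in no_ai_minutes)
    return (math.exp(mean_log_ai - mean_log_no_ai) - 1) * 100

if __name__ == "__main__":
    # Hypothetical completion times in minutes, not study data.
    no_ai = [50, 100, 200]
    with_ai = [59.5, 119, 238]
    print(f"{percent_time_change(with_ai, no_ai):.1f}%")  # prints 19.0%
```

A positive result corresponds to a slowdown; a forecast "24% speedup" would correspond to a negative value of roughly −24 under this convention.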

Result chart showing increased time with AI

When AI was permitted, developers spent less time on manual coding and information search, but more time on prompt engineering, waiting for model output, reviewing AI‑generated code, and idle periods. METR examined 20 experimental attributes; five appeared to contribute to the slowdown, while eight showed mixed or unclear effects.

Analysis of Causes

Key contributors to the slowdown were the overhead of writing prompts, latency in model responses, and the cognitive load of verifying AI‑generated code. The authors also checked for confounding factors, such as the model versions used and the distribution of task difficulty, and found that pull‑request quality was comparable across conditions.
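One way to see how these overheads can outweigh the time AI saves is a simple additive time budget. The sketch below is a hypothetical back‑of‑envelope model, not METR's analysis: the class `AiTaskCosts`, the function `net_time_with_ai`, and all numbers are illustrative assumptions.

```python
from dataclasses import dataclass

@dataclass
class AiTaskCosts:
    """Hypothetical per-task time components, in minutes."""
    coding_saved: float  # manual coding and search time the assistant removes
    prompting: float     # writing and refining prompts
    waiting: float       # latency while the model generates output
    reviewing: float     # reading, verifying, and fixing AI-generated code

def net_time_with_ai(baseline_minutes: float, costs: AiTaskCosts) -> float:
    """Task time with AI = time without AI - time saved + added overhead."""
    overhead = costs.prompting + costs.waiting + costs.reviewing
    return baseline_minutes - costs.coding_saved + overhead

if __name__ == "__main__":
    # Illustrative numbers only: 30 minutes saved, 49 minutes of overhead added.
    costs = AiTaskCosts(coding_saved=30, prompting=15, waiting=10, reviewing=24)
    print(net_time_with_ai(100, costs))  # prints 119
```

Under this model, AI speeds a task up only when the coding time it removes exceeds the combined prompting, waiting, and review time, which is why reducing those overheads is a natural target for follow‑up work.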

Limitations

The study’s sample size (16 developers, 246 tasks) limits generalizability, and the participants’ moderate AI experience may not represent the broader software‑engineering population.

Future Work

Planned extensions include testing non‑developer users, evaluating newer model versions, and investigating methods to reduce prompt‑engineering overhead and latency.

Reference

Full paper: Measuring the Impact of Early‑2025 AI on Experienced Open‑Source Developer Productivity – https://metr.org/Early_2025_AI_Experienced_OS_Devs_Study.pdf

Source: 人工智能前沿讲习 (AI Frontier Lectures)
Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.

Written by Data Party THU

Official platform of Tsinghua Big Data Research Center, sharing the team's latest research, teaching updates, and big data news.