Anthropic Warns: AI Self‑Improvement Accelerating Too Fast – Calls for a Global Pause
Anthropic’s internal report reveals that Claude now writes over 80% of its code, engineers’ productivity has jumped eight‑fold, and recursive self‑improvement is becoming a real risk, prompting the company to urge a worldwide slowdown of advanced AI development.
Anthropic internal metrics (May 2026)
More than 80 % of the code merged into Anthropic’s codebase was generated by the Claude model. In Q2 2026 the average daily code output per engineer was eight times higher than in 2024 because engineers guided Claude rather than writing code themselves.
Human interventions on Claude‑generated code have been steadily decreasing. An automated Claude‑based code reviewer now catches roughly one‑third of bugs that would otherwise reach production.
A survey of 130 Anthropic researchers (March 2026) reported a median productivity increase of about four‑fold when using the Mythos Preview model compared with no AI assistance.
Performance on benchmarks and experiments
SWE‑bench (real‑world software‑engineering benchmark) scores rose from single‑digit values to near saturation within two years, indicating that Claude can reliably fix bugs in open‑source projects.
CORE‑Bench replication success rose from ~20 % in 2024 to saturation fifteen months later, showing improved ability to reproduce published research.
On the METR benchmark, which measures how long a model can keep working continuously, Claude Mythos Preview ran for at least 16 hours without reaching the benchmark’s upper limit.
Optimization experiments on deterministic code‑speed tasks achieved up to a 52× speedup over the baseline (Claude Opus 4 in May 2025 gave ~3× speedup; Claude Mythos Preview in April 2026 gave ~52× speedup). A human researcher typically needs 4–8 hours to achieve a 4× speedup.
Open‑ended coding tasks
The success rate on open‑ended tasks reached 76 % in May 2026, a 50‑point increase over six months. Example: Claude diagnosed and fixed a training‑job crash affecting tens of thousands of jobs in about two hours, whereas the same issue would normally require two to three days of human effort.
Code quality
In early 2025, Claude‑written code was judged slightly inferior to human code. By late 2025 the quality gap had closed, and by mid‑2026 the two were considered roughly equivalent, with the expectation that Claude will surpass human code quality within the next year.
Research automation
Claude can autonomously run defined experiments: given training code, it rewrites the code, executes it, times it, and iterates to improve performance. On a fully open research problem (AI safety supervision), Claude ran for 800 hours at a cost of about $18,000 and closed 97 % of the performance gap, whereas human researchers closed only 23 % in a week.
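The rewrite, execute, time, and iterate cycle described above can be sketched as a benchmark loop that keeps the fastest candidate that still produces correct output. This is only an illustration under stated assumptions: the candidate rewrites are a fixed list of hand‑written functions (in the setup described, a model would generate them), and every function name here is hypothetical.

```python
import timeit


def baseline(n):
    """Reference implementation: sum of squares with an explicit loop."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def candidate_generator(n):
    """Hypothetical rewrite 1: same computation via a generator expression."""
    return sum(i * i for i in range(n))


def candidate_closed_form(n):
    """Hypothetical rewrite 2: closed form for 0^2 + 1^2 + ... + (n-1)^2."""
    return (n - 1) * n * (2 * n - 1) // 6


def pick_fastest(reference, candidates, arg, repeats=5, number=20):
    """Time each candidate and keep the fastest one that matches the reference output."""
    expected = reference(arg)
    baseline_time = min(timeit.repeat(lambda: reference(arg), repeat=repeats, number=number))
    best_fn, best_time = reference, baseline_time
    for fn in candidates:
        if fn(arg) != expected:
            continue  # reject rewrites that change behavior
        elapsed = min(timeit.repeat(lambda: fn(arg), repeat=repeats, number=number))
        if elapsed < best_time:
            best_fn, best_time = fn, elapsed
    return best_fn, baseline_time / best_time


if __name__ == "__main__":
    best, speedup = pick_fastest(
        baseline, [candidate_generator, candidate_closed_form], 10_000
    )
    print(f"fastest: {best.__name__}, speedup over baseline: {speedup:.1f}x")
```

The correctness check before timing is the important design choice: a speedup only counts if the rewritten code still returns the same result as the reference.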
In research meetings, Claude’s suggested next steps outperformed human choices 64 % of the time in April 2026, up from 51 % in November 2025.
Future scenarios outlined by Anthropic
Plateau: The rapid growth curve bends into an S‑shape, indicating diminishing returns from scaling and a need for new architectures.
Automation‑driven productivity: AI continues to automate coding, experimentation, and research while humans retain control over direction and evaluation, potentially allowing a hundred‑person team to accomplish the work of thousands.
Full recursive self‑improvement: AI systems become capable of designing and training their successors, making compute the sole bottleneck and reducing human roles to supervision, safety verification, and validation.
Safety and governance considerations
Anthropic emphasizes that unchecked acceleration could outpace societal and regulatory mechanisms. Verifiable global pauses would require multiple leading labs to coordinate and to be able to detect whether others have halted training, which is technically challenging because training runs are easy to conceal.
Industry context
Other organizations are pursuing similar trajectories: the startup Recursive raised $650 M to focus on recursive self‑improvement; ICLR 2026 featured a dedicated “AI Recursive Self‑Improvement” workshop; DeepMind’s AlphaEvolve demonstrates an evolution‑style agent that repeatedly mutates and selects algorithms, achieving superhuman performance on mathematical‑algorithm and chip‑design tasks.
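The mutate‑and‑select idea attributed to AlphaEvolve can be illustrated with a toy evolutionary search. This is a generic sketch, not DeepMind's implementation: the candidates are short lists of numbers rather than programs, and the fitness function, mutation rule, and all names are hypothetical.

```python
import random


def fitness(candidate):
    """Toy objective: higher is better, with the maximum (0) at all zeros."""
    return -sum(x * x for x in candidate)


def mutate(candidate, rng, scale=0.5):
    """Return a copy of the candidate with one coordinate randomly perturbed."""
    child = list(candidate)
    i = rng.randrange(len(child))
    child[i] += rng.gauss(0.0, scale)
    return child


def evolve(generations=200, population_size=20, survivors=5, seed=0):
    """Repeatedly select the best candidates and refill the population with mutants."""
    rng = random.Random(seed)
    population = [
        [rng.uniform(-5, 5) for _ in range(3)] for _ in range(population_size)
    ]
    initial_best = max(fitness(c) for c in population)
    for _ in range(generations):
        # Selection: keep the top-scoring candidates unchanged (elitism).
        population.sort(key=fitness, reverse=True)
        parents = population[:survivors]
        # Mutation: fill the remaining slots with perturbed copies of parents.
        children = [
            mutate(rng.choice(parents), rng)
            for _ in range(population_size - survivors)
        ]
        population = parents + children
    best = max(population, key=fitness)
    return initial_best, fitness(best), best


if __name__ == "__main__":
    start, end, best = evolve()
    print(f"best fitness: {start:.3f} -> {end:.3f}")
```

Because the best candidates are carried over unchanged each generation, the best score never gets worse; progress comes entirely from occasional mutations that happen to score higher.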
Organizational bottlenecks
As Claude automates more of the development pipeline, new bottlenecks emerge. Anthropic observed that human code review is becoming the limiting factor, an illustration of Amdahl’s law at the organizational level.
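The Amdahl's‑law analogy can be made concrete with a small calculation. The numbers below are assumptions chosen for illustration, not figures from Anthropic's report: suppose 80 % of the pipeline's time is code writing that AI accelerates and 20 % is human review that it does not.

```python
def amdahl_speedup(accelerated_fraction: float, factor: float) -> float:
    """Overall speedup when only `accelerated_fraction` of the work runs `factor` times faster."""
    if not 0.0 <= accelerated_fraction <= 1.0:
        raise ValueError("accelerated_fraction must be in [0, 1]")
    if factor <= 0:
        raise ValueError("factor must be positive")
    unaccelerated = 1.0 - accelerated_fraction
    return 1.0 / (unaccelerated + accelerated_fraction / factor)


if __name__ == "__main__":
    # Hypothetical split: 80% code writing (accelerated), 20% human review (not accelerated).
    for factor in (8, 52, 1000):
        overall = amdahl_speedup(0.8, factor)
        print(f"{factor}x faster coding -> {overall:.2f}x faster overall")
```

Under this assumed split, the overall gain stays below 1 / 0.2 = 5x no matter how fast code generation becomes, which is why the unaccelerated review step ends up as the limiting factor.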
Overall, the evidence suggests that AI‑assisted development is accelerating dramatically, with measurable gains in code production, quality, and research speed, while also raising concrete challenges for safety, verification, and coordination.
Top Architect
Top Architect focuses on sharing practical architecture knowledge, covering enterprise, system, website, large‑scale distributed, and high‑availability architectures, plus architecture adjustments using internet technologies. We welcome idea‑driven, sharing‑oriented architects to exchange and learn together.
