Anthropic Co‑Founder Predicts 60% Chance AI Will Self‑Develop the Next‑Gen Model by End‑2028
Jack Clark’s Import AI analysis forecasts that, given rapidly rising scores on benchmarks such as SWE‑Bench and on METR’s time‑horizon measurements, there is a 60% probability that by the end of 2028 AI systems will be able to autonomously design and train the next generation of more capable models, reshaping research, economics, and alignment challenges.
Jack Clark, co‑founder of Anthropic, published an analysis in Import AI (issue 455) that synthesises publicly available data into a timeline for AI self‑improvement. He argues that there is more than a 60 % chance that, by the end of 2028 at the latest, AI systems will be able to operate without human intervention and independently develop the next generation of more capable systems.
The argument rests on two key quantitative indicators. SWE‑Bench measures an AI system’s ability to solve real‑world software‑engineering tasks drawn from GitHub issues; when the benchmark was released in late 2023, the best Claude 2 model solved only ~2 % of tasks, while the later Claude Mythos Preview reached 93.9 %, essentially saturating the test. METR’s time‑horizon metric tracks the length of tasks (measured by how long they take human professionals) that a model can complete with 50 % reliability. That horizon has grown dramatically: from 30 seconds for GPT‑3.5 (2022) to 4 minutes for GPT‑4 (2023), 40 minutes for o1 (2024), 6 hours for GPT 5.2 High (2025), and 12 hours for Opus 4.6 (2026). Ajeya Cotra, a forecasting expert at METR, estimates that by the end of 2026 automated tools could complete tasks that would otherwise take a human 100 hours.
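The time‑horizon figures above grow roughly exponentially, so a straight line through log2(horizon) against year yields a doubling time. The sketch below fits that line by ordinary least squares using only the five data points quoted in the article. Pinning each model to a whole calendar year is an assumption (actual release dates differ), and the helper names are hypothetical; this is not METR’s own methodology or code.

```python
import math

# Time horizons quoted in the article, in seconds, at 50 % reliability.
# Assumption: each model is pinned to a whole calendar year; real release dates differ.
HORIZONS = [
    ("GPT-3.5", 2022, 30),
    ("GPT-4", 2023, 4 * 60),
    ("o1", 2024, 40 * 60),
    ("GPT 5.2 High", 2025, 6 * 3600),
    ("Opus 4.6", 2026, 12 * 3600),
]


def fit_log2_trend(points):
    """Ordinary least squares of log2(horizon seconds) on year.

    Returns (slope, intercept); the slope is measured in doublings per year.
    """
    xs = [year for _, year, _ in points]
    ys = [math.log2(seconds) for _, _, seconds in points]
    x_mean = sum(xs) / len(xs)
    y_mean = sum(ys) / len(ys)
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_mean - slope * x_mean


def doubling_time_months(slope):
    """Months needed for the horizon to double under the fitted trend."""
    return 12.0 / slope


def year_reaching(target_seconds, slope, intercept):
    """Naively extrapolate the fitted line to the year it reaches target_seconds."""
    return (math.log2(target_seconds) - intercept) / slope


slope, intercept = fit_log2_trend(HORIZONS)
print(f"doubling time: {doubling_time_months(slope):.1f} months")
print(f"100-hour horizon reached around {year_reaching(100 * 3600, slope, intercept):.1f}")
```

With these five points the fit gives a doubling time of about 4.4 months, and naive extrapolation reaches a 100‑hour horizon around late 2026 (≈2026.8). A log‑linear fit is used because a constant doubling time appears as a straight line in log2 space.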
These trends are mirrored in a series of research‑oriented benchmarks. CORE‑Bench evaluates the ability to reproduce the results of scientific papers; in September 2024 the best score (≈21.5 %) came from a GPT‑4o model running in the CORE‑Agent framework, and by December 2025 Opus 4.5 scored 95.5 %. On MLE‑Bench, which covers 75 Kaggle machine‑learning competitions, top scores rose from 16.9 % (October 2024) to 64.4 % (February 2026, Gemini 3). On post‑training tasks, AI‑driven improvements now reach roughly half the gains a human expert would deliver.
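Raw percentage‑point gains understate progress near a benchmark’s ceiling, so one simple way to compare the two runs above is to also look at how much the failure rate shrank. The sketch below does that arithmetic on the scores and months quoted in the article. Comparing months as (year, month) pairs and the helper names are assumptions made for illustration.

```python
# Benchmark scores as quoted in the article, in percent.
# Assumption: dates are compared as (year, month) pairs, ignoring the day of the month.
RUNS = {
    "CORE-Bench": ((2024, 9), 21.5, (2025, 12), 95.5),
    "MLE-Bench": ((2024, 10), 16.9, (2026, 2), 64.4),
}


def months_between(start, end):
    """Whole months from start to end, both given as (year, month)."""
    return (end[0] - start[0]) * 12 + (end[1] - start[1])


def summarize(start, s0, end, s1):
    """Return (months elapsed, average pp gain per month, failure-rate reduction factor)."""
    months = months_between(start, end)
    pp_per_month = (s1 - s0) / months        # average percentage-point gain per month
    failure_ratio = (100 - s0) / (100 - s1)  # how many times smaller the failure rate became
    return months, pp_per_month, failure_ratio


for name, run in RUNS.items():
    months, pp, ratio = summarize(*run)
    print(f"{name}: {months} months, {pp:.1f} pp/month, failure rate cut {ratio:.1f}x")
```

Run as is, it reports a 15‑month window for CORE‑Bench with the failure rate cut by a factor of about 17 (78.5 % to 4.5 %), versus a 16‑month window for MLE‑Bench with a cut of about 2.3 (83.1 % to 35.6 %), which makes CORE‑Bench’s near‑saturation and MLE‑Bench’s remaining headroom easy to see.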
Anthropic’s own internal experiments illustrate the practical impact of these advances: in an “automated research” proof‑of‑concept, a multi‑agent team explored a safety research direction and outperformed human‑designed baselines. Similar efforts at Meta (automatic Triton kernel generation) and DeepMind (pushing automation wherever possible) show that code generation, kernel optimisation, and model fine‑tuning are increasingly handled without human supervision.
The article also discusses broader economic and alignment implications. As AI systems take over more of the research pipeline (data cleaning, literature review, experiment setup), the length of work they can carry out independently grows, creating a feedback loop that could outpace human oversight. Current alignment techniques are fragile in recursive self‑improvement loops because small errors compound across generations: even a method that is 99.9 % accurate per generation would retain only ~60 % accuracy after 500 generations (0.999^500 ≈ 0.61), raising serious safety concerns. The resulting “automation economy” could shift capital from labour to compute, with new enterprises run largely by digital brains, potentially reshaping wealth distribution and regulatory frameworks.
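The ~60 % figure follows from compounding: if each generation independently preserves alignment with probability 0.999, retention after n generations is 0.999^n. The sketch below assumes that simple independent, multiplicative model (the article does not state which model it uses) and also computes when retention first falls to one half. The function names are hypothetical.

```python
import math

# Assumption: alignment is preserved independently at each generation with the same
# probability, so retention compounds multiplicatively. This simple model reproduces
# the article's ~60 % figure, but the article does not spell out its model.


def retained_alignment(per_generation: float, generations: int) -> float:
    """Probability that alignment survives `generations` successive hand-offs."""
    return per_generation ** generations


def generations_to_drop_below(per_generation: float, threshold: float) -> int:
    """First generation count at which retained alignment is at or below `threshold`."""
    return math.ceil(math.log(threshold) / math.log(per_generation))


print(f"{retained_alignment(0.999, 500):.3f}")      # 0.606
print(generations_to_drop_below(0.999, 0.5))        # 693
```

At 99.9 % per‑generation fidelity, retention drops to about 61 % after 500 generations and to one half after 693.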
In summary, the convergence of rapidly improving benchmark performance, lengthening autonomous work windows, and successful automation of core research tasks leads Clark to conclude that there is roughly a 60 % chance of witnessing a fully autonomous AI system that designs its own successor by the end of 2028, and about a 30 % chance by the end of 2027, with lower odds if the current trajectory stalls.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.
SuanNi
A community for AI developers that aggregates large-model development services, models, and compute power.
