ASI‑EVOLVE: AI Designs AI and Beats Human SOTA by Almost Three‑Fold

The open‑source ASI‑EVOLVE framework lets AI autonomously design AI across model architecture, data curation, and reinforcement‑learning algorithms, achieving up to a three‑fold improvement over human‑designed state‑of‑the‑art baselines and demonstrating cross‑domain gains in drug‑target interaction prediction.

SuanNi

Why AI research is hard

Traditional AI research suffers from three heavy burdens: high execution cost, a vast search space, and noisy feedback signals. Training a single candidate model can tie up dozens to hundreds of GPUs, and each experiment yields multi‑dimensional metrics that require expert intuition to interpret.

A closed‑loop with four gears

ASI‑EVOLVE decomposes each evolution round into four steps—learn knowledge, generate design, run experiment, write analysis—implemented by five modules:

Researcher: samples successful or failed attempts from a database, retrieves relevant paper insights from a cognition library, and uses a large model to generate a new code proposal with a natural‑language motivation.

Engineer: executes the generated code in a real training environment, returns structured evaluation metrics, and applies an early‑termination filter to save GPU time. When rules cannot score a design, an LLM acts as a judge for qualitative assessment.

Analyzer: distills massive training logs, benchmark scores, and resource usage into a concise diagnostic report that highlights effective designs, failure patterns, and pitfalls for future iterations.

Cognition library: stores embeddings of ~150 linear‑attention papers, ~80 graph‑neural‑network papers, and ~10 cutting‑edge RL papers, providing priors that boost early‑stage performance.

Database: persists each round’s motivation, code, results, and analysis as a node; sampling strategies include greedy, random, UCB1 (exploration‑exploitation), and MAP‑Elites (diversity preservation).
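The round structure described above can be sketched as a minimal loop. All class and function names here are illustrative, not from the ASI‑EVOLVE codebase; the Researcher, Engineer, and Analyzer are stubbed as plain callables.

```python
import random
from dataclasses import dataclass

@dataclass
class Node:
    """One evolution round persisted in the database."""
    motivation: str
    code: str
    score: float
    analysis: str = ""

class Database:
    """Stores every round as a node and offers sampling strategies."""
    def __init__(self):
        self.nodes: list[Node] = []

    def add(self, node: Node) -> None:
        self.nodes.append(node)

    def sample_greedy(self) -> Node:
        # Greedy: reuse the best-scoring attempt as the next parent.
        return max(self.nodes, key=lambda n: n.score)

    def sample_random(self) -> Node:
        return random.choice(self.nodes)

def evolution_round(db, researcher, engineer, analyzer):
    """One round: learn knowledge, generate design, run experiment, write analysis."""
    parent = db.sample_greedy()
    motivation, code = researcher(parent)   # generate a new code proposal
    score = engineer(code)                  # execute and evaluate it
    analysis = analyzer(code, score)        # distill a diagnostic report
    db.add(Node(motivation, code, score, analysis))
```

Swapping `sample_greedy` for a UCB1 or MAP‑Elites sampler changes only the parent‑selection step; the rest of the loop is unchanged.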

Three‑line battle: architecture, data, RL

Model architecture: Starting from DeltaNet, ASI‑EVOLVE searched 1,773 rounds, producing 1,350 candidate architectures, of which 105 outperformed the DeltaNet baseline. The top five were scaled to ~13 B parameters and evaluated on 100 B tokens, achieving 57.28% accuracy on the development set (1.52 pp above DeltaNet) and 45.40% on the generalization set. Analysis revealed a shift from fixed path allocation to adaptive multi‑scale routing with hierarchical gating and learnable temperature.
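To make "gating with learnable temperature" concrete, here is a toy sketch of a softmax gate over per‑scale branch outputs (scalars for simplicity). This is an illustration of the pattern, not the evolved architecture; the temperature would be a trained parameter in practice.

```python
import math

def gated_multiscale_mix(scale_outputs, gate_logits, temperature):
    """Blend per-scale branch outputs with a temperature-scaled softmax gate.

    A low temperature sharpens the gate toward one branch (hard routing);
    a high temperature spreads weight across scales (soft mixing).
    """
    scaled = [g / temperature for g in gate_logits]
    m = max(scaled)                                # stabilize the softmax
    exps = [math.exp(s - m) for s in scaled]
    z = sum(exps)
    weights = [e / z for e in exps]
    mixed = sum(w * y for w, y in zip(weights, scale_outputs))
    return mixed, weights
```

With equal logits the gate averages the branches; shrinking the temperature makes routing nearly one‑hot, which is the adaptive behavior the analysis attributes to the evolved designs.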

Data curation: The system automatically designed cleaning strategies for the Nemotron‑CC corpus, removing 168 B low‑quality tokens and producing the Nemotron‑CCASI+ dataset (504 B tokens). A 3 B model trained on this data improved average benchmark score by 3.96 points, with especially large gains on knowledge‑intensive tasks: MMLU +18.64, CSQA +18.80, MedQA +13.48.
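A cleaning strategy of this kind reduces, at its simplest, to a scored threshold filter over documents. The sketch below is a generic illustration (the scoring function and threshold are placeholders, not the strategies ASI‑EVOLVE evolved):

```python
def quality_filter(docs, score_fn, threshold):
    """Keep documents whose quality score clears the threshold.

    Returns the surviving documents and a count of dropped tokens
    (whitespace tokenization, for illustration only).
    """
    kept, dropped_tokens = [], 0
    for doc in docs:
        if score_fn(doc) >= threshold:
            kept.append(doc)
        else:
            dropped_tokens += len(doc.split())
    return kept, dropped_tokens
```

The interesting part of the evolved pipelines is presumably `score_fn` itself; the filtering scaffold around it stays this simple.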

Reinforcement‑learning algorithms: Using GRPO as the baseline, ASI‑EVOLVE evolved RL designs over 300 evolution rounds. Ten algorithms outperformed GRPO in the exploration phase, and the top three advanced to validation on a 14 B‑parameter model. The best design raised AMC32 from 67.5 to 80.0, AIME24 from 20.0 to 31.67, and OlympiadBench by 5.04 points. The evolved algorithms converged on variance‑control techniques that mirror human‑derived designs.
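For context on the baseline: the core of GRPO is a group‑relative advantage, where each sampled completion's reward is normalized by the mean and standard deviation of its own group rather than by a learned value function. A minimal sketch:

```python
import statistics

def group_relative_advantages(rewards, eps=1e-6):
    """GRPO-style advantages for one group of sampled completions.

    Each reward is centered on the group mean and scaled by the group
    standard deviation; eps guards against a zero-variance group.
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]
```

The variance‑control tricks the evolved algorithms rediscovered would modify exactly this normalization step; the details of those variants are not given in the summary.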

Cross‑domain validation

On a circle‑packing benchmark, GPT‑5‑mini reached a score of 2.63597 in 17 rounds, surpassing AlphaEvolve and OpenEvolve. Ablation studies showed that removing the Analyzer slowed progress, while removing the Cognition library caused a prolonged cold‑start slump.
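As an aside on how such a benchmark can be scored automatically: assuming the AlphaEvolve‑style formulation (maximize the sum of radii of circles packed in the unit square; the paper's exact setup may differ), a validity‑checking scorer is only a few lines:

```python
import math

def packing_score(circles):
    """Sum of radii for circles (x, y, r) packed in the unit square.

    Returns 0.0 if any circle leaves the square or overlaps another,
    so invalid candidates are rejected rather than partially credited.
    """
    for i, (x, y, r) in enumerate(circles):
        if r <= 0 or x - r < 0 or x + r > 1 or y - r < 0 or y + r > 1:
            return 0.0
        for (x2, y2, r2) in circles[i + 1:]:
            if math.hypot(x - x2, y - y2) < r + r2 - 1e-12:
                return 0.0
    return sum(r for _, _, r in circles)
```

A rule‑scored task like this is exactly where the Engineer module can evaluate candidates without falling back to an LLM judge.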

In drug‑target interaction prediction, starting from the DrugBAN architecture, the system incorporated 80 graph‑neural‑network papers and produced a new model (ban_sinkhorn_ds_marginal_topk_v6) that improved BindingDB AUROC by 1.91 points and F1 by 2.95 points. In cold‑start, out‑of‑distribution tests, AUROC gains reached 6.94 (unseen drug), 3.56 (unseen protein), and 4.36 (both unseen), demonstrating transferable molecular representations.
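The model's name suggests a Sinkhorn normalization step in the drug‑protein alignment (this is an inference from the identifier, not a detail stated in the summary). Sinkhorn iteration alternately normalizes the rows and columns of a similarity kernel so it approaches a doubly stochastic alignment matrix:

```python
import math

def sinkhorn_normalize(cost, n_iters=50, tau=0.1):
    """Turn a cost matrix into an approximately doubly stochastic
    alignment matrix via Sinkhorn iterations on exp(-cost / tau)."""
    k = [[math.exp(-c / tau) for c in row] for row in cost]
    for _ in range(n_iters):
        # Row normalization: each row sums to 1.
        k = [[v / sum(row) for v in row] for row in k]
        # Column normalization: each column sums to 1.
        col_sums = [sum(row[j] for row in k) for j in range(len(k[0]))]
        k = [[row[j] / col_sums[j] for j in range(len(row))] for row in k]
    return k
```

In an interaction model, such a matrix would softly match drug substructures to protein residues before pooling; how the evolved model combines this with marginal top‑k selection is not specified in the summary.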

Insights from sampling strategies

When a strong prior exists in the cognition library, UCB1 converges faster and more stably than MAP‑Elites, confirming that good priors reduce the need for exhaustive diversity preservation.
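UCB1 itself is a standard bandit rule: pick the arm (here, the parent node to evolve from) with the highest mean value plus an exploration bonus that shrinks as the arm is tried more often. A minimal sketch (variable names are illustrative):

```python
import math

def ucb1_select(counts, values, c=math.sqrt(2)):
    """Return the index of the arm maximizing mean value + exploration bonus.

    counts[i] is how often arm i was tried; values[i] is its total reward.
    Untried arms score infinity, so they are always explored first.
    """
    total = sum(counts)
    best, best_score = 0, float("-inf")
    for i, (n, v) in enumerate(zip(counts, values)):
        if n == 0:
            score = float("inf")
        else:
            score = v / n + c * math.sqrt(math.log(total) / n)
        if score > best_score:
            best, best_score = i, score
    return best
```

With a strong prior seeding good early nodes, the mean term dominates quickly, which matches the observation that UCB1 converges faster than diversity‑preserving MAP‑Elites in that regime.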

Conclusion

ASI‑EVOLVE demonstrates that an AI‑in‑the‑loop system can autonomously close the research loop—learning, designing, experimenting, and analyzing—across architecture, data, and algorithmic dimensions, achieving results that surpass human‑crafted SOTA by up to threefold. Repository: https://github.com/GAIR-NLP/ASI-Evolve

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.

Reinforcement Learning · Neural Architecture Search · Evolutionary Algorithms · AI-driven AI · ASI-EVOLVE · Cross-domain AI · Data Curation
Written by

SuanNi

A community for AI developers that aggregates large-model development services, models, and compute power.
