AutoScientists Open‑Source: Harvard’s Self‑Organizing Agents Enable Long‑Term Autonomous Research

AutoScientists is a self-organizing multi-agent framework that automates the full scientific loop, from hypothesis generation to paper writing. On BioML-Bench it reaches an average rank percentile of 74.4%, 8.33 percentage points above the baseline, and it improves protein-engineering results such as ACE2-Spike binding prediction.

Data Party THU

Jun 3, 2026

Self‑organizing multi‑agent framework

AutoScientists is an open‑source framework that implements a decentralized team of AI agents. Agents share a global state that records proposals, experiments, results, failures and the current best solution. When progress stalls, agents reorganize and explore alternative directions; the shared state includes a discussion forum, experiment queue and dead‑end registry to avoid redundant work.
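The shared state described above can be pictured as a small blackboard that every agent reads and writes. The sketch below is illustrative only and is not taken from the AutoScientists code base: the class names `Experiment` and `SharedState`, their fields, and the rule that any non-improving direction becomes a dead end are all assumptions made for this example.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Experiment:
    agent: str                      # who proposed it (hypothetical field)
    direction: str                  # short label for the idea being tested
    score: Optional[float] = None   # filled in once the run finishes


@dataclass
class SharedState:
    """Hypothetical blackboard shared by all agents (names are assumptions)."""
    forum: list = field(default_factory=list)     # discussion posts
    queue: deque = field(default_factory=deque)   # pending experiments
    results: list = field(default_factory=list)   # finished experiments
    dead_ends: set = field(default_factory=set)   # directions ruled out
    best: Optional[Experiment] = None             # current best solution

    def propose(self, agent: str, direction: str) -> bool:
        """Queue an experiment unless its direction is a known dead end."""
        if direction in self.dead_ends:
            self.forum.append(f"{agent}: skipped '{direction}' (dead end)")
            return False
        self.queue.append(Experiment(agent, direction))
        self.forum.append(f"{agent}: proposed '{direction}'")
        return True

    def record(self, exp: Experiment, score: float) -> None:
        """Store a result, then update the best solution or the dead-end set.

        Simplifying assumption: higher is better, and any run that does not
        beat the current best marks its direction as a dead end.
        """
        exp.score = score
        self.results.append(exp)
        if self.best is None or score > self.best.score:
            self.best = exp
        else:
            self.dead_ends.add(exp.direction)


state = SharedState()
state.propose("agent_a", "larger learning rate")
state.propose("agent_b", "add dropout")
state.record(state.queue.popleft(), 0.71)   # becomes the best solution
state.record(state.queue.popleft(), 0.69)   # no gain, so "add dropout" is a dead end
state.propose("agent_c", "add dropout")     # rejected: already in the registry
```

The point of the registry is visible in the last call: a third agent proposing an already-failed direction is turned away before any compute is spent.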

Benchmark evaluation

AutoScientists was evaluated on BioML-Bench, a benchmark covering 24 end-to-end biomedical machine-learning tasks (imaging, protein engineering, single-cell omics, drug discovery). The system achieved an average rank percentile of 74.4%, an improvement of 8.33 percentage points over the baseline, with the largest gains on drug-discovery tasks.

Compared with a GPT-nanochat-based automatic research loop, AutoScientists reached the same validation bits-per-byte with fewer experiments. Starting from an already-optimized solution, a single-agent loop ran 100 experiments without finding any improvement, whereas AutoScientists completed seven improvement iterations within 93 experiments and still had room for further gains.
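To make "saturated" and "improvement iterations" concrete, here is a minimal sketch of how a run history could be checked for a plateau. The functions `count_improvements` and `is_stalled`, the `patience` parameter, and the sample scores are assumptions for illustration; the article does not describe how AutoScientists actually detects stalls.

```python
def count_improvements(scores, lower_is_better=True):
    """Count experiments that set a new best score (the first score is the start)."""
    best = scores[0]
    improvements = 0
    for s in scores[1:]:
        better = s < best if lower_is_better else s > best
        if better:
            best = s
            improvements += 1
    return improvements


def is_stalled(scores, patience, lower_is_better=True):
    """True if none of the last `patience` experiments beat the best score before them."""
    if len(scores) <= patience:
        return False  # not enough history to judge
    earlier, recent = scores[:-patience], scores[-patience:]
    if lower_is_better:
        return min(recent) >= min(earlier)
    return max(recent) <= max(earlier)


# Made-up bits-per-byte values (lower is better), not results from the paper.
history = [1.00, 0.99, 0.99, 0.98, 0.985, 0.98, 0.981]
improvements = count_improvements(history)   # two new bests: 0.99 and 0.98
stalled = is_stalled(history, patience=3)    # last three runs never beat 0.98
```

In a multi-agent setting, a `True` result from a check like this would be the natural trigger for agents to reorganize and try a different direction.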

Improving an existing method

On the ACE2-Spike binding task, AutoScientists discovered a new approach that raised Spearman ρ from 0.747 to 0.840, a gain that goes beyond simple hyperparameter tuning. The discovered recipe was then frozen and applied unchanged to all 217 ProteinGym supervised substitution experiments, where it increased the average Spearman ρ from 0.657 to 0.700 (a 6.5% relative gain) and established a new state-of-the-art result.
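For readers unfamiliar with the metric, Spearman ρ is the Pearson correlation of the ranks of predictions and measurements, so it rewards getting the ordering of variants right. The sketch below is a plain-Python implementation for illustration (function names are my own, and it assumes neither input is constant); it also reproduces the 6.5% relative gain from the two averages quoted above.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions are 0-based, ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman correlation: Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5  # undefined if an input is constant


def relative_gain(before, after):
    """Fractional improvement of `after` over `before`."""
    return (after - before) / before


# Averages quoted in the article: 0.657 -> 0.700 across ProteinGym.
gain_pct = round(relative_gain(0.657, 0.700) * 100, 1)  # 6.5
```

Note that the absolute change is 0.043 in ρ, while the 6.5% figure is relative to the 0.657 starting point.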

Long‑term collaborative experiments

The framework enables continuous, long‑running experiments that can be left unattended. Researchers can later inspect which experiments succeeded, which failed, and what knowledge was acquired, effectively treating the system as a collaborative research team rather than a script.
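As a rough illustration of what "inspecting later" might look like, the sketch below summarizes a run log after the fact. The log format (a list of dictionaries with `status`, `score`, and `note` keys) and the `summarize` function are hypothetical; the article does not specify how AutoScientists stores its records.

```python
from collections import Counter


def summarize(log):
    """Summarize a finished run: outcome counts, best success, and recorded notes."""
    counts = Counter(entry["status"] for entry in log)
    succeeded = [e for e in log if e["status"] == "succeeded"]
    # Assumes higher scores are better; returns None if nothing succeeded.
    best = max(succeeded, key=lambda e: e["score"], default=None)
    notes = [e["note"] for e in log if e.get("note")]
    return {"counts": dict(counts), "best": best, "notes": notes}


# Made-up entries in an assumed log format.
log = [
    {"id": 1, "status": "succeeded", "score": 0.71, "note": "baseline reproduced"},
    {"id": 2, "status": "failed", "score": None, "note": "out of memory at batch 512"},
    {"id": 3, "status": "succeeded", "score": 0.74, "note": None},
]
report = summarize(log)
# report["counts"] -> {"succeeded": 2, "failed": 1}
# report["best"]["id"] -> 3
```

Keeping failure notes alongside successes is what lets a returning researcher see what was learned, not just what won.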

Although the design does not reduce the number of LLM calls relative to single‑agent baselines, the multi‑agent collaboration explores the design space more effectively under a fixed computational budget, yielding better performance.

Source code and project website are publicly available (https://autoscientists.openscientist.ai/). Paper: https://arxiv.org/abs/2605.28655.

Source: ScienceAI

An AI research team with no central coordinator, only self-organization.
