AutoScientists Open‑Source: Harvard’s Self‑Organizing Agents Enable Long‑Term Autonomous Research
AutoScientists is a self‑organizing multi‑agent framework that automates the full scientific loop, from hypothesis generation to paper writing. On BioML‑Bench it reaches an average rank percentile of 74.4%, 8.33 percentage points above the baseline, and it delivers notable gains on protein‑engineering tasks such as ACE2‑Spike binding.
Self‑organizing multi‑agent framework
AutoScientists is an open‑source framework that implements a decentralized team of AI agents with no central coordinator. The agents share a global state that records proposals, experiments, results, failures and the current best solution. When progress stalls, the agents reorganize and explore alternative directions. To avoid redundant work, the shared state also includes a discussion forum, an experiment queue and a dead‑end registry.
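The shared state described above can be pictured as a single data structure that every agent reads and writes. The sketch below is a hypothetical illustration, not the AutoScientists implementation: the class names, fields and the simple "N results without improvement" stall rule are all assumptions chosen to show how a dead‑end registry prevents repeated work and how a stall signal could trigger reorganization.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Experiment:
    # Hypothetical record of one experiment proposed by some agent.
    exp_id: int
    proposal: str
    score: Optional[float] = None  # filled in after the experiment runs
    failed: bool = False


@dataclass
class SharedState:
    """Hypothetical global state shared by all agents (names are illustrative)."""
    proposals: list[str] = field(default_factory=list)
    queue: list[Experiment] = field(default_factory=list)    # experiment queue
    results: list[Experiment] = field(default_factory=list)  # finished experiments
    forum: list[str] = field(default_factory=list)           # discussion messages
    dead_ends: set[str] = field(default_factory=set)         # directions known to fail
    best: Optional[Experiment] = None                         # current best solution
    stall_count: int = 0                                      # results since last improvement

    def propose(self, proposal: str) -> bool:
        # Reject directions that are registered dead ends or already proposed.
        if proposal in self.dead_ends or proposal in self.proposals:
            return False
        self.proposals.append(proposal)
        self.queue.append(Experiment(len(self.proposals), proposal))
        return True

    def record(self, exp: Experiment) -> None:
        # Store the result, register failures, and track the best solution so far.
        self.results.append(exp)
        if exp.failed:
            self.dead_ends.add(exp.proposal)
            self.stall_count += 1
        elif self.best is None or exp.score > self.best.score:
            self.best = exp
            self.stall_count = 0
        else:
            self.stall_count += 1

    def stalled(self, patience: int = 3) -> bool:
        # Assumed stall rule: `patience` results in a row without improvement.
        return self.stall_count >= patience
```

In this sketch a failed experiment both blocks its proposal from being retried and counts toward the stall signal, which is one plausible way to couple the dead‑end registry with reorganization.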
Benchmark evaluation
AutoScientists was evaluated on BioML‑Bench, a benchmark covering 24 end‑to‑end biomedical machine‑learning tasks (imaging, protein engineering, single‑cell omics, drug discovery). The system achieved an average rank percentile of 74.4%, an improvement of 8.33 percentage points over the baseline, with the largest gains on drug‑discovery tasks.
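The article does not define the rank percentile precisely. A common convention for leaderboard‑style benchmarks, assumed here rather than taken from BioML‑Bench, is the percentage of existing leaderboard entries that the agent's score beats on each task, averaged over tasks. A minimal sketch under that assumption:

```python
def rank_percentile(agent_score: float, leaderboard: list[float],
                    higher_is_better: bool = True) -> float:
    """Percentage of leaderboard entries the agent strictly beats (assumed definition)."""
    if higher_is_better:
        beaten = sum(1 for s in leaderboard if agent_score > s)
    else:
        beaten = sum(1 for s in leaderboard if agent_score < s)
    return 100.0 * beaten / len(leaderboard)


def average_rank_percentile(per_task: list[float]) -> float:
    """Unweighted mean of per-task rank percentiles."""
    return sum(per_task) / len(per_task)
```

Because the result is already a percentage, a difference between two such averages is naturally reported in percentage points, as in the 8.33‑point gain above.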
Compared with an automatic research loop built on GPT‑nanochat, AutoScientists reached the same validation bits‑per‑byte with fewer experiments. When both started from an already‑optimized solution, a single‑agent loop ran 100 experiments without any further improvement, whereas AutoScientists found seven successive improvements within 93 experiments and had not yet plateaued.
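Validation bits per byte normalizes a language model's loss by the size of the text rather than by the number of tokens, so models with different tokenizers can be compared. The sketch below uses the standard definition (summed cross‑entropy in nats, converted to bits by dividing by ln 2, divided by the UTF‑8 byte count); we assume, without confirmation from the source, that the nanochat‑based loop computes it this way.

```python
import math


def bits_per_byte(token_losses_nats: list[float], text: str) -> float:
    """Summed per-token cross-entropy (natural log) expressed as bits per UTF-8 byte."""
    total_nats = sum(token_losses_nats)
    n_bytes = len(text.encode("utf-8"))
    # Divide by ln 2 to convert nats to bits, then by the byte count.
    return total_nats / (math.log(2) * n_bytes)
```

Lower is better: a model that spends exactly one bit per byte of validation text scores 1.0.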
Improving an existing method
On the ACE2‑Spike binding task, AutoScientists discovered a new approach, going beyond simple hyperparameter tuning, that raised Spearman ρ from 0.747 to 0.840. The discovered recipe was then frozen and applied unchanged to all 217 ProteinGym supervised substitution experiments, raising the average Spearman ρ from 0.657 to 0.700 (a 6.5% relative gain) and establishing a new state‑of‑the‑art result.
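Spearman ρ measures how well predicted fitness scores preserve the ranking of measured ones: it is the Pearson correlation of the two rank vectors, with tied values sharing their average rank. The dependency‑free sketch below is a generic implementation of that definition, not code from AutoScientists or ProteinGym, and it also checks the 6.5% relative gain quoted above.

```python
import math


def average_ranks(values: list[float]) -> list[float]:
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of positions i..j, converted to 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(x: list[float], y: list[float]) -> float:
    """Pearson correlation of rank vectors (undefined if either input is constant)."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(vx * vy)


def relative_gain(old: float, new: float) -> float:
    return (new - old) / old


# ProteinGym averages from the text: (0.700 - 0.657) / 0.657 is about 6.5%.
print(round(100 * relative_gain(0.657, 0.700), 1))
```

A benchmark average such as the ProteinGym figure would be the mean of `spearman_rho` over the per‑assay predictions.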
Long‑term collaborative experiments
The framework enables continuous, long‑running experiments that can be left unattended. Researchers can later inspect which experiments succeeded, which failed, and what knowledge was acquired, effectively treating the system as a collaborative research team rather than a script.
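Inspecting an unattended run comes down to summarizing whatever the agents left behind. The sketch below assumes a hypothetical JSON Lines log with `id`, `status`, `score` and `note` fields; the article does not describe the actual AutoScientists log format, so this only illustrates the kind of post‑hoc review it mentions.

```python
import json
from collections import Counter


def summarize_log(lines: list[str]) -> dict:
    """Summarize a hypothetical JSON-lines experiment log (format is assumed)."""
    records = [json.loads(line) for line in lines if line.strip()]
    status_counts = Counter(r["status"] for r in records)
    succeeded = [r for r in records if r["status"] == "success"]
    best = max(succeeded, key=lambda r: r["score"], default=None)
    # Free-text notes stand in for the "knowledge acquired" during the run.
    lessons = [r["note"] for r in records if r.get("note")]
    return {
        "counts": dict(status_counts),
        "best_id": best["id"] if best else None,
        "lessons": lessons,
    }
```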
Although the design does not reduce the number of LLM calls relative to single‑agent baselines, the multi‑agent collaboration explores the design space more effectively under a fixed computational budget, yielding better performance.
Source code and project website are publicly available (https://autoscientists.openscientist.ai/). Paper: https://arxiv.org/abs/2605.28655.
Source: ScienceAI
An AI research team with no central coordinator, organized entirely by self‑organization.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.
Data Party THU
Official platform of Tsinghua Big Data Research Center, sharing the team's latest research, teaching updates, and big data news.