Anthropic and OpenAI Launch Parallel AI‑for‑Science Tools on the Same Day

On June 30, 2026, Anthropic unveiled Claude Science, an AI workbench for scientists, while OpenAI introduced GeneBench‑Pro, a research‑grade benchmark. Together, the launches suggest that the laboratory is the next AI battleground, and they expose early performance gaps between models and human experts.

PaperAgent

Jul 3, 2026

On June 30, 2026, two major AI companies announced new offerings aimed at scientific research: Anthropic released Claude Science, an integrated AI workbench for scientists, and OpenAI launched GeneBench‑Pro, a research‑level benchmark designed to evaluate AI’s "Research Taste".

GeneBench‑Pro: a research‑grade evaluation framework

GeneBench‑Pro is not a typical programming benchmark. It presents 129 synthetic tasks spanning ten domains and 21 sub‑domains, including genomics, quantitative biology, and translational medicine. Each task provides a messy real‑world dataset, a brief experimental background, and a target estimand tied to a downstream decision. The benchmark defines "Research Taste" as a series of judgments: which questions the data can support, how early diagnostics should reshape the model, and when an initial plan must be abandoned.
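Each task therefore bundles three inputs with a single target. As a rough mental model only, the record below uses hypothetical field names (`background`, `dataset_path`, `estimand`, and so on); it is not the published GeneBench‑Pro schema, so consult the released dataset for the actual format.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class ResearchTask:
    """Hypothetical record for one benchmark task; field names are illustrative."""

    task_id: str
    domain: str        # top-level domain (the benchmark reports ten)
    subdomain: str     # finer-grained area (the benchmark reports 21)
    background: str    # brief experimental background shown to the model
    dataset_path: str  # location of the messy real-world dataset
    estimand: str      # target quantity tied to a downstream decision

    def is_complete(self) -> bool:
        """True when background, dataset, and estimand are all non-empty."""
        return all(part.strip() for part in (self.background, self.dataset_path, self.estimand))


def tasks_per_domain(tasks: list[ResearchTask]) -> Counter:
    """Count tasks in each top-level domain."""
    return Counter(task.domain for task in tasks)
```

Freezing the dataclass keeps task definitions immutable, which matters when the same task is scored against many models.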

Results show that the strongest model, GPT‑5.6 Sol, achieved a top score of 31.5%. For comparison, a human expert needs 20–40 hours per task; at $200 per hour that comes to $4,000–$8,000 per task, whereas AI inference costs only a few dollars. Even the best models fail on roughly 70% of the tasks, indicating that they still struggle to close the inferential loop.
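The cost comparison is plain arithmetic, made explicit in the sketch below. The $200/hour rate and the 20–40 hour range come from the article; the $5 per-run inference cost is an assumption standing in for "a few dollars", and treating the 31.5% score as a task success rate is also an assumption.

```python
HOURLY_RATE_USD = 200    # expert rate quoted in the article
EXPERT_HOURS = (20, 40)  # per-task effort range quoted in the article
AI_COST_USD = 5.0        # ASSUMPTION: stand-in for "a few dollars" per task
TOP_SCORE_PCT = 31.5     # best reported score


def expert_cost_range(hours=EXPERT_HOURS, rate=HOURLY_RATE_USD):
    """Return (low, high) expert cost per task in USD."""
    low, high = hours
    return low * rate, high * rate


def cost_ratio(ai_cost=AI_COST_USD):
    """How many times more the expert costs than one AI run, at each end of the range."""
    low, high = expert_cost_range()
    return low / ai_cost, high / ai_cost


if __name__ == "__main__":
    low, high = expert_cost_range()
    print(f"Expert cost per task: ${low:,}-${high:,}")
    r_low, r_high = cost_ratio()
    print(f"Expert/AI cost ratio: {r_low:.0f}x-{r_high:.0f}x")
    # ASSUMPTION: score == share of tasks solved, so the rest are failures.
    print(f"Unsolved share: {100 - TOP_SCORE_PCT:.1f}%")
```

Under these assumptions the unsolved share is 68.5%, consistent with the article's "roughly 70%".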

Claude Science: an AI workbench for scientists

Claude Science addresses the fragmentation of scientific tooling by providing a one‑stop environment with more than 60 curated skills covering genomics, single‑cell analysis, proteomics, structural biology, and cheminformatics. Its architecture includes:

Generalist Coordinating Agent – understands research intent and orchestrates other agents.

Specialist Agents – user‑created or system‑scheduled sub‑agents for specific domains.

Reviewer Agent – automatically checks citations, calculations, figures, and code for consistency.

Live Kernel – maintains session memory so large datasets are loaded only once.
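The division of labor in this list can be illustrated with a toy orchestration loop. Everything below is hypothetical: the class names, the keyword routing, and the reviewer's single check are illustrative stand-ins, not Claude Science's API. The sketch shows a coordinator dispatching requests to registered specialists, a reviewer pass over every result, and a session cache that loads each dataset only once.

```python
from dataclasses import dataclass, field
from typing import Callable


class LiveKernel:
    """Hypothetical session store: each dataset is loaded once, then served from memory."""

    def __init__(self, loader: Callable[[str], object]):
        self._loader = loader              # expensive load, e.g. reading a large file
        self._cache: dict[str, object] = {}
        self.load_count = 0                # number of real loads performed

    def get(self, name: str) -> object:
        if name not in self._cache:
            self._cache[name] = self._loader(name)
            self.load_count += 1
        return self._cache[name]


@dataclass
class Result:
    specialist: str
    text: str
    issues: list[str] = field(default_factory=list)


def review(result: Result) -> Result:
    """Toy reviewer: flags output lacking a "[ref]" marker (stand-in for real
    citation, figure, and code consistency checks)."""
    if "[ref]" not in result.text:
        result.issues.append("no supporting reference")
    return result


Specialist = Callable[[str, LiveKernel], str]


class Coordinator:
    """Toy generalist agent: routes a request to the first specialist whose keyword matches."""

    def __init__(self, kernel: LiveKernel):
        self.kernel = kernel
        self.specialists: dict[str, Specialist] = {}

    def register(self, keyword: str, specialist: Specialist) -> None:
        self.specialists[keyword] = specialist

    def handle(self, request: str) -> Result:
        for keyword, specialist in self.specialists.items():
            if keyword in request.lower():
                return review(Result(keyword, specialist(request, self.kernel)))
        return Result("none", "", ["no specialist matched"])


if __name__ == "__main__":
    kernel = LiveKernel(lambda name: list(range(1000)))  # fake 1,000-cell dataset
    coord = Coordinator(kernel)
    coord.register("single-cell", lambda req, k: f"clustered {len(k.get('pbmc'))} cells [ref]")
    coord.handle("Run single-cell clustering")
    coord.handle("Re-run single-cell clustering at higher resolution")
    print(kernel.load_count)  # the dataset was loaded only once
```

Keyword routing is the simplest possible dispatch rule; a real coordinator would infer intent, but the cache-then-review flow is the point here.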

Auditability and reproducibility

Every chart generated by Claude Science is accompanied by its full code, environment specification, and generation history. Users can edit visualizations in natural language; for example, asking to "remove grid lines" updates the underlying code automatically. Data never leaves the user's own infrastructure (on‑premises servers, HPC clusters, or remote machines reached over SSH); only the analysis context is sent to Claude.
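These two guarantees can be approximated with a small audit record and a schema-only summary. This is a sketch under assumptions: `ChartProvenance`, `context_summary`, and hashing a JSON payload with SHA-256 are illustrative choices, not how Claude Science actually stores history or builds its context. The record bundles a chart's code, environment, and edit log into one fingerprint; the summary describes a table by column names, value types, and row count without including any values.

```python
import hashlib
import json
import platform
import sys
from dataclasses import asdict, dataclass, field


@dataclass
class ChartProvenance:
    """Hypothetical audit record stored alongside a generated chart."""

    code: str
    environment: dict
    history: list[str] = field(default_factory=list)

    def record_edit(self, instruction: str, new_code: str) -> None:
        """Log a natural-language edit and replace the code it produced."""
        self.history.append(instruction)
        self.code = new_code

    def fingerprint(self) -> str:
        """Stable hash over code, environment, and history (sorted keys for determinism)."""
        payload = json.dumps(asdict(self), sort_keys=True)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def current_environment() -> dict:
    """Minimal environment snapshot; a real record would also pin package versions."""
    return {"python": sys.version.split()[0], "platform": platform.platform()}


def context_summary(rows: list[dict]) -> dict:
    """Describe a table without its values: only names, types, and row count."""
    columns = sorted({key for row in rows for key in row})
    types = {c: sorted({type(row[c]).__name__ for row in rows if c in row}) for c in columns}
    return {"n_rows": len(rows), "columns": columns, "types": types}
```

Any change to the code, environment, or history changes the fingerprint, so a reader can confirm that a chart matches the record it shipped with.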

An example shows cross‑species single‑cell RNA‑seq integration across 138 species and 5,672 cell types, with an interactive UMAP visualization and the complete Python script generated inline.

From literature review to manuscript drafting

Claude Science can query PubMed, bioRxiv, OpenAlex, and CELLxGENE in parallel, automatically generate LaTeX literature reviews, compile them to PDF, and have the Reviewer Agent detect and correct citation errors. In one demonstration, the agent found that a single PMID (31178118) had been assigned to two different tools and corrected the mismatched reference automatically.
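The citation check in that demonstration boils down to detecting one identifier mapped to two different works. A minimal sketch, assuming references arrive as dicts with hypothetical `title` and `pmid` keys; the source callables are stubs standing in for real PubMed or bioRxiv clients, which are not shown. `fetch_all` fans a query out to several sources concurrently, and `conflicting_pmids` flags any PMID attached to more than one title.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def fetch_all(sources: dict[str, Callable[[str], list[dict]]], query: str) -> list[dict]:
    """Query every source concurrently and tag each returned record with its source name."""
    with ThreadPoolExecutor(max_workers=len(sources) or 1) as pool:
        futures = {name: pool.submit(fn, query) for name, fn in sources.items()}
        records = []
        for name, future in futures.items():
            for rec in future.result():
                records.append({**rec, "source": name})
    return records


def conflicting_pmids(references: list[dict]) -> dict[str, set[str]]:
    """Return each PMID that is attached to more than one distinct title."""
    titles_by_pmid: dict[str, set[str]] = defaultdict(set)
    for ref in references:
        if ref.get("pmid"):
            titles_by_pmid[ref["pmid"]].add(ref["title"])
    return {pmid: titles for pmid, titles in titles_by_pmid.items() if len(titles) > 1}
```

Grouping by PMID and counting distinct titles means the same paper returned by two sources is not flagged; only one identifier pointing at two different works is.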

Dual‑track validation of AI for science

The simultaneous releases mark a "dual‑track validation" stage: OpenAI’s GeneBench‑Pro sets quantitative standards for what "good research" looks like, while Anthropic’s Claude Science provides an integrated environment in which scientists can put AI acceleration to work immediately. The authors argue that genuine scientific discovery requires not only raw computation but also judgment, the ability to iterate, and reproducibility, and that AI will become a tireless collaborator for every researcher.

1. OpenAI:
https://openai.com/index/introducing-genebench-pro/
https://cdn.openai.com/pdf/21938268-21af-442f-af93-3b2249afb241/genebench-pro.pdf
https://huggingface.co/datasets/openai/genebench-pro
2. Anthropic:
https://www.anthropic.com/news/claude-science-ai-workbench

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.
