One‑Click from Experiment Logs to Conference‑Ready LaTeX: Google’s PaperOrchestra Changes Paper Writing

PaperOrchestra, Google's multi-agent framework, turns raw experiment logs, brief ideas, LaTeX templates, and conference guidelines into fully formatted CVPR/ICLR papers. It coordinates five specialized agents with Semantic Scholar citation verification and PaperBanana figure generation, and its refinement loop boosts simulated acceptance rates by up to 22% while the whole pipeline runs in under 40 minutes.

Recent months have seen a surge of “auto‑research” tools such as The AI Scientist, AutoResearch, EvoScientist, ARIS and the live‑streamed FARS project. Unlike those end‑to‑end attempts, Google’s PaperOrchestra adopts a narrower but deeper focus: it directly converts the fragmented materials that remain after an experiment—log files, a rough idea, a LaTeX template and the target conference’s formatting guide—into a complete manuscript that complies with CVPR, ICLR and other top‑conference standards.

Five‑Agent Pipeline

The system is built around five specialized agents that cooperate sequentially and in parallel (a minimal code sketch follows the list):

Outline Agent: structures the raw logs and ideas into a detailed outline, plans the required figures, and defines a literature-search strategy.

Plot Agent and Literature Review Agent: run concurrently; the Plot Agent creates statistical charts and concept diagrams, while the Literature Review Agent searches the web, retrieves candidate papers, and validates them via the Semantic Scholar API, discarding any work that postdates the conference deadline or lacks a verified mapping.

Chapter-Writing Agent: integrates the data, outline, and retrieved references to produce the full LaTeX source for each section.

Content-Refinement Agent: performs a final review and invokes AgentReview to simulate peer-review feedback. Modifications are kept only if they improve the overall automated score or, at minimum, do not degrade any sub-score.
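
Conceptually, the flow might be wired together as in the sketch below. Every agent here is a trivial stub, and all names and signatures are illustrative assumptions rather than PaperOrchestra's actual interfaces; the only structural point it captures is the staging: one outline pass, a concurrent plot/literature step, then drafting and refinement.

```python
"""Minimal sketch of a five-agent writing pipeline in the spirit of
PaperOrchestra. Every agent is a trivial stub; the real system wraps
LLM calls, so all names and signatures are illustrative assumptions."""
from concurrent.futures import ThreadPoolExecutor

def outline_agent(logs, idea, guidelines):
    # Structure raw material into sections, a figure plan, and search queries.
    return {"sections": ["Introduction", "Method", "Experiments"],
            "figure_plan": ["results bar chart"],
            "queries": ["multi-agent scientific paper writing"]}

def plot_agent(logs, figure_plan):
    return [f"<figure: {spec}>" for spec in figure_plan]       # stub figures

def literature_agent(queries):
    return [f"<verified refs: {q}>" for q in queries]          # stub references

def chapter_agent(outline, figures, references, template):
    return "\\documentclass{article} % ...generated sections..."  # stub draft

def refinement_agent(draft):
    return draft  # stub: would iterate against simulated peer reviews

def run_pipeline(logs, idea, template, guidelines):
    outline = outline_agent(logs, idea, guidelines)
    # Stage 2: plotting and literature review run concurrently.
    with ThreadPoolExecutor(max_workers=2) as pool:
        figs = pool.submit(plot_agent, logs, outline["figure_plan"])
        refs = pool.submit(literature_agent, outline["queries"])
        figures, references = figs.result(), refs.result()
    draft = chapter_agent(outline, figures, references, template)
    return refinement_agent(draft)

print(run_pipeline(logs="train.log", idea="one-click paper writing",
                   template="cvpr.tex", guidelines="CVPR 2025"))
```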

[Figure: pipeline diagram]

Citation and Figure Generation

The Literature Review Agent first discovers candidate papers through web search, then requires verification against the Semantic Scholar API. This hard constraint ensures that only verified, timely references appear in the Related Work section, preventing the factual drift common in earlier LLM-only systems.
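
A sketch of what that verification step could look like against the public Semantic Scholar Graph API is below. The search endpoint and query parameters are real, but the matching and cutoff policy is an assumption about how the "hard constraint" might be enforced.

```python
# Sketch of verifying a web-search candidate against the Semantic Scholar
# Graph API. The endpoint and fields exist; the title-match and deadline
# policy below is an assumption, not PaperOrchestra's documented logic.
import requests

S2_SEARCH = "https://api.semanticscholar.org/graph/v1/paper/search"

def verify_reference(candidate_title: str, cutoff_year: int):
    resp = requests.get(S2_SEARCH, params={
        "query": candidate_title,
        "fields": "title,year,externalIds",
        "limit": 1,
    }, timeout=10)
    resp.raise_for_status()
    hits = resp.json().get("data", [])
    if not hits:
        return None                      # no verified mapping: discard
    paper = hits[0]
    if paper.get("year") and paper["year"] > cutoff_year:
        return None                      # postdates the conference deadline
    if paper["title"].lower() != candidate_title.lower():
        return None                      # fuzzy web hit, not an exact match
    return paper                         # verified, timely reference

# e.g. verify_reference("Attention Is All You Need", cutoff_year=2024)
```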

For visual content, the Plot Agent is coupled with PaperBanana, a model that can produce both statistical plots and conceptual diagrams. A vision-language model checks each generated image against its design goal; if mismatches are found, the prompt is iteratively refined until the figure and its caption fit the surrounding text.
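
That generate-critique-regenerate loop might look like the following sketch; `generate_figure` (a stand-in for a PaperBanana call) and `vlm_critique` are hypothetical stubs, since the article describes the loop rather than these exact interfaces.

```python
def generate_figure(prompt):
    # Hypothetical stand-in for a PaperBanana generation call.
    return f"<image for: {prompt[:40]}>"

def vlm_critique(image, design_goal):
    # Hypothetical vision-language checker; returns (ok, critique).
    return True, ""

def figure_with_feedback(design_goal, max_rounds=3):
    prompt = design_goal
    for _ in range(max_rounds):
        image = generate_figure(prompt)
        ok, critique = vlm_critique(image, design_goal)
        if ok:
            return image          # figure matches the design goal
        # Fold the critique back into the prompt and regenerate.
        prompt = f"{design_goal}\nFix the following issues: {critique}"
    return image                  # best effort after max_rounds attempts

print(figure_with_feedback("bar chart of acceptance-rate gains per venue"))
```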

[Figure: figure generation]

Refinement Loop and Runtime

After the initial draft, the Content-Refinement Agent runs a simulated peer-review cycle using AgentReview. The system re-scores the LaTeX source after each change and retains an edit only when the overall score improves, or stays constant with no sub-score regression. The entire end-to-end process (literature search, figure creation, writing, and refinement) averages 39.6 minutes of wall-clock time, despite invoking large language models dozens of times.
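
The edit-acceptance rule can be stated compactly; the sketch below encodes it literally, with the score schema (one overall score plus named sub-scores) as an illustrative assumption.

```python
def accept_edit(before, after):
    """Keep an edit iff the overall score improves, or stays flat with
    no sub-score regression. The score schema is an assumption."""
    if after["overall"] > before["overall"]:
        return True
    if after["overall"] == before["overall"]:
        # A flat overall is acceptable only if no dimension got worse.
        return all(after["subscores"][k] >= v
                   for k, v in before["subscores"].items())
    return False

before = {"overall": 6.0, "subscores": {"clarity": 6, "novelty": 5}}
after  = {"overall": 6.0, "subscores": {"clarity": 7, "novelty": 5}}
assert accept_edit(before, after)      # flat overall, no regression: keep
```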

Empirical Evaluation

To measure capability, the authors built PaperWritingBench, a benchmark of 200 top-conference papers (100 from CVPR 2025 and 100 from ICLR 2025). Each paper was reverse-engineered into anonymized logs and idea summaries, then fed to PaperOrchestra. Results show the refinement loop raises the simulated acceptance rate by 19% for CVPR and 22% for ICLR.

[Figure: acceptance-rate improvement]

In blind human evaluation, PaperOrchestra outperforms baseline systems by 50%–68% on the Related Work dimension and by 14%–38% on overall paper quality. Automated scores correlate strongly with human judgments, supporting the reliability of the internal evaluation metric.

Scope and Limitations

PaperOrchestra is deliberately scoped to the writing stage; it does not attempt to automate hypothesis generation, data collection, or experiment execution. The authors position it as an "assistant system" that relieves researchers of the tedious formatting, citation, and figure-creation work, allowing them to focus on scientific discovery.

[Figure: runtime comparison]

In summary, PaperOrchestra demonstrates that a well-engineered multi-agent pipeline can transform raw experimental artifacts into conference-ready manuscripts of competitive quality, while keeping the process transparent, verifiable, and fast.
