How AI Agents Can Auto‑Generate High‑Quality Research Flowcharts
This article introduces PaperBanana, a multi‑agent AI framework that automates academic illustration by retrieving reference figures, planning detailed textual descriptions, applying learned styling, generating images, and iteratively refining them. Its performance is evaluated against existing baselines on the new PaperBananaBench benchmark.
Automatic scientific illustration remains a bottleneck for AI scientists, even though current models can write code, run experiments, and draft papers. Existing approaches, whether code‑centric tools such as TikZ and Python‑PPTX or direct image generation via DALL‑E or Midjourney, produce rigid or error‑prone visuals.
PaperBanana Framework: Collaborative Multi‑Agent Design
The core idea is reference‑driven generation carried out by five specialized agents working in concert: Retriever, Planner, Stylist, Visualizer, and Critic. The pipeline first retrieves similar figures from a reference library, plans detailed textual descriptions, applies academic styling learned from those references, generates the image, and finally iterates via self‑critique.
Agent Responsibilities
Retriever: Uses vision‑language models for semantic retrieval of reference figures, prioritizing visual structure over topic.
Planner: Converts method descriptions into detailed textual plans via in‑context learning.
Stylist: Summarizes academic aesthetic guidelines (color, layout, fonts) from the retrieved set.
Visualizer: Generates images with Nano‑Banana‑Pro / GPT‑Image‑1.5.
Critic: Performs multi‑round self‑critique (default three iterations) to correct factual and visual errors.
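The five roles above can be sketched as a simple orchestration loop. The following is a hypothetical Python illustration: the stub functions stand in for the paper's model‑backed agents, and all names and return values here are assumptions, not the authors' actual code.

```python
# Hypothetical sketch of the five-agent PaperBanana pipeline.
# Each stub simulates one agent with plain strings; the real system
# calls vision-language and image-generation models at each step.

def retrieve(method_text, library):
    # Retriever: pick reference figures with similar visual structure.
    return library[:3]

def plan(method_text, refs):
    # Planner: turn the method description into a detailed textual plan.
    return f"plan for: {method_text[:40]} (refs: {len(refs)})"

def summarize_style(refs):
    # Stylist: distill color/layout/font norms from the reference set.
    return {"palette": "muted", "font": "sans-serif"}

def render(plan_text, style, feedback=None):
    # Visualizer: generate the figure (here, just a string placeholder).
    suffix = f" | fixed: {feedback}" if feedback else ""
    return f"figure[{style['palette']}] <- {plan_text}{suffix}"

def critique(image, method_text):
    # Critic: flag factual/visual errors; None means the figure passes.
    return None if "fixed" in image else "label box A is wrong"

def generate_figure(method_text, library, max_rounds=3):
    refs = retrieve(method_text, library)
    plan_text = plan(method_text, refs)
    style = summarize_style(refs)
    image = render(plan_text, style)
    for _ in range(max_rounds):  # default: three critique rounds
        feedback = critique(image, method_text)
        if feedback is None:
            break
        image = render(plan_text, style, feedback=feedback)
    return image
```

The loop terminates early once the Critic raises no objection, mirroring the closed‑loop refinement described below.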
Key Innovations
Reference‑driven: Instead of zero‑shot generation, the system first finds similar conference figures as visual templates.
Automatic style learning: The Stylist agent extracts aesthetic norms from the reference pool without manual presets.
Closed‑loop optimization: The Critic and Visualizer form a feedback loop that progressively refines the output.
PaperBananaBench: First Benchmark for Academic Illustration Generation
To evaluate the system, the authors constructed a benchmark from NeurIPS 2025 papers, selecting 292 samples with method‑section descriptions averaging 3020 words across four research domains.
The construction pipeline involved randomly sampling 2,000 NeurIPS papers, extracting method sections with MinerU, filtering for figure aspect ratios between 1.5 and 2.5, manually verifying description accuracy and visual quality, and categorizing figures into the four domains: Agent & Reasoning, Vision & Perception, Generative & Learning, and Science & Applications.
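The aspect‑ratio filter in this pipeline is straightforward to sketch. A minimal illustration follows; the function name and the sample pixel dimensions are assumptions, as the paper only specifies the 1.5–2.5 range:

```python
# Hypothetical sketch of the benchmark's aspect-ratio filter: keep only
# figures whose width/height ratio falls in [1.5, 2.5], which favors
# wide, flowchart-like layouts over tall or square ones.

def keep_figure(width_px: int, height_px: int,
                lo: float = 1.5, hi: float = 2.5) -> bool:
    """Return True if the figure's aspect ratio is within [lo, hi]."""
    if height_px <= 0:
        return False
    ratio = width_px / height_px
    return lo <= ratio <= hi

candidates = [(1600, 800), (1000, 1000), (2400, 1000), (3000, 1000)]
kept = [dims for dims in candidates if keep_figure(*dims)]
# ratios: 2.0 (kept), 1.0 (dropped), 2.4 (kept), 3.0 (dropped)
```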
Evaluation Protocol (VLM‑as‑a‑Judge)
Faithfulness : Does the generated figure accurately reflect the method description and title?
Conciseness : Is extraneous visual clutter removed?
Readability : Are layout, text, and lines clear?
Aesthetics : Does the figure meet academic publishing standards?
Scores are obtained by a VLM judge comparing model‑generated images against human‑drawn references, assigning 100 for Model Win, 0 for Human Win, and 50 for a tie.
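Under this protocol, a criterion's score reduces to the mean of the pairwise verdicts. A minimal sketch of that aggregation, assuming the three verdict labels map to 100/50/0 as described (the label strings themselves are assumptions):

```python
# Hypothetical aggregation of VLM-as-a-judge verdicts: each pairwise
# comparison against the human-drawn reference yields 100 (model win),
# 50 (tie), or 0 (human win); a dimension's score is the mean over samples.

SCORE = {"model_win": 100, "tie": 50, "human_win": 0}

def dimension_score(verdicts):
    """Average pairwise verdicts for one criterion (e.g. faithfulness)."""
    return sum(SCORE[v] for v in verdicts) / len(verdicts)

verdicts = ["model_win", "tie", "human_win", "model_win"]
score = dimension_score(verdicts)  # (100 + 50 + 0 + 100) / 4 = 62.5
```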
Experimental Results
PaperBanana achieved an overall score of 68.6 % versus 51.6 % for the vanilla baseline, a 17.0‑point gain. The largest improvement was in conciseness (+37.2 points), attributed to the Stylist agent removing redundant elements. By domain, Agent & Reasoning scored highest (69.9 %) while Vision & Perception lagged (52.1 %).
Compared baselines:
Vanilla: Direct prompt‑based generation, poorest performance.
Few‑shot: Ten examples provided, modest improvement.
Paper2Any: State‑of‑the‑art method focusing on high‑level ideas, but low faithfulness.
In blind tests, three human judges evaluated 50 samples and gave PaperBanana a 72.7 % win rate, with 20.7 % ties and 6.6 % losses.
Conclusion
PaperBanana represents a milestone in automating academic illustration. By leveraging reference‑driven learning and multi‑agent collaboration, it produces method‑section flowcharts that approach human quality in faithfulness, conciseness, readability, and aesthetics.
https://arxiv.org/pdf/2601.23265
https://dwzhu-pku.github.io/PaperBanana/
PaperBanana: Automating Academic Illustration for AI Scientists