How AI Agents Can Auto‑Generate High‑Quality Research Flowcharts

This article introduces PaperBanana, a multi-agent AI framework that automates academic illustration: it retrieves reference figures, plans detailed textual descriptions, applies academic styling, generates the image, and iteratively refines it. Its performance is evaluated on the new PaperBananaBench benchmark against existing baselines.


Automatic scientific illustration remains a bottleneck for AI scientists, even though current models can write code, run experiments, and draft papers. Existing approaches, whether code-centric tools such as TikZ and Python-PPTX or direct image generation via DALL-E or Midjourney, produce visuals that are either rigid or error-prone.

PaperBanana Framework: Collaborative Multi‑Agent Design

The core idea is reference‑driven generation combined with a collaboration of five specialized agents: Retriever, Planner, Stylist, Visualizer, and Critic. The pipeline first retrieves similar figures from a reference library, then plans detailed textual descriptions, applies learned academic styling, visualizes the content, and finally iterates with self‑critique.
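
To make this control flow concrete, here is a minimal Python sketch of one way the five agents could be orchestrated. Every class, function, and default below is a hypothetical stand-in; the article does not publish PaperBanana's actual interfaces.

```python
from dataclasses import dataclass

# All names below are illustrative stand-ins for the paper's five agents.

@dataclass
class Feedback:
    approved: bool
    notes: str = ""

def retrieve(method_text, library, k=3):
    """Retriever: return reference figures structurally similar to the method."""
    return library[:k]  # stub; a real system would rank by VLM similarity

def plan(method_text, title, references):
    """Planner: expand the method text into a detailed figure description."""
    return f"Flowchart for '{title}' based on {len(references)} references."

def summarize_style(references):
    """Stylist: distill shared color/layout/font conventions from references."""
    return {"palette": "muted", "layout": "left-to-right", "font": "sans-serif"}

def render(description, style, feedback=None):
    """Visualizer: call an image model with the plan, style, and any feedback."""
    return {"description": description, "style": style, "revision_of": feedback}

def critique(image, description):
    """Critic: a VLM checks the draft against the plan for factual/visual errors."""
    return Feedback(approved=True)  # stub; a real critic returns concrete issues

def generate_figure(method_text, title, library, n_rounds=3):
    refs = retrieve(method_text, library)
    description = plan(method_text, title, refs)
    style = summarize_style(refs)
    image = render(description, style)
    for _ in range(n_rounds):  # the article notes three critique rounds by default
        fb = critique(image, description)
        if fb.approved:
            break
        image = render(description, style, feedback=fb)
    return image
```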

Figure: PaperBanana framework architecture

Agent Responsibilities

Retriever: Uses vision-language models for semantic retrieval of reference figures, prioritizing visual structure over topic (a minimal retrieval sketch follows this list).

Planner: Converts method descriptions into detailed textual plans via in-context learning.

Stylist: Summarizes academic aesthetic guidelines (color, layout, fonts) from the retrieved set.

Visualizer: Generates images with Nano-Banana-Pro / GPT-Image-1.5.

Critic: Performs multi-round self-critique (three iterations by default) to correct factual and visual errors.
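
As promised above, here is a minimal sketch of what embedding-based reference retrieval could look like. The `embed` function is a random-vector placeholder for a real vision-language encoder, and `retrieve_references` simply ranks the library by cosine similarity; neither is the paper's implementation.

```python
import numpy as np

def embed(item) -> np.ndarray:
    """Placeholder for a vision-language encoder that maps a figure (or a
    structural summary of one) to a vector; not the paper's actual model."""
    rng = np.random.default_rng(abs(hash(str(item))) % 2**32)
    return rng.standard_normal(512)

def retrieve_references(query, library, k=3):
    """Rank library figures by cosine similarity to the query; return top-k."""
    q = embed(query)
    q = q / np.linalg.norm(q)
    scored = []
    for fig in library:
        v = embed(fig)
        scored.append((float(q @ v) / float(np.linalg.norm(v)), fig))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [fig for _, fig in scored[:k]]
```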

Key Innovations

Reference-driven generation: Instead of zero-shot generation, the system first finds similar conference figures to use as visual templates.

Automatic style learning: The Stylist agent extracts aesthetic norms from the reference pool without manual presets (one possible implementation is sketched after this list).

Closed-loop optimization: Critic and Visualizer form a feedback loop that progressively refines the output.
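
One way the Stylist's automatic style learning could work in practice is to prompt a VLM over the retrieved figures and have it emit a compact style guide. The prompt text, JSON schema, and `call_vlm` helper below are all assumptions for illustration, not the paper's actual prompt.

```python
import json

# Hypothetical Stylist prompt; the paper's real prompt is not given here.
STYLE_PROMPT = (
    "You are shown {n} figures from published papers. Summarize their shared "
    "aesthetic conventions as a style guide covering color palette, layout "
    "direction, fonts, line weights, and whitespace. Respond in JSON with "
    "keys: palette, layout, fonts, line_weights, whitespace."
)

def learn_style(reference_images, call_vlm):
    """Distill a style guide from retrieved references.

    `call_vlm` is any function taking (prompt, images) and returning the
    model's text response; it stands in for a real multimodal API call.
    """
    response = call_vlm(STYLE_PROMPT.format(n=len(reference_images)),
                        reference_images)
    return json.loads(response)
```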

PaperBananaBench: First Benchmark for Academic Illustration Generation

To evaluate the system, the authors constructed a benchmark from NeurIPS 2025 papers, selecting 292 samples with method‑section descriptions averaging 3020 words across four research domains.

Figure: Dataset statistics

The construction pipeline randomly sampled 2,000 NeurIPS papers, extracted method sections with MinerU, filtered for figure aspect ratios between 1.5 and 2.5, manually verified description accuracy and visual quality, and categorized the figures into four domains: Agent & Reasoning, Vision & Perception, Generative & Learning, and Science & Applications.
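
For illustration, the aspect-ratio step might look like the following. The 1.5-2.5 bounds come from the article; the Pillow-based helper itself is our own sketch of one way to apply them.

```python
from PIL import Image

def keep_figure(path, lo=1.5, hi=2.5):
    """Keep figures whose width/height ratio lies in [lo, hi].
    Bounds follow the benchmark description; the code is illustrative."""
    with Image.open(path) as img:
        width, height = img.size
    return lo <= width / height <= hi
```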

Evaluation Protocol (VLM‑as‑a‑Judge)

Faithfulness: Does the generated figure accurately reflect the method description and title?

Conciseness: Is extraneous visual clutter removed?

Readability: Are the layout, text, and lines clear?

Aesthetics: Does the figure meet academic publishing standards?

Scores are obtained by a VLM judge comparing model-generated images against human-drawn references, assigning 100 for a model win, 0 for a human win, and 50 for a tie.
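
A minimal sketch of this scoring rule follows, assuming per-dimension judgments are averaged into the overall score (the averaging is our assumption, and `judge_fn` is a hypothetical stand-in for the VLM judge call).

```python
DIMENSIONS = ("faithfulness", "conciseness", "readability", "aesthetics")
SCORE = {"model": 100.0, "human": 0.0, "tie": 50.0}  # mapping stated above

def score_sample(model_fig, human_fig, description, judge_fn):
    """Score one benchmark sample on all four dimensions.

    `judge_fn` is a hypothetical VLM call returning "model", "human", or
    "tie" for a given dimension."""
    return {dim: SCORE[judge_fn(model_fig, human_fig, description, dim)]
            for dim in DIMENSIONS}

def overall_score(per_sample_scores):
    """Average all dimension scores across samples (0-100 scale); this
    aggregation is our assumption, not a published formula."""
    values = [v for scores in per_sample_scores for v in scores.values()]
    return sum(values) / len(values)
```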

Experimental Results

Figure: Main results table

PaperBanana achieved an overall score of 68.6% versus 51.6% for the vanilla baseline, a gain of 17.0 points. The largest improvement was in conciseness (+37.2 points), thanks to the Stylist agent removing redundant elements. By domain, Agent & Reasoning scored highest (69.9%) while Vision & Perception lagged (52.1%).

Compared baselines:

Vanilla: Direct prompt-based generation; the weakest performer.

Few-shot: Ten in-context examples; a modest improvement.

Paper2Any: A state-of-the-art method that captures high-level ideas but scores low on faithfulness.

Human blind tests with three judges on 50 samples gave PaperBanana a 72.7% win rate, with 20.7% ties and 6.6% losses.

Figure: Human blind test results

Conclusion

PaperBanana represents a milestone in automating academic illustration. By leveraging reference‑driven learning and multi‑agent collaboration, it produces method‑section flowcharts that approach human quality in faithfulness, conciseness, readability, and aesthetics.

Paper: PaperBanana: Automating Academic Illustration for AI Scientists
arXiv: https://arxiv.org/pdf/2601.23265
Project page: https://dwzhu-pku.github.io/PaperBanana/