How the New ECD Dataset Supercharges Multimodal LLM Chart Understanding

The paper introduces the Effective Chart Dataset (ECD), a large, high‑quality, diverse synthetic chart collection, together with the ECDBench benchmark. It details a five‑stage modular synthesis pipeline and large‑scale QA generation, and reports experiments showing consistent performance gains for open‑source multimodal large language models on chart‑understanding tasks.


Background

Accurate chart element recognition and deep reasoning over chart data are essential for multimodal large language models (MLLMs) to be useful in scientific research, news, and data analysis. State‑of‑the‑art open‑source MLLMs achieve only 30%–50% accuracy on challenging scientific chart benchmarks, and existing synthetic chart datasets suffer from limited visual style, low realism, and overly simple data patterns.

Effective Chart Dataset (ECD)

ECD is a large‑scale synthetic chart dataset designed to overcome these limitations. It contains more than 10,000 charts covering 25 subject domains (e.g., economics, astronomy, medicine) and 29 chart types (line, bar, scatter, heatmap, etc.), arranged in 252 layout combinations. Over 300,000 question‑answer (QA) pairs, spanning both descriptive and reasoning questions, are generated automatically by GPT‑4o and filtered by confidence score. The dataset achieves the lowest Fréchet Inception Distance (FID) among comparable synthetic collections and exhibits the highest pixel entropy, indicating superior visual fidelity and content complexity.
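Pixel entropy here can be read as the Shannon entropy of an image's intensity histogram: the more varied the pixel content, the higher the value. A minimal sketch under that reading (the paper's exact formulation may differ; `pixel_entropy` and its flat grayscale input are illustrative):

```python
import math
from collections import Counter

def pixel_entropy(pixels: list[int]) -> float:
    """Shannon entropy (in bits) of a flat list of grayscale pixel values."""
    counts = Counter(pixels)
    total = len(pixels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

# A single-color image carries no information; a 50/50 two-value image
# carries exactly one bit per pixel.
flat = [128] * 64        # entropy 0.0
busy = [0, 255] * 32     # entropy 1.0
```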

Five‑Stage Modular Synthesis Pipeline

Single‑Chart Generation: Uses 29 predefined plotting functions. Data, titles, axis labels, and style attributes are produced by independent generators, allowing diverse data trends (monotonic increase, decrease, fluctuation).
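The modular split can be sketched as independent generators that each produce one facet of a chart specification; the names (`gen_data`, `make_chart_spec`, etc.) and the spec layout are illustrative, not taken from the paper's code:

```python
import random

TRENDS = ("increase", "decrease", "fluctuation")

def gen_data(trend: str, n: int = 12) -> list[float]:
    """Data generator: a numeric series following the requested trend."""
    vals = [random.uniform(10.0, 50.0)]
    for _ in range(n - 1):
        step = random.uniform(1.0, 5.0)
        if trend == "increase":
            vals.append(vals[-1] + step)
        elif trend == "decrease":
            vals.append(vals[-1] - step)
        else:  # fluctuation
            vals.append(vals[-1] + random.choice([-1.0, 1.0]) * step)
    return vals

def gen_text() -> dict:
    """Text generator: title and axis labels, independent of the data."""
    return {"title": "Quarterly output", "xlabel": "Quarter", "ylabel": "Units"}

def gen_style() -> dict:
    """Style generator: visual attributes, independent of data and text."""
    return {"color": random.choice(["tab:blue", "tab:orange", "tab:green"]),
            "marker": random.choice(["o", "s", "^"])}

def make_chart_spec(chart_type: str) -> dict:
    """Assemble one chart spec from the independent generators."""
    trend = random.choice(TRENDS)
    return {"type": chart_type, "trend": trend, "data": gen_data(trend),
            **gen_text(), "style": gen_style()}
```

Because each generator varies independently, swapping any one of them multiplies the diversity of the resulting charts without touching the others.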

Multi‑Chart Composition: Sub‑charts are generated sequentially, with each sub‑chart conditioned on the previous ones, ensuring semantic consistency across a multi‑panel figure.
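Sequential conditioning can be sketched by threading the specs of earlier panels into the generation of the next one; here plain dicts and a hypothetical `conditioned_subchart` helper stand in for the paper's actual LLM-driven generation:

```python
def conditioned_subchart(index: int, previous: list[dict]) -> dict:
    """Generate one sub-chart spec, reusing context from earlier panels."""
    if previous:
        # Keep the topic and x-axis consistent with the panels so far.
        topic = previous[0]["topic"]
        xlabel = previous[0]["xlabel"]
    else:
        topic, xlabel = "Energy consumption", "Year"
    return {"panel": index, "topic": topic, "xlabel": xlabel,
            "ylabel": f"Metric {index + 1}"}

def compose_multichart(n_panels: int) -> list[dict]:
    """Build a multi-panel figure spec one conditioned panel at a time."""
    panels: list[dict] = []
    for i in range(n_panels):
        panels.append(conditioned_subchart(i, panels))
    return panels
```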

Visual Diversification: Adds annotations, shadows, and zoom‑in insets, and varies fonts and axis styles. Libraries such as seaborn are employed to enrich visual appearance and adjust resolution for readability.
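One of these diversifications, the zoom-in inset, can be sketched with matplotlib's `Axes.inset_axes` and `indicate_inset_zoom` (seaborn styling omitted for brevity; this is an illustrative sketch, not the paper's code):

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so no display is required
import matplotlib.pyplot as plt

def line_with_inset(path: str) -> None:
    """Save a line chart with an annotation and a zoom-in inset panel."""
    xs = list(range(50))
    ys = [x * 0.5 + (3.0 if 20 <= x <= 25 else 0.0) for x in xs]

    fig, ax = plt.subplots(figsize=(6, 4))
    ax.plot(xs, ys, color="tab:blue")
    ax.set_title("Line chart with zoom-in inset")
    ax.annotate("bump", xy=(22, ys[22]), xytext=(30, 5),
                arrowprops={"arrowstyle": "->"})

    # Zoomed view of the bump region, drawn as an inset panel.
    axins = ax.inset_axes([0.55, 0.55, 0.4, 0.35])
    axins.plot(xs, ys, color="tab:blue")
    axins.set_xlim(18, 27)
    axins.set_ylim(8, 17)
    ax.indicate_inset_zoom(axins, edgecolor="gray")

    fig.savefig(path, dpi=100)
    plt.close(fig)
```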

Image Quality Filtering: Each chart is scored by GPT‑4o on visual clarity and semantic coherence; only charts with scores above the dataset average are retained.
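The retention rule, keep only charts scoring above the dataset-wide mean, is simple to state in code; here `scores` maps chart IDs to quality scores, with the GPT‑4o scoring call itself mocked out:

```python
def filter_above_average(scores: dict[str, float]) -> list[str]:
    """Keep only chart IDs whose quality score exceeds the dataset mean."""
    mean = sum(scores.values()) / len(scores)
    return [cid for cid, s in scores.items() if s > mean]

scores = {"c1": 9.0, "c2": 4.0, "c3": 7.5, "c4": 5.5}
# mean = 6.5, so only c1 and c3 survive the filter
kept = filter_above_average(scores)
```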

QA Pair Generation and Filtering: For every chart, GPT‑4o generates descriptive and reasoning QA pairs. A second confidence pass discards low‑confidence pairs, yielding a high‑quality QA set.
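The two-pass structure, generate first and then re-score and discard low-confidence pairs, can be sketched with stubbed model calls; `generate_qa`, `confidence_of`, and the threshold value are all illustrative stand-ins for the GPT‑4o requests:

```python
CONFIDENCE_THRESHOLD = 0.8  # illustrative cutoff, not from the paper

def generate_qa(chart_id: str) -> list[dict]:
    """Stub for the first GPT-4o pass: descriptive and reasoning QA pairs."""
    return [{"chart": chart_id, "kind": "descriptive",
             "q": "What is the chart title?", "a": "Quarterly output"},
            {"chart": chart_id, "kind": "reasoning",
             "q": "Which quarter grew fastest?", "a": "Q3"}]

def confidence_of(qa: dict) -> float:
    """Stub for the second GPT-4o pass that scores each QA pair."""
    return 0.95 if qa["kind"] == "descriptive" else 0.60

def build_qa_set(chart_ids: list[str]) -> list[dict]:
    """Generate QA pairs, then keep only those above the confidence cutoff."""
    kept = []
    for cid in chart_ids:
        for qa in generate_qa(cid):
            if confidence_of(qa) >= CONFIDENCE_THRESHOLD:
                kept.append(qa)
    return kept
```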

Model Fine‑Tuning Results

Four open‑source MLLMs—LLaVA‑Next‑Llama3‑8B, MiniCPM‑V2.6, Phi‑3‑Vision, and Qwen2.5‑VL‑7B—were evaluated on six chart test sets. Fine‑tuning on ECD consistently improves description and reasoning accuracy for all models, whereas fine‑tuning on prior chart datasets leads to mixed or degraded performance.

ECDBench: High‑Quality Chart Understanding Benchmark

ECDBench comprises 1,224 charts (364 single‑chart, 860 multi‑chart) with an average resolution of 1378×968 px. Each chart is paired with one descriptive and one reasoning QA, totaling 2,448 QA pairs. The benchmark reports three metrics: description accuracy, reasoning accuracy, and overall accuracy.
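Since each chart carries exactly one descriptive and one reasoning QA, both per-type denominators equal the chart count, and the published figures are consistent with overall accuracy being the mean of the two per-type accuracies. A minimal sketch (the correct-answer counts below are hypothetical values chosen to reproduce the reported percentages):

```python
def ecdbench_metrics(desc_correct: int, reas_correct: int,
                     n_charts: int = 1224) -> tuple[float, float, float]:
    """Description, reasoning, and overall accuracy, in percent.

    One descriptive and one reasoning QA per chart, so both
    denominators equal the number of charts.
    """
    desc_acc = 100.0 * desc_correct / n_charts
    reas_acc = 100.0 * reas_correct / n_charts
    return desc_acc, reas_acc, (desc_acc + reas_acc) / 2.0

# Hypothetical counts: 948/1224 descriptive and 698/1224 reasoning correct.
desc, reas, overall = ecdbench_metrics(948, 698)
```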

Evaluation shows that the model o4‑mini attains the highest scores (57.03% reasoning, 77.45% description, 67.24% average). Models fine‑tuned on ECD, such as LLaVA‑Next‑Llama3‑8B, achieve substantial gains, confirming the effectiveness of the high‑quality QA pairs.

Resources

Paper: https://arxiv.org/pdf/2508.06492

Code repository: https://github.com/yuweiyang-anu/ECD

Project homepage: https://effective-chart-dataset-synthesis.github.io

Conclusion

The modular synthesis pipeline and rigorous QA generation produce a synthetic chart dataset that closely mirrors real scientific figures in style, diversity, and complexity. ECDBench provides a comprehensive evaluation framework for chart understanding, enabling future advances in multimodal reasoning, scientific AI assistants, and automated chart generation.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.

Written by

Data Party THU

Official platform of Tsinghua Big Data Research Center, sharing the team's latest research, teaching updates, and big data news.
