ChartNet: MIT/IBM’s Million‑Scale Synthetic Chart Dataset with 1.5M Diverse Samples

MIT and IBM researchers introduce ChartNet, the largest code‑guided synthetic chart dataset, with 1.5 million multimodal samples across 24 chart types and six plotting libraries. Fine‑tuning vision‑language models on it yields consistent, significant gains on chart reconstruction, data extraction, summarization, and reasoning tasks, and the fine‑tuned models outperform much larger off‑the‑shelf models, including GPT‑4o.

HyperAI Super Neural

Jun 11, 2026

Dataset

ChartNet is a million‑scale multimodal dataset for chart understanding. It contains 1.5 million synthetic chart samples covering 24 chart types and six drawing libraries. Each sample includes an image, Python plotting code, tabular data, a natural‑language description, and a chain‑of‑thought QA pair. Additional subsets provide 96 643 human‑annotated synthetic charts, 30 000 real‑world charts sourced from the World Bank, Our World in Data and other institutions, grounding‑aware QA pairs with dense geometric annotations, and a safety‑aligned subset.
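As a rough illustration of what one such sample bundles together, the sketch below models the five components as a Python dataclass. The class name, field names, and the CSV serialization of the table are assumptions made for this example; the article does not specify ChartNet's actual on-disk schema.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical field names; the real ChartNet schema may differ.
REQUIRED_FIELDS = (
    "image_path", "plotting_code", "table_csv",
    "description", "question", "reasoning", "answer",
)


@dataclass
class ChartSample:
    """Illustrative container for one multimodal chart sample."""
    image_path: str      # rendered chart image
    plotting_code: str   # Python source that draws the chart
    table_csv: str       # underlying tabular data, serialized as CSV text
    description: str     # natural-language description of the chart
    question: str        # QA pair: the question
    reasoning: str       # QA pair: chain-of-thought rationale
    answer: str          # QA pair: the final answer
    chart_type: Optional[str] = None  # e.g. one of the 24 chart types
    library: Optional[str] = None     # e.g. one of the six plotting libraries

    def missing_modalities(self) -> List[str]:
        """Return the names of required fields that are empty or whitespace."""
        return [name for name in REQUIRED_FIELDS
                if not getattr(self, name).strip()]
```

A completeness check like `missing_modalities()` is one simple way to confirm that every sample carries all of its aligned modalities before training.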

Code‑Guided Synthesis Pipeline

The generation process consists of five stages:

Chart‑to‑Code Reconstruction: A vision‑language model (VLM) generates Python code that approximately reproduces each of 150 000 seed chart images from the TinyChart dataset.

Code‑Guided Chart Augmentation: A large language model iteratively rewrites the code, altering data values and labels while preserving the original chart type, so each seed can yield arbitrarily many variants.

Chart Rendering: Each script is executed to render a chart image; scripts that run successfully are paired with their output images.

Quality Filtering: VLM‑based evaluation removes images with defects such as text overlap, label cropping, or element occlusion.

Code‑Guided Attribute Generation: A VLM extracts semantic attributes from each code‑image pair, generating tabular data and grounding‑aware descriptions.
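The rendering and filtering stages can be sketched as follows. This is a minimal illustration using only the Python standard library, not the authors' implementation. The function names, the convention that each script receives its output path as `sys.argv[1]`, the timeout, and the `passes_quality` callback (standing in for the VLM-based quality check) are all assumptions.

```python
import subprocess
import sys
import tempfile
from pathlib import Path
from typing import Callable, Iterable, List, Tuple


def render_script(code: str, out_path: Path, timeout_s: float = 30.0) -> bool:
    """Run one plotting script in a subprocess.

    Assumed convention: the script writes its image to the path in sys.argv[1].
    Returns True only if the script exits cleanly and a non-empty file exists.
    """
    out_path.unlink(missing_ok=True)  # avoid accepting a stale file
    with tempfile.TemporaryDirectory() as tmp:
        script = Path(tmp) / "plot.py"
        script.write_text(code, encoding="utf-8")
        try:
            result = subprocess.run(
                [sys.executable, str(script), str(out_path)],
                capture_output=True,
                timeout=timeout_s,
            )
        except subprocess.TimeoutExpired:
            return False
    return (result.returncode == 0
            and out_path.is_file()
            and out_path.stat().st_size > 0)


def render_and_filter(
    scripts: Iterable[str],
    out_dir: Path,
    passes_quality: Callable[[Path], bool],
) -> List[Tuple[str, Path]]:
    """Keep (code, image) pairs that render successfully and pass the filter."""
    out_dir.mkdir(parents=True, exist_ok=True)
    kept: List[Tuple[str, Path]] = []
    for i, code in enumerate(scripts):
        out_path = out_dir / f"chart_{i:06d}.png"
        if render_script(code, out_path) and passes_quality(out_path):
            kept.append((code, out_path))
    return kept
```

Running each script in its own subprocess with a timeout keeps a crashing or hanging script from taking down the whole batch, which matters when the code is model-generated.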

Experimental Evaluation

Models in three size categories (≤1 B, ≤4 B, ≤7 B parameters) were fine‑tuned on ChartNet. On four chart‑understanding tasks (chart reconstruction, data extraction, summarization, and chain‑of‑thought QA), the fine‑tuned models improved substantially at every scale.

Granite‑Vision‑2B achieved 70.3 % data‑extraction accuracy, surpassing GPT‑4o (46.7 %).

LLaVA‑7B improved chart‑reconstruction metrics by up to +42.4 points.

Summarization gains ranged from +9.5 points (Qwen2.5‑VL‑3B) to +31.4 points (Granite‑Docling‑2B); Granite‑Vision‑2B reached 83.9 %, higher than GPT‑4o and larger open‑source baselines.

Chain‑of‑thought QA accuracy increased by up to 15.17 points, with LLaVA‑7B reaching 70.3 % and outperforming specialized models such as ChartGemma.

ChartNet‑fine‑tuned models outperformed off‑the‑shelf models ranging from 20 B to 72 B parameters on nearly all metrics, suggesting that high‑quality multimodal supervision can outweigh sheer model size.

Generalization tests on public benchmarks (ChartCap, ChartMimic‑v2) showed similar gains: Granite‑Vision‑2B’s BLEU score rose from 1.6 to 12.4 on ChartCap, and its accuracy on ChartMimic‑v2 increased from 30.8 to 58.4.
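To put these before and after figures in perspective, the short sketch below computes the absolute gain (in points) and the ratio of the fine‑tuned score to the baseline score. The helper name `gains` is illustrative, and the inputs are just the numbers quoted above.

```python
from typing import Tuple


def gains(before: float, after: float) -> Tuple[float, float]:
    """Return (absolute gain in points, after/before ratio), rounded."""
    return round(after - before, 1), round(after / before, 2)


print(gains(1.6, 12.4))    # ChartCap BLEU       -> (10.8, 7.75)
print(gains(30.8, 58.4))   # ChartMimic-v2 score -> (27.6, 1.9)
```

The ChartCap gain is smaller in absolute points but much larger as a ratio, because the baseline BLEU score was very low.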

Key Findings

Chart reconstruction: after fine‑tuning, ultra‑compact models (e.g., SmolVLM‑256M) regained full functionality; Granite‑Vision‑2B achieved >90 % on code execution, data consistency, and image similarity; LLaVA‑7B improved data‑consistency by +42.4 points.

Chart data extraction: Granite‑Vision‑2B reached 70.3 % accuracy; LLaVA‑7B improved by +41.8 points, exceeding GPT‑4o.

Chart summarization: all model families improved, with Granite‑Vision‑2B attaining 83.9 %.

QA with chain‑of‑thought: LLaVA‑7B improved by +15.17 points to 70.3 %.

Off‑the‑shelf comparison: fine‑tuned 2 B–7 B models outperformed 20 B–72 B models on almost all metrics.

Public benchmark generalization: Granite‑Vision‑2B BLEU 12.4 on ChartCap and accuracy 58.4 on ChartMimic‑v2.

Reference

ChartNet: A Million‑Scale, High‑Quality Multimodal Dataset for Robust Chart Understanding. arXiv:2603.27064. https://arxiv.org/abs/2603.27064
