
A Complete 2025 Guide to Text‑to‑SQL Datasets

This article compiles and categorizes the most significant Text‑to‑SQL and NL2SQL datasets released up to 2025, detailing their origins, sizes, domains, and evaluation benchmarks, while providing quick links to papers and dataset repositories for researchers and developers.

Aikesheng Open Source Community

Preface

When researching AI4SQL/AI4DB/DB4AI products, we found that improving SQL capabilities largely depends on high‑quality datasets. To help developers quickly obtain resources, we have organized recent Text2SQL/NL2SQL datasets into a curated list, including both training and evaluation sets.

The datasets are grouped by release date, with the March 2025 releases first and earlier datasets following in chronological order. Each entry links to its paper and download page. Representative evaluation sets include Spider and BIRD‑SQL, and we link to leaderboards where available.

March 2025

NL2SQL‑Bugs

NL2SQL‑Bugs is the first benchmark focused on detecting semantic errors in NL2SQL translations, addressing systematic errors that current models miss. It features a two‑level taxonomy (9 categories, 31 sub‑categories) and 2,018 error‑annotated instances.


In the authors' experiments, large language models reach only 75.16% detection accuracy. The experiments also uncovered 122 annotation errors in the Spider and BIRD benchmarks.
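To make the notion of a semantic error concrete, here is a minimal sketch using Python's built‑in sqlite3 module. The table, question, and queries are invented for illustration and are not taken from NL2SQL‑Bugs; the comparison against a gold query only shows what such an error looks like and is not the benchmark's detection method.

```python
import sqlite3

def run(conn, sql):
    """Execute a query and return its rows sorted, so row order does not matter."""
    return sorted(conn.execute(sql).fetchall())

# Toy in-memory database (hypothetical schema, for illustration only).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
    INSERT INTO orders (customer_id, amount) VALUES (1, 10.0), (1, 25.0), (2, 40.0);
""")

question = "How many customers have placed an order?"
gold_sql = "SELECT COUNT(DISTINCT customer_id) FROM orders"
# The prediction is valid SQL and runs without error, but it counts orders,
# not customers: a semantic error rather than a syntax error.
predicted_sql = "SELECT COUNT(customer_id) FROM orders"

gold_rows = run(conn, gold_sql)
predicted_rows = run(conn, predicted_sql)
has_semantic_error = gold_rows != predicted_rows

print(gold_rows, predicted_rows, has_semantic_error)
```

Both queries execute successfully, which is why this class of error is easy to miss without a reference answer.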

Links: Paper [1] / Dataset [2]

OmniSQL

SynSQL‑2.5M, released with the OmniSQL paper, was presented at release as the largest cross‑domain synthetic Text2SQL dataset. It contains 2.5 million high‑quality samples covering 16,583 databases and diverse SQL structures.


The dataset was generated with open‑source large language models and is released under Apache 2.0. The accompanying OmniSQL models (7B/14B/32B) are publicly available as well.

Links: Paper [3] / Dataset [4]

TinySQL

TinySQL is a progressive Text2SQL dataset designed to address the excessive complexity of existing datasets, enabling research on model interpretability. By controlling SQL command and language variations, it offers tasks ranging from basic to advanced queries.


The dataset aids analysis of how Transformers learn and generate SQL, supporting evaluation of explainability methods and synthetic data design improvements.

Links: Paper [5] / Dataset [6]

Before 2025

WikiSQL

Released by Salesforce in September 2017, WikiSQL contains 80,654 natural‑language questions and 77,840 simple SQL statements, built on tables sourced from Wikipedia.

Links: Paper [7] / Dataset [8]

Spider 1.0

Introduced in September 2018 by Yale University, Spider is a widely‑used cross‑domain benchmark with 10,181 questions and 5,693 complex SQL queries.

Links: Paper [9] / Leaderboard [10]

SParC

Released in June 2019, SParC provides 4,298 multi‑turn question sequences (over 12 k unique questions) across 138 domains and 200 complex databases.
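A multi‑turn sequence can be pictured as a list of turns, each with its own question and SQL, where later questions only make sense given the earlier ones. The sketch below is an assumption‑laden illustration: the `Turn` class, the `build_model_input` helper, the separator, and the example interaction are all hypothetical and do not reflect SParC's actual file format.

```python
from dataclasses import dataclass

@dataclass
class Turn:
    """One turn of an interaction: a question and the SQL that answers it."""
    question: str
    sql: str

def build_model_input(turns, index, separator=" | "):
    """Concatenate every question up to and including turn `index`.

    Feeding the history along with the current question is one simple way
    to give a model the context it needs for a follow-up question.
    """
    return separator.join(turn.question for turn in turns[: index + 1])

# Invented example interaction (not taken from the dataset).
interaction = [
    Turn("How many students are there?", "SELECT COUNT(*) FROM student"),
    Turn("Only those older than 20.", "SELECT COUNT(*) FROM student WHERE age > 20"),
    Turn("List their names instead.", "SELECT name FROM student WHERE age > 20"),
]

model_input = build_model_input(interaction, 1)
print(model_input)
```

The second question, "Only those older than 20.", cannot be translated on its own, which is what distinguishes context‑dependent benchmarks from single‑turn ones such as Spider.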

Links: Paper [11] / Leaderboard [12]

CSpider (Chinese)

Published in September 2019, CSpider translates Spider into Chinese, offering 10,181 questions and 5,693 SQL queries covering 200 databases.

Links: Paper [13] / Leaderboard [14]

CoSQL

Introduced in September 2019, CoSQL contains over 30 k dialogue turns and 10 k annotated SQL queries from 3 k Wizard‑of‑Oz conversations across 138 domains.

Links: Paper [15] / Leaderboard [16]

KaggleDBQA

Released in June 2021, KaggleDBQA is a challenging cross‑domain benchmark built on real‑world web databases, with domain‑specific data types and unrestricted questions.

Links: Paper [17] / Dataset [18]

Spider‑Syn

June 2021 saw Spider‑Syn, a benchmark for evaluating model robustness to synonym substitution, built on the original Spider dataset.
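The idea behind a synonym‑substitution test can be sketched in a few lines: reword the question so it no longer repeats schema terms verbatim, while keeping the gold SQL unchanged. The synonym table, helper function, and example below are hypothetical; this toy automatic substitution only illustrates the kind of perturbation involved and is not how Spider‑Syn itself was constructed.

```python
import re

# Hypothetical synonym table mapping schema-related words to paraphrases.
SYNONYMS = {"singer": "vocalist", "country": "nation"}

def substitute_synonyms(question, synonyms):
    """Replace whole-word, case-insensitive matches with their synonyms."""
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(word) for word in synonyms) + r")\b",
        re.IGNORECASE,
    )
    return pattern.sub(lambda match: synonyms[match.group(0).lower()], question)

original = "Which country does each singer come from?"
gold_sql = "SELECT name, country FROM singer"  # unchanged by the perturbation

perturbed = substitute_synonyms(original, SYNONYMS)
print(perturbed)
```

A model that relies on exact string overlap between question words and column names will tend to fail on the perturbed question even though the expected SQL is identical.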

Links: Paper [19] / Dataset [20]

SEDE

Also in June 2021, SEDE (Stack Exchange Data Explorer) offers over 12 k SQL queries with natural‑language descriptions, featuring complex nesting, date, and text operations.

Links: Paper [21] / Dataset [22]

CHASE

August 2021 introduced CHASE, a large‑scale, pragmatic Chinese dataset for cross‑database, context‑dependent Text‑to‑SQL.

Links: Paper [23] / Dataset [24]

Spider‑DK

September 2021’s Spider‑DK evaluates model robustness to domain knowledge, extending the original Spider dataset.

Links: Paper [25] / Dataset [26]

EHRSQL

January 2023’s EHRSQL provides a large, high‑quality dataset for question answering over the MIMIC‑III and eICU electronic health record databases, with questions collected from 222 hospital staff.

Links: Paper [27] / Dataset [28]

BIRD‑SQL

Released in May 2023 as a joint effort by the University of Hong Kong and Alibaba, BIRD is a large‑scale cross‑domain dataset with 12,751 question‑SQL pairs across 95 databases and 37 domains.
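Leaderboards such as BIRD's rank systems largely by execution accuracy, meaning the share of predicted queries whose results match those of the gold query. The following is a simplified sketch of that idea using sqlite3; the function names, toy table, and queries are assumptions for illustration, and the official evaluation scripts differ in details such as timeouts and how results are compared.

```python
import sqlite3
from collections import Counter

def rows_or_none(conn, sql):
    """Return the result rows as a multiset, or None if the query fails."""
    try:
        return Counter(conn.execute(sql).fetchall())
    except sqlite3.Error:
        return None

def execution_accuracy(conn, pairs):
    """Fraction of (predicted_sql, gold_sql) pairs with matching results.

    Row order is ignored; a prediction that raises an error counts as wrong.
    """
    correct = 0
    for predicted_sql, gold_sql in pairs:
        gold = rows_or_none(conn, gold_sql)
        predicted = rows_or_none(conn, predicted_sql)
        if gold is not None and predicted == gold:
            correct += 1
    return correct / len(pairs)

# Toy database and queries, invented for this example.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE city (name TEXT, country TEXT, population INTEGER);
    INSERT INTO city VALUES
        ('Oslo', 'Norway', 700000),
        ('Bergen', 'Norway', 290000),
        ('Lyon', 'France', 520000);
""")

pairs = [
    # Same rows, different order: counted as correct.
    ("SELECT name FROM city WHERE country = 'Norway'",
     "SELECT name FROM city WHERE country = 'Norway' ORDER BY name"),
    # Runs, but returns a different count: counted as wrong.
    ("SELECT COUNT(*) FROM city",
     "SELECT COUNT(*) FROM city WHERE population > 500000"),
    # Misspelled column, so the query fails: counted as wrong.
    ("SELECT nme FROM city", "SELECT name FROM city"),
]

score = execution_accuracy(conn, pairs)
print(f"execution accuracy: {score:.2f}")
```

Comparing results rather than query text lets two differently written but equivalent queries both count as correct.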

Links: Paper [29] / Leaderboard [30]

UNITE

Released in May 2023, UNITE aggregates 18 public Text2SQL datasets. Compared with Spider, it adds ~120 k examples and triples the number of SQL patterns.

Links: Paper [31] / Dataset [32]

Archer

February 2024 introduced Archer, a bilingual dataset for complex reasoning (arithmetic, commonsense, hypothetical) with 1,042 English and 1,042 Chinese questions.

Links: Paper [33] / Leaderboard [34]

BookSQL

June 2024’s BookSQL contains 100 k query‑SQL pairs, about 1.25× the size of WikiSQL, and was designed with financial‑domain expertise as a benchmark for accounting.

Links: Paper [35] / Dataset [36]

Spider 2.0

In August 2024, XLang AI released Spider 2.0, an advanced evaluation framework with 600 enterprise‑level Text‑to‑SQL workflow questions drawn from real databases (e.g., BigQuery, Snowflake, PostgreSQL).

Links: Paper [37] / Leaderboard [38]

BEAVER

September 2024’s BEAVER originates from real enterprise data warehouses, containing natural‑language queries and correct SQL statements collected from user histories.

Links: Paper [39] / Dataset [40]

PRACTIQ

October 2024 introduced PRACTIQ, a practical conversational Text‑to‑SQL dataset featuring ambiguous and unanswerable queries.

Links: Paper [41]

TURSpider

Released in November 2024, TURSpider is a Turkish Text‑to‑SQL dataset that mirrors Spider’s complexity, with 8,659 training examples and 1,034 development examples.

Links: Paper [42] / Dataset [43]

Synthetic Text‑to‑SQL

In November 2024, gretelai released a high‑quality synthetic Text‑to‑SQL dataset generated with Gretel Navigator and licensed under Apache 2.0.

Links: Dataset [44]

What's Next

We will continue to recommend high‑quality datasets in the future. Stay tuned!

References

[1] NL2SQL‑Bugs paper: https://arxiv.org/pdf/2503.11984

[2] NL2SQL‑Bugs dataset: https://github.com/HKUSTDial/NL2SQL-Bugs-Benchmark

[3] OmniSQL paper: https://arxiv.org/html/2503.02240

[4] OmniSQL dataset: https://huggingface.co/datasets/seeklhy/SynSQL-2.5M

[5] TinySQL paper: https://arxiv.org/html/2503.12730

[6] TinySQL dataset: https://huggingface.co/collections/withmartian/tinysql-6760e92748b63fa56a6ffc9f

[7] WikiSQL paper: https://arxiv.org/pdf/1709.00103.pdf

[8] WikiSQL dataset: https://github.com/salesforce/WikiSQL

[9] Spider 1.0 paper: https://arxiv.org/pdf/1809.08887.pdf

[10] Spider 1.0 leaderboard: https://yale-lily.github.io/spider

[11] SParC paper: https://arxiv.org/pdf/1906.02285.pdf

[12] SParC leaderboard: https://yale-lily.github.io/sparc

[13] CSpider paper: https://arxiv.org/pdf/1909.13293.pdf

[14] CSpider leaderboard: https://taolusi.github.io/CSpider-explorer/

[15] CoSQL paper: https://ar5iv.labs.arxiv.org/html/1909.05378

[16] CoSQL leaderboard: https://yale-lily.github.io/cosql

[17] KaggleDBQA paper: https://arxiv.org/abs/2106.11455

[18] KaggleDBQA dataset: https://github.com/Chia-Hsuan-Lee/KaggleDBQA/

[19] Spider‑Syn paper: https://ar5iv.labs.arxiv.org/html/2106.01065

[20] Spider‑Syn dataset: https://github.com/ygan/Spider-Syn

[21] SEDE paper: https://ar5iv.labs.arxiv.org/html/2106.05006

[22] SEDE dataset: https://github.com/hirupert/sede

[23] CHASE paper: https://aclanthology.org/2021.acl-long.180.pdf

[24] CHASE dataset: https://github.com/xjtu-intsoft/chase

[25] Spider‑DK paper: https://ar5iv.labs.arxiv.org/html/2109.05157

[26] Spider‑DK dataset: https://github.com/ygan/Spider-DK

[27] EHRSQL paper: https://arxiv.org/html/2301.07695

[28] EHRSQL dataset: https://github.com/glee4810/EHRSQL

[29] BIRD‑SQL paper: https://arxiv.org/pdf/2305.03111.pdf

[30] BIRD‑SQL leaderboard: https://bird-bench.github.io/

[31] UNITE paper: https://ar5iv.labs.arxiv.org/html/2305.16265

[32] UNITE dataset: https://github.com/awslabs/unified-text2sql-benchmark

[33] Archer paper: https://arxiv.org/html/2402.12554

[34] Archer leaderboard: https://sig4kg.github.io/archer-bench/

[35] BookSQL paper: https://arxiv.org/html/2406.07860

[36] BookSQL dataset: https://github.com/Exploration-Lab/BookSQL

[37] Spider 2.0 paper: https://spider2-sql.github.io/

[38] Spider 2.0 leaderboard: https://spider2-sql.github.io/

[39] BEAVER paper: https://arxiv.org/html/2409.02038

[40] BEAVER dataset: https://github.com/peterbaile/beaver

[41] PRACTIQ paper: https://arxiv.org/html/2410.11076

[42] TURSpider paper: https://ieeexplore.ieee.org/document/10753591

[43] TURSpider dataset: https://github.com/alibugra/TURSpider

[44] Synthetic Text‑to‑SQL dataset: https://huggingface.co/datasets/gretelai/synthetic_text_to_sql
