NL2SQL Datasets REEF & text2SQL4PM: Causal Analysis Meets Process Mining

This article introduces two recent NL2SQL benchmark datasets: REEF, a synthetic e‑commerce database for end‑to‑end causal analysis, and text2SQL4PM, a bilingual process‑mining dataset. It covers their construction, evaluation results, and research implications for large language models.

Aikesheng Open Source Community

REEF

REEF is a synthetic e‑commerce database containing 18 interrelated tables (e.g., products, orders, users) whose data distribution is annotated to encode specific causal relationships, enabling realistic causal graph construction.

Paper Intent

The paper proposes ORCA (Orchestrating Causal Agent), an LLM‑based agent system designed to tackle end‑to‑end causal analysis tasks such as “Do coupons increase the probability of a user purchasing a product?”. ORCA automates the full relational‑database analysis pipeline—parsing natural‑language queries, browsing schema, generating correct SQL, preprocessing data, and configuring causal inference models—while preserving expert supervision through interactive human‑in‑the‑loop control.

Dataset Analysis

REEF simulates an e‑commerce environment using a combination of rule‑based logic and probabilistic sampling, implemented with Faker.js in JavaScript. Variable generation follows two patterns:

Random sampling variables: e.g., product price is randomly generated within the range [5, 500].

Causal‑driven variables: e.g., the user activity flag is_active depends on the registration age signup_days_ago through an S‑shaped probability scaling, which models “the longer a user has been registered, the lower the activity”.
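The two generation patterns can be sketched as follows. REEF itself is generated with Faker.js in JavaScript; this is only a minimal Python illustration. The uniform distribution for price, the logistic form of the S‑curve, the midpoint and steepness values, and the 0 to 730 day range are all assumptions made for the example, not values taken from the paper.

```python
import math
import random


def sample_price(rng: random.Random) -> float:
    """Random-sampling variable: price drawn from [5, 500].

    A uniform distribution is assumed here for illustration.
    """
    return round(rng.uniform(5, 500), 2)


def active_probability(signup_days_ago: int,
                       midpoint: float = 180.0,
                       steepness: float = 0.02) -> float:
    """Causally driven variable: S-shaped (logistic) decay of activity.

    midpoint and steepness are hypothetical parameters; the probability
    falls from near 1 for new users to near 0 for long-registered users.
    """
    return 1.0 / (1.0 + math.exp(steepness * (signup_days_ago - midpoint)))


def sample_user(rng: random.Random) -> dict:
    """Draw the cause first, then sample the effect conditioned on it."""
    signup_days_ago = rng.randint(0, 730)  # hypothetical range
    is_active = rng.random() < active_probability(signup_days_ago)
    return {"signup_days_ago": signup_days_ago, "is_active": is_active}


if __name__ == "__main__":
    rng = random.Random(42)
    print(sample_price(rng))
    print([sample_user(rng) for _ in range(3)])
```

Sampling the cause before the effect is what lets the generated table carry a known causal edge (signup_days_ago → is_active) that an analysis agent can later be asked to recover.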

Summary

REEF’s structure is complex and realistic, and it proves difficult: ORCA reaches 60.00% execution accuracy on the dataset, while GPT‑4o mini reaches only 6.67%. Even so, ORCA still requires manual mapping of causal fields; the paper notes that domain‑specific knowledge is necessary for higher performance.
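For readers unfamiliar with the metric, execution accuracy is commonly computed by running the predicted and the reference SQL against the same database and comparing the returned rows. The sketch below shows one such comparison using Python's built-in sqlite3 module. The orders table, its columns, and the query pairs are hypothetical, and the exact matching rule used in the ORCA paper may differ.

```python
import sqlite3
from collections import Counter


def run_query(conn, sql):
    """Return the result rows as a multiset, or None if the query fails."""
    try:
        return Counter(conn.execute(sql).fetchall())
    except sqlite3.Error:
        return None


def execution_accuracy(conn, pairs):
    """Share of (predicted_sql, gold_sql) pairs whose results match."""
    pairs = list(pairs)
    if not pairs:
        return 0.0
    correct = 0
    for predicted, gold in pairs:
        gold_rows = run_query(conn, gold)
        pred_rows = run_query(conn, predicted)
        # A prediction counts only if the gold query ran and rows agree.
        if gold_rows is not None and pred_rows == gold_rows:
            correct += 1
    return correct / len(pairs)


# Hypothetical toy database, not the REEF schema.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (order_id INTEGER, user_id INTEGER, used_coupon INTEGER);
    INSERT INTO orders VALUES (1, 10, 1), (2, 10, 0), (3, 11, 1);
""")

GOLD = "SELECT COUNT(*) FROM orders WHERE used_coupon = 1"
pairs = [
    ("SELECT COUNT(*) FROM orders WHERE used_coupon = 1", GOLD),  # matches
    ("SELECT COUNT(*) FROM orders", GOLD),     # wrong: drops the filter
    ("SELECT COUNT(*) FROM purchases", GOLD),  # fails: no such table
]

if __name__ == "__main__":
    print(f"{execution_accuracy(conn, pairs):.2%}")
```

Comparing rows as a multiset ignores row order, which is a deliberate simplification; a stricter variant would compare ordered lists when the question implies a sort.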

text2SQL4PM

text2SQL4PM is a bilingual (Portuguese‑English) NL2SQL benchmark tailored for the process‑mining domain. It addresses domain‑specific challenges by including specialized terminology and a single‑table relational schema derived from event logs, comprising 1,655 natural‑language statements (with human paraphrases), 205 SQL queries, and 10 qualifiers.

Paper Intent

Process mining reconstructs and analyzes actual business‑process executions from system event logs (e.g., ERP, CRM, ticketing systems). The authors aim to apply NL2SQL to this domain so that event logs can be queried in natural language. They note that logs in the prevalent XES format are typically flattened into a single non‑normalized table, which makes NL2SQL considerably harder in this context.
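To make the flattened‑log setting concrete, the sketch below builds a tiny single‑table event log in SQLite and answers the question “How long did each case take, in hours?”. The table name event_log, its columns (case_id, activity, resource, timestamp), and the sample rows are hypothetical and are not the text2SQL4PM schema.

```python
import sqlite3

# Hypothetical flattened event log: one row per event, with the case
# identifier repeated on every row instead of living in its own table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE event_log (
        case_id   TEXT,
        activity  TEXT,
        resource  TEXT,
        timestamp TEXT
    );
    INSERT INTO event_log VALUES
        ('c1', 'Create ticket', 'alice', '2024-01-01 09:00:00'),
        ('c1', 'Assign ticket', 'bob',   '2024-01-01 09:30:00'),
        ('c1', 'Close ticket',  'bob',   '2024-01-02 09:00:00'),
        ('c2', 'Create ticket', 'alice', '2024-01-03 10:00:00'),
        ('c2', 'Close ticket',  'carol', '2024-01-03 16:00:00');
""")

# Duration per case: last event time minus first event time, in hours.
CASE_DURATION_SQL = """
    SELECT case_id,
           ROUND((JULIANDAY(MAX(timestamp)) - JULIANDAY(MIN(timestamp))) * 24, 1)
               AS duration_hours
    FROM event_log
    GROUP BY case_id
    ORDER BY case_id
"""


def case_durations(connection):
    """Return (case_id, duration_hours) rows from the flattened log."""
    return connection.execute(CASE_DURATION_SQL).fetchall()


if __name__ == "__main__":
    for case_id, hours in case_durations(conn):
        print(case_id, hours)
```

Because there is no separate table of cases, even a simple case‑level question has to be expressed as an aggregation over event rows, which is one plausible reason a model must understand process‑mining terms such as “case” and “activity” to produce the right SQL.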

Dataset Analysis

The dataset was built through three stages:

Data collection: 29 undergraduates and 13 graduate students with SQL knowledge generated 237 initial statement pairs.

Dataset refinement: three process‑mining experts validated the pairs, performed semantic replacements, and added eight‑dimensional labels.

Dataset expansion: a native‑English translator produced English versions of the originally Portuguese statements, with data‑mining experts confirming semantic equivalence.

Summary

In evaluations, GPT‑3.5 Turbo attains only 30%–40% accuracy on both English and Portuguese queries, indicating substantial room for improvement in NL2SQL for process mining. The dataset’s bilingual nature, rich paraphrases, and expert‑validated annotations make it valuable not only for NL2SQL research but also for machine translation and paraphrase generation tasks.

References

REEF paper: https://arxiv.org/html/2508.21304

text2SQL4PM paper: https://arxiv.org/html/2509.09684

REEF dataset: https://github.com/ChaemyungLim/ORCA/tree/main/REEF

text2SQL4PM repository: https://github.com/pm-usp/text-2-sql

Written by

Aikesheng Open Source Community

The Aikesheng Open Source Community provides stable, enterprise‑grade MySQL open‑source tools and services, releases a premium open‑source component each year (1024), and continuously operates and maintains them.
