Boosting Text-to-SQL Accuracy with J‑Schema, Iterative DPO, and Self‑Consistency

This article examines the evolution of Text-to-SQL, introduces the J‑Schema representation and chain-of-thought prompting, applies iterative DPO training and self-consistency voting, and demonstrates how these techniques raise execution accuracy on the BIRD benchmark from 56.6% to 69.2%.

JD Tech

Introduction

Text-to-SQL (NL2SQL) converts natural-language questions into executable SQL queries over a relational database, enabling non-technical users to retrieve information from large datasets.

Evolution of Text-to-SQL

The field has progressed through four stages:

Rule-based stage: Early systems relied on handcrafted rules and statistical language models, handling only single-table queries.

Neural-network stage: Sequence and graph neural models improved synonym handling and multi-table reasoning but were limited by model size and data.

Pre-trained language model stage: Models such as BERT and T5 brought substantial gains in language understanding.

Large-language-model stage: LLMs now dominate, with research focusing on prompt engineering and fine-tuning.

Current Challenges

Three major difficulties remain:

Prompt optimization: How to guide LLMs to produce clear reasoning steps and design database schemas that are easy for the model to understand.

Model training: How to improve the base capabilities of the model through training.

Inference enhancement: How to make the model’s output more stable and reliable.

J-Schema and Prompt Design

We propose J-Schema, a fully structured representation of database schemas that uses special markers such as #DB_ID, #Table, and #Foreign keys. Each table includes its name, and each column is described with basic_info containing the column name, description, primary-key flag, and example values.

To balance the number of example values with context length, we apply the following rules:

For date/time types (DATE, TIME, DATETIME, TIMESTAMP), keep a single example.

For the floating-point type (REAL), keep one example rounded to two decimal places.

For the integer type (INTEGER), keep one example.

For the text type (TEXT), keep multiple examples, provided each value is shorter than 10 tokens.
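The marker names (#DB_ID, #Table, #Foreign keys) and the example-value rules above can be sketched as a small serializer. The exact layout, field names, and the `Column` structure below are illustrative assumptions, not the article's verbatim format:

```python
from dataclasses import dataclass, field

@dataclass
class Column:
    name: str
    dtype: str            # e.g. TEXT, INTEGER, REAL, DATE
    description: str = ""
    is_primary: bool = False
    examples: list = field(default_factory=list)

def pick_examples(col):
    """Apply the example-value rules described above."""
    if col.dtype in ("DATE", "TIME", "DATETIME", "TIMESTAMP"):
        return col.examples[:1]                               # one example
    if col.dtype == "REAL":
        return [f"{float(v):.2f}" for v in col.examples[:1]]  # one example, two decimals
    if col.dtype == "INTEGER":
        return col.examples[:1]                               # one example
    if col.dtype == "TEXT":
        # keep only short values (here approximated by whitespace tokens)
        return [v for v in col.examples if len(str(v).split()) < 10]
    return col.examples[:1]

def render_j_schema(db_id, tables, foreign_keys):
    """tables: {table_name: [Column, ...]}; foreign_keys: list of join strings."""
    lines = [f"#DB_ID: {db_id}"]
    for table, cols in tables.items():
        lines.append(f"#Table: {table}")
        for c in cols:
            pk = ", primary key" if c.is_primary else ""
            ex = ", ".join(map(str, pick_examples(c)))
            lines.append(f"  ({c.name}, {c.description}{pk}, examples: [{ex}])")
    lines.append("#Foreign keys: " + "; ".join(foreign_keys))
    return "\n".join(lines)
```

The resulting flat text is what gets embedded in the prompt, keeping the schema compact while preserving column semantics.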

Chain-of-Thought Prompting

We embed the complete database information, the user query, and external knowledge in the prompt, then ask the model to generate intermediate reasoning steps inside <think>…</think> tags and the final SQL answer inside <answer>…</answer> tags.
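The `<think>`/`<answer>` tag names come from the article; the prompt wording and helper names below are illustrative assumptions for how the prompt might be assembled and the final SQL recovered:

```python
import re

# Hypothetical prompt template; the article embeds schema, external
# knowledge, and the question, then requests tagged reasoning and SQL.
PROMPT_TEMPLATE = (
    "{schema}\n\n"
    "External knowledge: {knowledge}\n"
    "Question: {question}\n\n"
    "Reason step by step inside <think>...</think>, then give the final SQL "
    "inside <answer>...</answer>."
)

def extract_sql(model_output: str) -> str:
    """Pull the final SQL out of the <answer>...</answer> tags."""
    m = re.search(r"<answer>(.*?)</answer>", model_output, re.DOTALL)
    return m.group(1).strip() if m else ""
```

Separating reasoning from the answer this way makes the SQL trivially machine-extractable, which matters for the execution-based training and voting steps that follow.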

Iterative DPO Training

We adopt an iterative Direct Preference Optimization (DPO) procedure. In each iteration we sample multiple chain-of-thought reasoning traces and final answers, construct positive and negative preference pairs, and fine-tune the model. Repeating this process yields progressively higher execution accuracy, saturating after the third iteration.

Hyper-parameter scanning shows that a DPO loss weight (beta) of 0.5 gives the best execution accuracy.
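One iteration of the preference-pair construction can be sketched as follows, assuming execution correctness against the gold result decides which samples are positives; the function names and sampling interface are assumptions, not the article's code:

```python
import itertools

def build_dpo_pairs(question, samples, execute, gold_result):
    """samples: list of (reasoning_trace, sql) drawn from the current model.

    execute(sql) runs the query against the database; a sample is positive
    iff its result matches the gold result.
    """
    correct, wrong = [], []
    for trace, sql in samples:
        try:
            ok = execute(sql) == gold_result
        except Exception:
            ok = False          # unexecutable SQL counts as a negative
        (correct if ok else wrong).append((trace, sql))
    # Pair every correct sample (chosen) with every wrong one (rejected).
    return [{"prompt": question, "chosen": c, "rejected": r}
            for c, r in itertools.product(correct, wrong)]
```

With a standard DPO trainer (e.g. TRL's DPOTrainer), the article's best-performing setting would correspond to setting the loss weight `beta` to 0.5; the model is then re-sampled and the process repeated until accuracy saturates.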

Self-Consistency Voting

We generate multiple candidate SQL answers for each question and select the best one using voting. Two strategies are explored:

Hard voting: Binary correctness decides the vote.

Soft voting: Votes are weighted by similarity between candidates.

Soft voting consistently outperforms hard voting, providing over 1% absolute improvement in execution accuracy across checkpoints.
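The two strategies can be sketched as below. Here hard voting takes a majority over execution results, and soft voting scores each candidate by its total similarity to the other candidates; the string-similarity measure (difflib ratio) is an illustrative assumption standing in for whatever similarity the article uses:

```python
from collections import Counter
from difflib import SequenceMatcher

def hard_vote(candidates, execute):
    """Pick the SQL whose execution result is the most common one."""
    results = [(sql, repr(execute(sql))) for sql in candidates]
    winner = Counter(r for _, r in results).most_common(1)[0][0]
    return next(sql for sql, r in results if r == winner)

def soft_vote(candidates):
    """Pick the SQL with the highest total similarity to all others."""
    def score(i):
        return sum(SequenceMatcher(None, candidates[i], candidates[j]).ratio()
                   for j in range(len(candidates)) if j != i)
    return candidates[max(range(len(candidates)), key=score)]
```

Soft voting avoids the all-or-nothing behavior of hard voting: a candidate that is nearly identical to several others still accumulates weight even when no single execution result holds a strict majority.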

Future Directions

We plan to:

Construct higher-quality data by filtering the 2.5M-sample SynSQL-2.5M dataset for relevance to BIRD.

Explore alternative training methods such as GRPO, which uses group-wise relative rewards without explicit preference pairs.

Validate the proposed optimizations on additional benchmarks like Spider, ScienceBenchmark, and EHRSQL, and eventually apply them to real-world DataAgent scenarios.
