Boosting Text-to-SQL Accuracy: J‑Schema, Iterative DPO, and Self‑Consistency
This article presents a study on improving Text-to-SQL performance by introducing J‑Schema for structured schema representation, applying iterative Direct Preference Optimization (DPO) training, and leveraging self‑consistency voting, lifting execution accuracy on the BIRD benchmark from 63.69% to 68.97%.
Technical background: Text2SQL converts natural language queries to SQL, evolving through rule‑based, neural, pretrained language model, and large language model stages. Current challenges are prompt optimization, model training, and inference robustness, investigated on the BIRD dataset.
Text2SQL Challenges
Text‑to‑SQL (NL2SQL) aims to generate executable SQL from natural language, enabling non‑expert users to query complex databases. The field has progressed through four stages: rule‑based methods, neural networks, pretrained language models, and large language models, each addressing increasing complexity.
Three major difficulties remain: prompt optimization (designing prompts and schema presentation), model training (enhancing base capabilities), and inference enhancement (stabilizing LLM outputs).
Prompt & J‑Schema
We propose J‑Schema, a fully structured representation of the database schema using special markers such as #DB_ID, #Table, and #Foreign keys. For each table we list its name, per‑column basic_info, and example values, with type‑specific rules limiting how many examples are shown for date, float, integer, and text columns.
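As a rough illustration, the description above can be sketched as a prompt builder. The exact marker layout, the basic_info fields, and the uniform example cap below are assumptions for illustration; the paper's per-type example limits are collapsed into a single `max_examples` parameter here.

```python
# Sketch of a J-Schema-style prompt builder (layout details are assumed,
# not taken from the original implementation).

def format_j_schema(db_id, tables, foreign_keys, max_examples=3):
    """Render a database schema as one structured prompt string."""
    lines = [f"#DB_ID: {db_id}"]
    for table in tables:
        lines.append(f"#Table: {table['name']}")
        for col in table["columns"]:
            # basic_info: column name, type, and an optional description
            info = f"  {col['name']} ({col['type']})"
            if col.get("description"):
                info += f" -- {col['description']}"
            # Cap example values; a single cap stands in for the
            # per-type rules (date/float/integer/text) described above.
            examples = col.get("examples", [])[:max_examples]
            if examples:
                info += f" e.g. {examples}"
            lines.append(info)
    if foreign_keys:
        lines.append("#Foreign keys:")
        for src, dst in foreign_keys:
            lines.append(f"  {src} = {dst}")
    return "\n".join(lines)
```

Keeping the whole schema in one marker-delimited block lets the model locate tables, columns, and join paths without parsing free-form prose.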
Training Method: Iterative DPO
Iterative Direct Preference Optimization (DPO) repeatedly samples chain‑of‑thought reasoning steps and final answers, builds positive and negative example pools, forms preference pairs, and fine‑tunes the model. Multiple iterations increase execution accuracy, peaking at the third stage.
Execution accuracy on the BIRD benchmark improves from 63.69% (baseline) to 67.60% after the third iterative DPO stage.
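The pair-construction step of one iteration can be sketched as follows. Candidates whose SQL executes to the gold result form the positive pool, the rest the negative pool, and (chosen, rejected) pairs are drawn across the two pools. The `is_correct` callback and the pair cap are stand-ins for the actual execution check and sampling budget, which the article does not specify.

```python
from itertools import product

def build_preference_pairs(candidates, is_correct, max_pairs=8):
    """Split sampled SQL candidates into positive/negative pools by
    execution correctness, then form DPO preference pairs."""
    positives = [c for c in candidates if is_correct(c)]
    negatives = [c for c in candidates if not is_correct(c)]
    # Each pair is (chosen, rejected); cap the count per query so a few
    # queries with many samples do not dominate the training set.
    return list(product(positives, negatives))[:max_pairs]
```

Running sampling, pair construction, and DPO fine-tuning in a loop, with each stage initialized from the previous one, gives the iterative schedule whose accuracy peaks at stage three.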
Hyperparameter Scan
We vary the DPO β parameter, which scales the policy-to-reference log-probability ratios in the loss, from 0.1 to 0.6, training two epochs per setting. Execution accuracy peaks at roughly 68% with β = 0.5.
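For reference, the standard DPO objective makes the role of β explicit: it scales the log-probability ratios between the trained policy and the frozen reference policy, where y_w and y_l are the chosen and rejected answers of a preference pair and σ is the logistic function.

```latex
\mathcal{L}_{\mathrm{DPO}}(\pi_\theta;\, \pi_{\mathrm{ref}})
  = -\,\mathbb{E}_{(x,\, y_w,\, y_l)}\!\left[
      \log \sigma\!\left(
        \beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)}
        \;-\; \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)}
      \right)
    \right]
```

Larger β sharpens the implicit reward margin and penalizes drift from the reference model more strongly, which is why sweeping it trades off preference fit against stability.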
Self‑Consistency
Self‑consistency generates multiple candidate SQL answers per query and selects the best via hard or soft voting. Soft voting, which weights candidates by their similarity to one another rather than requiring exact matches, consistently outperforms hard voting, yielding over 1% absolute accuracy gains.
For the iterative stage‑3 model, execution accuracy rises from 67.60% (no self‑consistency) to 68.97% with soft voting.
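The two voting schemes can be sketched as below. Hard voting takes the most frequent exact candidate; soft voting scores each candidate by its total similarity to all candidates and picks the highest. The token-Jaccard similarity here is a toy stand-in; in practice the comparison could be over execution results rather than SQL text (an assumption, since the article does not specify the similarity measure).

```python
from collections import Counter

def hard_vote(candidates):
    """Hard voting: return the most frequent exact candidate string."""
    return Counter(candidates).most_common(1)[0][0]

def token_jaccard(a, b):
    """Toy similarity over whitespace tokens (illustrative only)."""
    ta, tb = set(a.split()), set(b.split())
    return len(ta & tb) / len(ta | tb)

def soft_vote(candidates, similarity=token_jaccard):
    """Soft voting: return the candidate with the highest total
    similarity to every candidate (including itself)."""
    scores = [sum(similarity(c, other) for other in candidates)
              for c in candidates]
    return candidates[scores.index(max(scores))]
```

Soft voting still rewards a candidate whose near-duplicates differ only in formatting or aliasing, which hard voting would count as distinct answers.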
Future Directions
We plan to construct higher‑quality data from the million‑scale SynSQL‑2.5M dataset for BIRD, explore alternative training methods such as GRPO, and evaluate on additional benchmarks like Spider, ScienceBenchmark, and EHRSQL.
JD Cloud Developers
JD Cloud Developers is JD Technology Group's platform for technical sharing and communication among developers in AI, cloud computing, IoT, and related fields. It publishes technical information on JD products, industry content, and tech event news. Embrace technology and partner with developers to envision the future.