Boosting Text‑to‑SQL Accuracy with Prompt Engineering and LLMs
This article examines the challenges of LLM‑based Text‑to‑SQL, including hallucinations, data‑security risks, and user input errors. It then presents prompt‑engineering strategies, a comparison with fine‑tuning, prompt types, code examples, and experimental results aimed at improving reliability and cost‑effectiveness.
Introduction
In the previous article we demonstrated a Text‑to‑SQL pipeline based on SQLDatabaseChain, but complex queries suffered from hallucinations, data‑security risks and fragile handling of user errors. This article explores four ways to improve LLM‑based Text‑to‑SQL through prompt engineering.
1. Problems and Ideas
1.1 Challenges
Hallucination: the model may generate syntactically correct but unrelated or wrong SQL.
Data security: generated statements could leak or tamper with sensitive data.
User input errors: misspellings or illegal operations lead to unsafe SQL.
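The data‑security and input‑error risks above are often reduced by checking generated SQL before it is executed. The following is a minimal sketch of such a check, not something taken from the original pipeline: the function name `is_safe_select` and the keyword deny‑list are illustrative assumptions, and the check is deliberately conservative (it can reject a harmless query whose string literal contains a blocked word).

```python
import re

# Hypothetical deny-list of write/DDL keywords (an assumption for illustration).
FORBIDDEN = re.compile(
    r"\b(insert|update|delete|drop|alter|create|truncate|grant|revoke|attach|pragma)\b",
    re.IGNORECASE,
)


def is_safe_select(sql: str) -> bool:
    """Return True only for a single SELECT/WITH statement with no write keywords."""
    statement = sql.strip().rstrip(";").strip()
    if not statement or ";" in statement:
        return False  # empty input, or several statements chained together
    if not re.match(r"(?i)^(select|with)\b", statement):
        return False  # anything that does not start as a read query
    return FORBIDDEN.search(statement) is None


print(is_safe_select('SELECT * FROM "Track" LIMIT 3;'))  # True
print(is_safe_select("SELECT 1; DELETE FROM Track"))      # False
```

A check like this is only a first line of defense; connecting with a read‑only database account is the more robust safeguard.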
1.2 Solution ideas
We can address these issues either through Prompt Engineering (designing prompts that embed SQL syntax guidance) or by Fine‑Tuning the model on a large SQL corpus.
Prompt Engineering is low‑cost and requires no retraining, while Fine‑Tuning can yield higher accuracy on the target task at the expense of compute and time.
2. Prompt Overview
2.1 Concept
A prompt is a crafted instruction that steers the LLM to produce the desired output, e.g., adding SQL syntax hints.
2.2 Prompt components
Instruction – what the model should do.
Context (optional) – additional knowledge, possibly retrieved from a vector store.
Input Data (optional) – the user query.
Output Indicator (optional) – the type or format of the expected output, often a cue such as "SQLQuery:" that marks where the answer starts.
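The four components listed above can be assembled with plain string handling. This is a small sketch only: `build_prompt` and the example strings are hypothetical names and values chosen for illustration, and real frameworks usually use templates instead.

```python
def build_prompt(instruction, context="", input_data="", output_indicator=""):
    """Join the non-empty prompt components in a fixed order."""
    parts = [instruction, context, input_data, output_indicator]
    return "\n\n".join(p.strip() for p in parts if p.strip())


prompt = build_prompt(
    instruction="Write one SQLite query that answers the question.",
    context='Only use this table:\nCREATE TABLE "Track" ("TrackId" INTEGER, "Name" NVARCHAR(200));',
    input_data="Question: How many tracks are there?",
    output_indicator="SQLQuery:",
)
print(prompt)
```

Because only the instruction is required, optional components that are left empty simply drop out of the final prompt.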
2.3 Prompt types
Zero‑Shot Prompting – ask the model directly without examples.
Few‑Shot Prompting – provide a few examples to guide the model.
Chain of Thought (CoT) – ask the model to reason step‑by‑step.
Self‑Consistency – generate multiple reasoning paths and vote.
Tree of Thoughts (ToT) – explore a tree of intermediate reasoning steps, evaluating branches and backtracking when needed.
ReAct – interleave reasoning and tool‑use actions.
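Two of the prompt types above are easy to illustrate without calling a model. The sketch below builds a zero‑shot or few‑shot prompt from (question, SQL) pairs and applies a Self‑Consistency style majority vote over several candidate answers. The helper names and the crude normalization (lowercasing and whitespace collapsing, used only to compare candidates) are assumptions for illustration; the candidates would normally come from repeated sampled model calls.

```python
from collections import Counter


def few_shot_prompt(examples, question):
    """Prepend (question, SQL) pairs; an empty list yields a zero-shot prompt."""
    shots = "".join(f"Question: {q}\nSQLQuery: {sql}\n\n" for q, sql in examples)
    return f"{shots}Question: {question}\nSQLQuery:"


def normalize(sql):
    """Collapse case, whitespace and a trailing semicolon for comparison only."""
    return " ".join(sql.lower().split()).rstrip(";").strip()


def self_consistent_answer(candidates):
    """Return the first candidate belonging to the most frequent normalized form."""
    counts = Counter(normalize(c) for c in candidates)
    winner = counts.most_common(1)[0][0]
    return next(c for c in candidates if normalize(c) == winner)


candidates = [
    "SELECT COUNT(*) FROM Track;",
    "select count(*)  from track",
    "SELECT COUNT(TrackId) FROM Track",
]
print(self_consistent_answer(candidates))  # SELECT COUNT(*) FROM Track;
```

Voting on the query text is the simplest variant; voting on the executed result sets would also group queries that are written differently but return the same rows.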
3. Prompt Practices for Text‑to‑SQL
3.1 Existing issues
Using the Chinook database, the earlier pipeline failed to retrieve customers who placed orders in two consecutive months.
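To make the target of this test concrete, here is one possible reference query for "customers who placed orders in two consecutive months", run against a toy table. This is an assumed formulation, not the SQL produced in the original experiment: it uses the Chinook column names `CustomerId` and `InvoiceDate` from the `Invoice` table, while the sample rows are invented for illustration.

```python
import sqlite3

# Map every invoice to a running month index (year * 12 + month), then
# self-join to find customers with invoices in month N and month N + 1.
CONSECUTIVE_MONTHS_SQL = """
WITH months AS (
    SELECT DISTINCT
        CustomerId,
        CAST(strftime('%Y', InvoiceDate) AS INTEGER) * 12
          + CAST(strftime('%m', InvoiceDate) AS INTEGER) AS month_index
    FROM Invoice
)
SELECT DISTINCT a.CustomerId
FROM months AS a
JOIN months AS b
  ON b.CustomerId = a.CustomerId
 AND b.month_index = a.month_index + 1
ORDER BY a.CustomerId;
"""

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Invoice ("
    "InvoiceId INTEGER PRIMARY KEY, CustomerId INTEGER, InvoiceDate TEXT)"
)
conn.executemany(
    "INSERT INTO Invoice (CustomerId, InvoiceDate) VALUES (?, ?)",
    [
        (1, "2021-01-15"), (1, "2021-02-03"),  # January then February
        (2, "2021-01-10"), (2, "2021-03-10"),  # gap of one month
        (3, "2021-12-20"), (3, "2022-01-05"),  # consecutive across a year boundary
    ],
)
customers = [row[0] for row in conn.execute(CONSECUTIVE_MONTHS_SQL)]
print(customers)  # [1, 3]
```

The year boundary case is the part a naive month comparison tends to miss, which is why the sketch converts dates to a single month index first.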
3.2 Solutions
Switching from GPT‑3.5‑Turbo to GPT‑4 dramatically improved correctness, but at roughly ten times the cost.
Alternatively, enrich the prompt with schema information, few‑shot examples, and clear instructions.
from langchain.prompts import PromptTemplate
TEMPLATE = """Given an input question, first create a syntactically correct {dialect} query to run, then look at the results of the query and return the answer.
Use the following format:
Question: \"...\"
SQLQuery: \"...\"
SQLResult: \"...\"
Answer: \"...\"
Only use the following tables:
{table_info}
Some examples of SQL queries that correspond to questions are:
{few_shot_examples}
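Question: {input}"""
# Build a reusable prompt; the input variables are inferred from the placeholders.
prompt = PromptTemplate.from_template(TEMPLATE)
An example table schema and a sample query are shown below.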
CREATE TABLE "Track" (
"TrackId" INTEGER NOT NULL,
"Name" NVARCHAR(200) NOT NULL,
"AlbumId" INTEGER,
"MediaTypeId" INTEGER NOT NULL,
"GenreId" INTEGER,
"Composer" NVARCHAR(220),
"Milliseconds" INTEGER NOT NULL,
"Bytes" INTEGER,
"UnitPrice" NUMERIC(10, 2) NOT NULL,
PRIMARY KEY ("TrackId")
);
SELECT * FROM "Track" LIMIT 3;
The full test code uses LangChain to build the prompt, bind the model, and invoke the chain with a question such as “Which customers placed orders in two consecutive months?”. The generated SQL satisfies the requirement, and the token cost of the extra prompt context is about 10% of the GPT‑4 price.
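The full test code is not reproduced here, so the following is only a rough sketch of the same flow under stated assumptions. It swaps LangChain for plain Python and `sqlite3` so that it runs without extra dependencies: `table_info`, `FEW_SHOT`, and `fake_llm` are hypothetical names, the template is a shortened variant of the one above, and `fake_llm` returns a canned query in place of a real model call.

```python
import sqlite3

TEMPLATE = """Given an input question, create a syntactically correct {dialect} query.
Only use the following tables:
{table_info}
Some examples of SQL queries that correspond to questions are:
{few_shot_examples}
Question: {input}
SQLQuery:"""


def table_info(conn, sample_rows=3):
    """Return each table's CREATE statement followed by a few sample rows."""
    blocks = []
    tables = conn.execute(
        "SELECT name, sql FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    for name, ddl in tables:
        rows = conn.execute(
            f'SELECT * FROM "{name}" LIMIT {int(sample_rows)}'
        ).fetchall()
        sample = "\n".join(str(r) for r in rows)
        blocks.append(f"{ddl};\n/* {len(rows)} sample rows:\n{sample}\n*/")
    return "\n\n".join(blocks)


conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE "Track" ("TrackId" INTEGER NOT NULL, '
    '"Name" NVARCHAR(200) NOT NULL, PRIMARY KEY ("TrackId"))'
)
conn.executemany('INSERT INTO "Track" VALUES (?, ?)', [(1, "Track A"), (2, "Track B")])

# Hypothetical few-shot pairs; in practice these come from curated examples.
FEW_SHOT = [("How many tracks are there?", 'SELECT COUNT(*) FROM "Track";')]
few_shot_examples = "\n".join(f"Question: {q}\nSQLQuery: {s}" for q, s in FEW_SHOT)

prompt = TEMPLATE.format(
    dialect="sqlite",
    table_info=table_info(conn),
    few_shot_examples=few_shot_examples,
    input="List the names of all tracks.",
)


def fake_llm(prompt_text):
    """Hypothetical stand-in for a model call; returns a canned query."""
    return 'SELECT "Name" FROM "Track";'


sql = fake_llm(prompt)
print(conn.execute(sql).fetchall())
```

Replacing `fake_llm` with a real model client, and the toy table with the Chinook database, gives the shape of the experiment described above; the extra schema and examples are what add to the prompt's token count.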
4. Future Plans
The article concludes that few‑shot prompting is an effective way to improve Text‑to‑SQL results, and suggests exploring agents, CoT, and other techniques in future work.
Data Thinking Notes
Sharing insights on data architecture, governance, and middle platforms, exploring AI in data, and linking data with business scenarios.
