From Natural Language to Executable SQL: Building an AI‑Powered SQL Generation Engine

The article explains why letting large language models generate SQL directly leads to poor accuracy, and presents a production‑grade engine that combines a semantic knowledge layer, RAG‑enhanced NL‑to‑DSL conversion, and a deterministic DSL‑to‑SQL translator to reach 85‑90% correctness in real‑world deployments.

Architect's Ambition

Why Direct LLM‑Generated SQL Fails

In production, feeding user questions straight to a large model and asking it to output SQL (Text2SQL) pushes accuracy below 50%, largely because the generated SQL is an opaque string that cannot be structurally validated, optimized, adapted to different dialects, or debugged.

Proposed Engine Architecture

The solution introduces a three‑layer architecture: a semantic layer that stores structured business knowledge, an NL2DSL layer that uses Retrieval‑Augmented Generation (RAG) with a large model to produce a JSON‑based domain‑specific language (DSL), and a DSL2SQL layer that deterministically translates DSL into executable SQL.

Semantic Layer – Structured Business Knowledge

Business concepts are maintained jointly by data engineers and analysts. Continuous updates are handled through:

Automated extraction: pull field names, Chinese labels, and enum values from data dictionaries, metadata platforms, and existing SQL comments.

Manual supplementation: annotate synonyms and calculation logic for key metrics such as “sales amount”.

Version control: store the semantic configuration in Git to enable diff and rollback.

Core components include:

Topic model: predefined fact and dimension tables per business domain (sales, inventory, finance).

Metric/Dimension dictionary: each entry records field name, Chinese name, synonyms, data type, unit, and business definition. For example, sales_amount has the Chinese name “销售额” (sales amount), synonyms “营收|GMV” (revenue, GMV), unit “万元” (ten thousand yuan), and scope “不含退款” (refunds excluded).

Vector semantic index: vectorize field comments, synonyms, and descriptions for similarity‑based retrieval.

Business knowledge base: store long‑text rules and typical query templates for few‑shot prompting.
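A minimal sketch of how one dictionary entry and its synonym lookup might be represented. The class name `MetricEntry`, its attribute names, and the helper functions are illustrative assumptions, not the article's actual schema; only the sample values come from the example above.

```python
from dataclasses import dataclass, field


@dataclass
class MetricEntry:
    """One metric/dimension dictionary entry (attribute names are assumed)."""
    field_name: str
    label: str                      # Chinese display name
    synonyms: list[str] = field(default_factory=list)
    data_type: str = "decimal"
    unit: str = ""
    definition: str = ""


def build_synonym_index(entries):
    """Map every field name, label, and synonym (lower-cased) to its entry."""
    index = {}
    for entry in entries:
        for term in [entry.field_name, entry.label, *entry.synonyms]:
            index[term.lower()] = entry
    return index


def resolve_term(index, term):
    """Return the matching entry, or None when the term is unknown."""
    return index.get(term.strip().lower())


sales_amount = MetricEntry(
    field_name="sales_amount",
    label="销售额",            # sales amount
    synonyms=["营收", "GMV"],   # revenue, GMV
    unit="万元",               # ten thousand yuan
    definition="不含退款",      # refunds excluded
)
index = build_synonym_index([sales_amount])
print(resolve_term(index, "gmv").field_name)
```

An exact-match index like this only covers known synonyms; the vector index described above would handle paraphrases that are not listed explicitly.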

Practical tip: start by covering the high‑frequency 20% of fields that serve 80% of queries, then iterate.

NL2DSL – From Natural Language to Structured DSL

The NL2DSL process consists of four steps:

Context retrieval: fetch relevant entries from the semantic layer based on the user question.

Prompt construction: inject the retrieved context into a carefully crafted prompt that enforces JSON output.

LLM generation: call a large model (e.g., Doubao, DeepSeek, GPT‑4) to produce the DSL. Rich context and a strict output format leave the model less room to hallucinate.

Post‑processing & validation: parse the generated text as JSON and validate it against the DSL schema; on failure, retry once with altered examples or a lower temperature.
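The four steps can be sketched roughly as follows. This is an illustration under stated assumptions: the bag-of-words cosine score stands in for a real vector index, `call_llm(prompt, temperature)` is a hypothetical callable wrapping whichever model is used, and `REQUIRED_KEYS` is a simplified stand-in for the full DSL schema.

```python
import json
import math
from collections import Counter

REQUIRED_KEYS = {"version", "query_type", "dataset", "select"}  # assumed minimum


def cosine(a, b):
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(question, entries, top_k=2):
    """Step 1: rank semantic-layer descriptions by similarity to the question."""
    q = Counter(question.lower().split())
    ranked = sorted(entries, key=lambda e: cosine(q, Counter(e.lower().split())), reverse=True)
    return ranked[:top_k]


def build_prompt(question, context):
    """Step 2: inject retrieved context and demand JSON-only output."""
    lines = ["Answer with a single JSON object that follows the DSL schema.", "Context:"]
    lines += [f"- {c}" for c in context]
    lines.append(f"Question: {question}")
    return "\n".join(lines)


def parse_dsl(text):
    """Step 4: parse as JSON and check required keys; raise ValueError otherwise."""
    dsl = json.loads(text)  # json.JSONDecodeError is a subclass of ValueError
    if not isinstance(dsl, dict):
        raise ValueError("DSL must be a JSON object")
    missing = REQUIRED_KEYS - dsl.keys()
    if missing:
        raise ValueError(f"missing keys: {sorted(missing)}")
    return dsl


def generate_dsl(question, entries, call_llm, temperatures=(0.7, 0.0)):
    """Step 3 plus one retry: the second attempt uses a lower temperature."""
    prompt = build_prompt(question, retrieve(question, entries))
    last_error = None
    for temperature in temperatures:
        try:
            return parse_dsl(call_llm(prompt, temperature))
        except ValueError as exc:
            last_error = exc
    raise ValueError(f"DSL generation failed: {last_error}")


def fake_llm(prompt, temperature):
    """Stand-in model: chatty text on the first call, valid JSON on the retry."""
    if temperature > 0:
        return "Sure! Here is the query you asked for..."
    return '{"version": "1.0", "query_type": "simple", "dataset": {}, "select": []}'


entries = [
    "sales amount excluding refunds",
    "inventory quantity on hand",
    "region name of the sales territory",
]
print(generate_dsl("sales amount by region", entries, fake_llm)["query_type"])
```

Raising on the final failure, rather than returning a partial DSL, keeps malformed output from reaching the translation step.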

A production‑grade DSL example:

{
  "version": "1.0",
  "query_type": "compare",
  "dataset": {
    "type": "join",
    "relation": [
      {"name": "sales_fact", "alias": "s", "type": "fact"},
      {"name": "product_dim", "alias": "p", "type": "dim", "join": {"type": "inner", "left": "s.product_id", "right": "p.id"}},
      {"name": "region_dim", "alias": "r", "type": "dim", "join": {"type": "inner", "left": "s.region_id", "right": "r.id"}}
    ]
  },
  "select": [
    {"expr": "p.category", "alias": "品类"},
    {"expr": "SUM(CASE WHEN s.year=2025 THEN s.amount ELSE 0 END)", "alias": "销售额_2025"},
    {"expr": "(SUM(...) - SUM(...)) / NULLIF(SUM(...), 0) * 100", "alias": "增长率(%)"}
  ],
  "filter": {
    "operator": "and",
    "conditions": [
      {"field": "s.quarter", "operator": "in", "value": [1]},
      {"field": "r.region_name", "operator": "in", "value": ["华东", "华南"]}
    ]
  },
  "group_by": ["p.category"],
  "order_by": [{"expr": "增长率(%)", "direction": "desc", "nulls": "last"}],
  "limit": 10
}

Ambiguity Detection and Clarification

Before invoking the model, the system checks the user query for three common ambiguities:

Metric ambiguity: e.g., does “sales amount” refer to order total or net receipt?

Time ambiguity: e.g., does “this month” mean calendar month or business month?

Dimension ambiguity: e.g., does “North Region” refer to sales territory or logistics area?

The question plus potentially conflicting field descriptions are fed back to the model, which decides whether clarification is needed. If so, an agent asks the user, adding one interaction but substantially improving accuracy.
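As a rough illustration, the pre-check that collects conflicting field descriptions could look like the sketch below. The article hands the final decision to the model; this sketch only gathers the candidates and drafts a clarification question. The function names, the `candidates_by_term` mapping, and the field names `order_total` and `net_receipt` are assumptions made for the example.

```python
def find_ambiguities(question, candidates_by_term):
    """Return terms found in the question that map to more than one candidate field."""
    text = question.lower()
    return {
        term: fields
        for term, fields in candidates_by_term.items()
        if term.lower() in text and len(fields) > 1
    }


def clarification_prompt(ambiguities):
    """Turn each conflict into a question the agent can put to the user."""
    parts = []
    for term, fields in ambiguities.items():
        options = " or ".join(f"{name} ({desc})" for name, desc in fields)
        parts.append(f'By "{term}", do you mean {options}?')
    return " ".join(parts)


# Hypothetical candidates retrieved from the semantic layer.
candidates = {
    "sales amount": [("order_total", "order total"), ("net_receipt", "net receipt")],
    "category": [("p.category", "product category")],
}
found = find_ambiguities("Sales amount by category this quarter", candidates)
print(clarification_prompt(found))
```

Terms with a single candidate pass through silently, so the extra interaction is only spent where a real conflict exists.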

DSL Validation and Security Layer

Because DSL is structured JSON, it can be rigorously checked:

Schema validation: ensure required fields exist and types match.

Field existence: compare field names against the semantic layer.

Operator‑type compatibility: reject operators that do not fit the field type, for example LIKE on date fields.

Numeric aggregation check: fields used in numeric aggregations must be numeric.
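The field, operator, and aggregation checks might be sketched as follows. The `FIELD_TYPES` table is a hypothetical slice of the semantic layer, the operator sets are assumptions, and the structured `{"func", "field"}` aggregation shape is a simplification (the DSL example earlier carries aggregations inside `expr` strings).

```python
FIELD_TYPES = {  # hypothetical slice of the semantic layer
    "s.amount": "decimal",
    "s.order_date": "date",
    "r.region_name": "string",
}
COMPARISON = {"=", "!=", ">", ">=", "<", "<=", "in", "between"}
ALLOWED_OPERATORS = {
    "decimal": COMPARISON,
    "date": COMPARISON,
    "string": {"=", "!=", "in", "like"},
}
NUMERIC_TYPES = {"decimal", "integer"}


def validate_conditions(conditions):
    """Check field existence and operator/type compatibility for filter conditions."""
    errors = []
    for cond in conditions:
        name, op = cond.get("field"), cond.get("operator")
        ftype = FIELD_TYPES.get(name)
        if ftype is None:
            errors.append(f"unknown field: {name}")
        elif op not in ALLOWED_OPERATORS[ftype]:
            errors.append(f"operator {op!r} not allowed on {ftype} field {name}")
    return errors


def validate_aggregations(aggregations):
    """Check that SUM/AVG are applied only to numeric fields."""
    errors = []
    for agg in aggregations:
        ftype = FIELD_TYPES.get(agg["field"])
        if ftype is None:
            errors.append(f"unknown field: {agg['field']}")
        elif agg["func"].lower() in {"sum", "avg"} and ftype not in NUMERIC_TYPES:
            errors.append(f"{agg['func']} requires a numeric field, got {ftype}: {agg['field']}")
    return errors


print(validate_conditions([{"field": "s.order_date", "operator": "like", "value": "2025%"}]))
print(validate_aggregations([{"func": "SUM", "field": "r.region_name"}]))
```

Returning a list of messages instead of raising on the first problem lets the caller feed every error back into a retry prompt at once.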

Security measures inject row‑level permission conditions into the filter clause based on the user’s role (e.g., dept_id = current_user.dept_id). If the original query already contains the same field, the condition is added with AND rather than overwriting, avoiding privilege bypass.

Logical optimizations remove redundant filters (e.g., 1=1) and cap limit values (e.g., max 10 000) to prevent runaway queries.
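A sketch of the permission injection and limit cap, assuming the DSL allows a filter group to be nested as a condition (the example DSL earlier only shows flat conditions, so this nesting is an assumption). The function names and `MAX_LIMIT` constant are illustrative.

```python
import copy

MAX_LIMIT = 10_000  # cap taken from the example above


def inject_row_filter(dsl, field_name, value):
    """AND a mandatory permission condition onto whatever filter already exists."""
    secured = copy.deepcopy(dsl)
    permission = {"field": field_name, "operator": "=", "value": value}
    existing = secured.get("filter")
    if existing:
        # Wrap instead of merge: user conditions on the same field are kept,
        # but they can only narrow the permitted rows, never widen them.
        secured["filter"] = {"operator": "and", "conditions": [permission, existing]}
    else:
        secured["filter"] = {"operator": "and", "conditions": [permission]}
    return secured


def cap_limit(dsl, max_limit=MAX_LIMIT):
    """Clamp the requested limit; a missing limit becomes the maximum."""
    capped = copy.deepcopy(dsl)
    requested = capped.get("limit")
    capped["limit"] = max_limit if requested is None else min(int(requested), max_limit)
    return capped


user_dsl = {
    "filter": {
        "operator": "or",
        "conditions": [{"field": "s.dept_id", "operator": "=", "value": 7}],
    },
    "limit": 50_000,
}
secured = cap_limit(inject_row_filter(user_dsl, "s.dept_id", 3))
print(secured["filter"]["operator"], secured["limit"])
```

In the usage above the user asked for department 7 while being entitled only to department 3; because the permission condition sits in an outer AND group, the combined filter matches no rows instead of leaking department 7.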

DSL2SQL – Deterministic Translation

The DSL‑to‑SQL step is rule‑driven and involves no model call, so the same DSL always yields the same SQL and a valid DSL translates correctly by construction. Each database dialect (MySQL, DM, Oracle) has its own translator that walks the DSL JSON and concatenates SQL fragments.

select → SELECT …

dataset.relation → FROM … INNER JOIN … (supports arbitrary multi‑table joins)

filter → WHERE …

group_by → GROUP BY …

having → HAVING …

order_by → ORDER BY … NULLS LAST/FIRST

limit → LIMIT n (or dialect‑specific equivalents such as ROWNUM <= n for DM)

Because the same DSL can be fed to different translators, switching from MySQL to Oracle requires no changes in upstream logic.
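A minimal sketch of a per-dialect translator, assuming a flat filter and omitting HAVING and NULLS ordering for brevity. The class names are hypothetical; the Oracle variant uses `FETCH FIRST n ROWS ONLY` (Oracle 12c and later) as one possible row-limit form. Literals are inlined here for readability, whereas a production translator would more likely emit bind parameters.

```python
class SqlTranslator:
    """Base translator: walks the DSL dict and joins SQL fragments."""

    def literal(self, value):
        if isinstance(value, bool):
            raise TypeError("boolean literals are not handled in this sketch")
        if isinstance(value, (int, float)):
            return str(value)
        return "'" + str(value).replace("'", "''") + "'"  # escape single quotes

    def condition(self, cond):
        op = cond["operator"].lower()
        if op == "in":
            values = ", ".join(self.literal(v) for v in cond["value"])
            return f"{cond['field']} IN ({values})"
        if op in {"=", "!=", ">", ">=", "<", "<="}:
            return f"{cond['field']} {op} {self.literal(cond['value'])}"
        raise ValueError(f"unsupported operator: {op}")

    def from_clause(self, dataset):
        parts = []
        for rel in dataset["relation"]:
            table = f"{rel['name']} {rel['alias']}"
            join = rel.get("join")
            if join is None:  # the fact table carries no join spec
                parts.append(f"FROM {table}")
            else:
                parts.append(f"{join['type'].upper()} JOIN {table} ON {join['left']} = {join['right']}")
        return " ".join(parts)

    def limit_clause(self, n):
        return f"LIMIT {int(n)}"

    def translate(self, dsl):
        select = ", ".join(f"{s['expr']} AS {s['alias']}" for s in dsl["select"])
        sql = [f"SELECT {select}", self.from_clause(dsl["dataset"])]
        flt = dsl.get("filter")
        if flt and flt["conditions"]:
            glue = f" {flt['operator'].upper()} "
            sql.append("WHERE " + glue.join(self.condition(c) for c in flt["conditions"]))
        if dsl.get("group_by"):
            sql.append("GROUP BY " + ", ".join(dsl["group_by"]))
        if dsl.get("order_by"):
            sql.append("ORDER BY " + ", ".join(f"{o['expr']} {o['direction'].upper()}" for o in dsl["order_by"]))
        if dsl.get("limit") is not None:
            sql.append(self.limit_clause(dsl["limit"]))
        return " ".join(sql)


class MySqlTranslator(SqlTranslator):
    pass  # base behaviour already matches MySQL for this subset


class OracleTranslator(SqlTranslator):
    def limit_clause(self, n):
        return f"FETCH FIRST {int(n)} ROWS ONLY"


dsl = {
    "dataset": {"relation": [
        {"name": "sales_fact", "alias": "s", "type": "fact"},
        {"name": "region_dim", "alias": "r", "type": "dim",
         "join": {"type": "inner", "left": "s.region_id", "right": "r.id"}},
    ]},
    "select": [{"expr": "r.region_name", "alias": "region"},
               {"expr": "SUM(s.amount)", "alias": "total"}],
    "filter": {"operator": "and", "conditions": [
        {"field": "s.quarter", "operator": "in", "value": [1]},
    ]},
    "group_by": ["r.region_name"],
    "order_by": [{"expr": "total", "direction": "desc"}],
    "limit": 10,
}
print(MySqlTranslator().translate(dsl))
print(OracleTranslator().translate(dsl))
```

Only `limit_clause` differs between the two subclasses, which is the point of the design: dialect differences are confined to small overrides while the DSL and the walking logic stay shared.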

Feedback Loop for Continuous Improvement

Accuracy is further boosted by a closed‑loop feedback mechanism:

Correction entry: users can flag wrong results and submit the correct SQL or DSL.

Audit: automatic execution checks plus sampled manual review feed corrected examples back into the few‑shot library.

Metric: monthly sampling measures DSL generation correctness (semantic validity and executability) and guides optimization.

Through iterative refinement, production accuracy climbs from sub‑50% to 85‑90% for complex business queries, and exceeds 95% for single‑table scenarios.

Conclusion

The presented engine—semantic layer + NL2DSL + DSL2SQL—delivers controllable, accurate, secure, and dialect‑agnostic SQL generation. Starting with a wide table and a modest few‑shot set, then progressively adding the semantic layer and DSL, provides a pragmatic path for teams building intelligent query systems.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.
