How to Build a Truly Usable AI‑Powered Natural Language Query System from Scratch

This article analyzes why natural‑language database queries often fail, outlines four technical routes, presents a five‑layer architecture built around a business‑semantic middle layer, and shares engineering best practices, a real‑world case study, and a product comparison to help data teams design an effective intelligent query system.

Lao Guo's Learning Space

Problem Context

A manufacturing group’s 18‑person data platform team handles over 200 data requests per day. Each request requires writing SQL, building a report, and back‑and‑forth communication, taking anywhere from one day to a week. The bottleneck is the mismatch between business users, who speak natural language, and databases, which understand only SQL.

Beyond Simple NL‑to‑SQL

An intelligent query system must answer questions at four layers:

What is being asked? Intent understanding and business concept mapping (instead of keyword matching).

Where is the data? Automatic schema awareness and routing (instead of manually specified tables).

How to compute? Semantic‑driven SQL/DSL generation (instead of hand‑written SQL).

Is the result correct? Automatic validation and explainable output (instead of manual verification).

Handling only the third layer causes failures for queries like “three‑year gross‑margin trend,” because the system lacks company‑specific metric definitions and knowledge of which tables hold the data.

Technical Routes

Route 1 – Pre‑built Wide Table + NL2SQL (ByteDance‑style)

User query → NL2SQL → Single‑table query → Result

Pros: Simple implementation, fast response, single‑table accuracy >90%.

Cons: Requires manual design and maintenance of the wide table; cannot cover all possible queries; new business scenarios need a rebuilt table.

Route 2 – ChatBI (FanRuan‑style)

Core idea: Add a natural‑language front‑end to an existing BI system.

Pros: Quick rollout, familiar UI, strong visualization.

Cons: Only answers pre‑defined questions; unexpected queries stall; AI is decorative rather than core.

Route 3 – Pre‑defined Metric Platform (JD.com‑style)

Core idea: Manually define business metric calculations and match user queries to those metrics.

Pros: Unified definitions, trustworthy data, compliance‑friendly.

Cons: Metric explosion, high maintenance cost, cannot answer undefined questions.

Route 4 – Business Ontology + Multi‑Agent (Palantir/UINO‑style)

User query → Intent‑clarification Agent
   → Knowledge Retrieval Agent
   → DSL/SQL Generation Agent
   → Result‑Validation Agent
   → Graph traversal query (no manual JOIN)
   → Return + Explanation

Pros: Multi‑table accuracy ≥95%, strong generalization, supports arbitrary questions, knowledge accumulates over time.

Cons: High upfront modeling cost; requires a very large model (e.g., a 671 B‑parameter model) and powerful GPUs, making it ill‑suited to lightweight SaaS delivery.
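The agent chain above can be sketched as plain chained functions. This is a minimal illustration, not the Palantir/UINO implementation: each "agent" here is a stub standing in for an LLM call, and all function names, the toy knowledge base, and the `finance` table are assumptions.

```python
# Hypothetical sketch of the Route 4 four-agent pipeline.
# Every name below is illustrative; real agents would wrap LLM calls.

def clarify_intent(query: str) -> dict:
    """Intent-clarification agent: normalize the question into a structured intent."""
    return {"metric": "gross_margin"} if "gross" in query.lower() else {"raw": query}

def retrieve_knowledge(intent: dict) -> dict:
    """Knowledge-retrieval agent: look up metric definitions and table lineage."""
    knowledge = {"gross_margin": "(revenue - direct_cost) / revenue"}
    return {**intent, "definition": knowledge.get(intent.get("metric", ""))}

def generate_dsl(ctx: dict) -> str:
    """DSL/SQL-generation agent: emit a query from the enriched context."""
    return f"SELECT {ctx['definition']} FROM finance" if ctx.get("definition") else "-- unresolved"

def validate(sql: str) -> bool:
    """Result-validation agent: basic sanity check before execution."""
    return sql.strip().upper().startswith("SELECT")

def answer(query: str) -> tuple:
    sql = generate_dsl(retrieve_knowledge(clarify_intent(query)))
    return sql, validate(sql)
```

The design point is that each stage enriches a shared context rather than passing raw text forward, which is what lets the later stages stay deterministic and auditable.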

Decision Matrix

Fixed query patterns, need fast launch → Wide table + NL2SQL

Existing BI system, lightweight upgrade → ChatBI

Mature metric system, compliance‑first → Pre‑defined metric platform

Complex multi‑table joins, high accuracy needed → Ontology + Multi‑Agent

Five‑Layer System Architecture (All Required)

Data foundation (lake‑warehouse)

Business‑semantic middle layer

Model integration layer

Execution & validation layer

Observability & operations layer

[Figure: System Architecture]

Business‑Semantic Middle Layer

Business term mapping: Map phrases like “big client” or “churned client” to specific fields and filters.

Metric definition: Clarify calculations such as gross‑margin = (Revenue‑DirectCost)/Revenue versus (Revenue‑FullCost)/Revenue.

Field lineage: Identify tables and joins that provide values like “regional manager achievement rate.”

Example M‑Schema snippet:

-- M‑Schema example
TABLE orders: order master table
  - order_id: unique order identifier
  - customer_id: links to customers.id
  - amount: order amount (tax‑included, RMB)
  - status: [1=Pending, 2=Shipped, 3=Completed, 4=Refunded]
  - created_at: order time (UTC+8)

-- Business definitions
"Effective order" = status IN (2,3)
"Monthly GMV" = SUM(amount) WHERE status != 4 AND created_at >= first day of month

Using an explicit schema raises model‑generated SQL accuracy from ~60% to >85%.
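One way this plays out in practice is to inject the M‑Schema text directly into the model prompt. The template and schema excerpt below are assumptions for illustration; the essential idea from the article is that the LLM sees explicit field semantics and business definitions, not raw DDL.

```python
# Sketch: embedding an M-Schema snippet into the generation prompt.
# The prompt wording and schema excerpt are illustrative assumptions.

M_SCHEMA = """\
TABLE orders: order master table
  - order_id: unique order identifier
  - amount: order amount (tax-included, RMB)
  - status: [1=Pending, 2=Shipped, 3=Completed, 4=Refunded]
"Effective order" = status IN (2,3)
"""

def build_prompt(question: str, schema: str = M_SCHEMA) -> str:
    return (
        "You are a SQL generator. Use ONLY the schema and business "
        "definitions below.\n\n"
        f"## Schema\n{schema}\n"
        f"## Question\n{question}\n"
        "Return a single SELECT statement."
    )
```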

Three Often‑Overlooked Engineering Details

1. Multi‑turn Context Management

Maintain a dialogue‑state window that retains the last 3‑5 turns and resets when the topic changes, respecting LLM token limits.
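A minimal sketch of such a window, assuming a naive keyword‑overlap check as a stand‑in for real topic detection:

```python
from collections import deque

class DialogueWindow:
    """Keeps the last N turns; clears itself when the topic appears to change.
    Topic detection here is a crude word-overlap heuristic (an assumption),
    not a production classifier."""

    def __init__(self, max_turns: int = 5):
        self.turns = deque(maxlen=max_turns)  # deque enforces the window size

    def _same_topic(self, utterance: str) -> bool:
        if not self.turns:
            return True
        prev_words = set(self.turns[-1].lower().split())
        return bool(prev_words & set(utterance.lower().split()))

    def add(self, utterance: str) -> None:
        if not self._same_topic(utterance):
            self.turns.clear()  # topic shift -> reset context
        self.turns.append(utterance)

    def context(self) -> str:
        """Concatenated history to prepend to the next LLM call."""
        return "\n".join(self.turns)
```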

2. SQL Safety Interception

Three defense layers:

Syntax layer – block DML/DDL; allow only SELECT.

Permission layer – field‑level access control and automatic masking of sensitive columns.

Semantic layer – detect anomalous result sizes or amounts (e.g., billions) and raise warnings.

Provide an explainable SQL output that restates the generated query in natural language for user confirmation before execution.
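The three layers can be sketched as independent gates a generated query must pass before execution. The regex rules, the sensitive‑column set, and the row‑count threshold below are all simplified assumptions; a real deployment would use a proper SQL parser rather than pattern matching.

```python
import re

# Layer rules below are illustrative assumptions, not a hardened filter.
FORBIDDEN = re.compile(r"\b(INSERT|UPDATE|DELETE|DROP|ALTER|TRUNCATE|GRANT)\b", re.I)
SENSITIVE_COLUMNS = {"salary", "id_number"}  # assumed field-level ACL

def syntax_check(sql: str) -> bool:
    """Layer 1 (syntax): allow only SELECT; block DML/DDL keywords."""
    return sql.strip().upper().startswith("SELECT") and not FORBIDDEN.search(sql)

def permission_check(sql: str, allowed: set) -> bool:
    """Layer 2 (permission): reject queries touching sensitive columns
    the current user is not cleared for."""
    tokens = set(re.findall(r"\w+", sql.lower()))
    return not (tokens & (SENSITIVE_COLUMNS - allowed))

def semantic_check(row_count: int, max_rows: int = 1_000_000) -> bool:
    """Layer 3 (semantic): flag anomalous result sizes for human review."""
    return row_count <= max_rows
```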

3. Accuracy Boosting Techniques

Few‑Shot Injection: Show the model example SQLs for similar questions. Effect: +10‑15% accuracy.

Self‑Consistency Voting: Generate 3‑5 SQL candidates, pick the majority vote. Effect: +8‑12% accuracy.

Self‑Correction: When SQL errors occur, let the model read the error and rewrite. Effect: +20% coverage.

Combined, these methods raise complex multi‑table query accuracy from ~65% to >85%.
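Of the three, self‑consistency voting is the easiest to sketch. The version below canonicalizes candidates by case and whitespace only, which is a simplification: production systems typically compare execution results rather than query text.

```python
from collections import Counter

def normalize(sql: str) -> str:
    """Crude canonicalization (assumption): lowercase, collapse whitespace."""
    return " ".join(sql.lower().split())

def majority_vote(candidates: list) -> str:
    """Pick the most frequent candidate after normalization."""
    counts = Counter(normalize(c) for c in candidates)
    winner, _ = counts.most_common(1)[0]
    return winner
```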

Real‑World Case: Sanhua Smart Controls (2025)

Background: Four independent systems (U9C, MES, WMS, finance) with inconsistent metrics; average analysis time 5.8 days.

Solution: Lake‑warehouse base, rule engine, lightweight ML models, and a built business‑semantic layer.

Document return rate: 23.6% → 9.2% (‑61%)

Data processing cycle: 5.8 days → 3.1 days (‑46.6%)

Budget overrun return rate: 34% → 8% (‑76.5%)

Key insight: 80% of effort on data governance and semantic layer enabled the model to perform well.

2026 Product Landscape (IT Home, March 2026)

SmartBI BaiZe – Enterprise‑grade, claimed 99% accuracy, suited for large enterprises with strict compliance.

Volcano Engine Data Agent – Internet/standardized scenario, ecosystem‑focused, accuracy not disclosed.

Alibaba Cloud Quick BI – SMB entry‑level, low‑cost, accuracy not disclosed.

Shuishi SwiftAgent – Agent‑tech exploration, suited for tech‑driven teams, accuracy not disclosed.

Kyligence – Massive data processing, performance‑first back‑ends, accuracy not disclosed.

Self‑Develop vs Purchase Decision Framework

Highly custom business scenarios → build in‑house.

Tight timeline, sufficient budget, existing similar customers → buy.

Lack of technical depth, need quick PoC → buy with secondary development.

Four‑Step Implementation Methodology

Step 1 – Unified Data Foundation (1‑2 months)

Establish data governance: define core field definitions, lineage, permissions, and a unified metric metadata store. This delivers ~80% of project value.

Step 2 – Build Business‑Semantic Layer (1 month)

Encode core terms, metric definitions, table relationships, and enumerations into an M‑Schema; create a RAG vector store as a “business dictionary” for the LLM.

Step 3 – Connect Model and Run Core Flow (2‑4 weeks)

Integrate a large‑model API (e.g., Qwen 3.5‑Plus or DeepSeek V3) to achieve intent → schema routing → SQL generation → result explanation, and validate with at least 20 core business questions.
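The "validate with at least 20 core business questions" step can be run as a small acceptance harness. Everything here is a stand‑in: `generate_sql` stubs the model call, and the question list and pass criterion are assumptions.

```python
# Sketch of a Step 3 acceptance check: run core questions through the
# pipeline and measure the share that yield an executable SELECT.
CORE_QUESTIONS = ["monthly GMV", "effective order count", "gross margin trend"]

def generate_sql(question: str) -> str:
    """Stub for the real LLM call (assumption)."""
    return f"SELECT /* {question} */ 1"

def acceptance_rate(questions: list) -> float:
    ok = sum(
        1 for q in questions
        if generate_sql(q).strip().upper().startswith("SELECT")
    )
    return ok / len(questions)
```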

Step 4 – Engineering, Testing, and Rollout (2‑4 weeks)

Add permission checks, multi‑turn dialogue handling, caching, monitoring, and alerting; perform a gray‑scale launch, collect feedback, and iterate.

Core Insight

Failures are rarely due to weak LLMs; they stem from inadequate data governance. When the semantic layer is clear, field meanings are annotated, metric definitions are unified, and permissions are bounded, mainstream LLMs achieve >80% accuracy. Conversely, a schema of 200 opaque tables defeats even GPT‑4o.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.

Tags: AI, Semantic Layer, Large Language Model, data governance, NL2SQL