How to Build a Truly Usable AI‑Powered Natural Language Query System from Scratch
The article analyzes why natural‑language database queries often fail, outlines four technical routes, presents a five‑layer architecture with a business‑semantic middle layer, shares engineering best practices, a real‑world case study, and a product comparison to guide data companies in designing an effective intelligent query system.
Problem Context
A manufacturing group’s 18‑person data platform team handles over 200 data requests a day. Each request needs its own SQL, report, and round of communication, taking anywhere from a day to a week. The bottleneck is the mismatch between business users, who speak natural language, and databases, which understand only SQL.
Beyond Simple NL‑to‑SQL
An intelligent query system must answer questions at four layers:
What is being asked? Intent understanding and business concept mapping (instead of keyword matching).
Where is the data? Automatic schema awareness and routing (instead of manually specified tables).
How to compute? Semantic‑driven SQL/DSL generation (instead of hand‑written SQL).
Is the result correct? Automatic validation and explainable output (instead of manual verification).
Handling only the third layer leads to failures on queries like “three‑year gross‑margin trend,” because the system lacks company‑specific metric definitions and table knowledge.
Technical Routes
Route 1 – Pre‑built Wide Table + NL2SQL (ByteDance‑style)
User query → NL2SQL → Single‑table query → Result
Pros: Simple implementation, fast response, single‑table accuracy >90%.
Cons: Requires manual design and maintenance of the wide table; cannot cover all possible queries; new business scenarios need a rebuilt table.
Route 2 – ChatBI (FanRuan‑style)
Core idea: Add a natural‑language front‑end to an existing BI system.
Pros: Quick rollout, familiar UI, strong visualization.
Cons: Only answers pre‑defined questions; unexpected queries stall; AI is decorative rather than core.
Route 3 – Pre‑defined Metric Platform (JD.com‑style)
Core idea: Manually define business metric calculations and match user queries to those metrics.
Pros: Unified definitions, trustworthy data, compliance‑friendly.
Cons: Metric explosion, high maintenance cost, cannot answer undefined questions.
Route 4 – Business Ontology + Multi‑Agent (Palantir/UINO‑style)
User query → Intent‑clarification Agent
→ Knowledge Retrieval Agent
→ DSL/SQL Generation Agent
→ Result‑Validation Agent
→ Graph traversal query (no manual JOIN)
→ Return + Explanation
Pros: Multi‑table accuracy ≥95%, strong generalization, supports arbitrary questions, knowledge accumulates over time.
Cons: High upfront modeling cost, requires a large model (e.g., 671 B parameters) and powerful GPUs; not suitable for SaaS.
Decision Matrix
Fixed query patterns, need fast launch → Wide table + NL2SQL
Existing BI system, lightweight upgrade → ChatBI
Mature metric system, compliance‑first → Pre‑defined metric platform
Complex multi‑table joins, high accuracy needed → Ontology + Multi‑Agent
Five‑Layer System Architecture (All Required)
Data foundation (lake‑warehouse)
Business‑semantic middle layer
Model integration layer
Execution & validation layer
Observability & operations layer
Business‑Semantic Middle Layer
Business term mapping: Map phrases like “big client” or “churned client” to specific fields and filters.
Metric definition: Clarify calculations such as gross‑margin = (Revenue‑DirectCost)/Revenue versus (Revenue‑FullCost)/Revenue.
Field lineage: Identify tables and joins that provide values like “regional manager achievement rate.”
Example M‑Schema snippet:
-- M‑Schema example
TABLE orders: order master table
- order_id: unique order identifier
- customer_id: links to customers.id
- amount: order amount (tax‑included, RMB)
- status: [1=Pending, 2=Shipped, 3=Completed, 4=Refunded]
- created_at: order time (UTC+8)
-- Business definitions
"Effective order" = status IN (2,3)
"Monthly GMV" = SUM(amount) WHERE status != 4 AND created_at >= first day of month
Using an explicit schema raises model‑generated SQL accuracy from ~60% to >85%.
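The schema‑injection step can be sketched as a simple prompt builder. This is a minimal illustration, not a specific product's API; the `M_SCHEMA` text and `build_sql_prompt` helper are hypothetical names for the pattern of grounding SQL generation in an explicit schema plus business definitions.

```python
# Sketch: inject an M-Schema block into the prompt before SQL generation.
M_SCHEMA = """\
TABLE orders: order master table
- order_id: unique order identifier
- customer_id: links to customers.id
- amount: order amount (tax-included, RMB)
- status: [1=Pending, 2=Shipped, 3=Completed, 4=Refunded]
-- Business definitions
"Effective order" = status IN (2, 3)
"""

def build_sql_prompt(question: str, schema: str = M_SCHEMA) -> str:
    """Compose an LLM prompt that grounds SQL generation in the schema."""
    return (
        "You are a SQL assistant. Use ONLY the tables and business "
        "definitions below.\n\n"
        f"{schema}\n"
        f"Question: {question}\n"
        "Answer with a single SELECT statement."
    )

prompt = build_sql_prompt("How many effective orders were placed last month?")
```

The point of the pattern is that the model never has to guess what an “effective order” means; the definition travels with every request.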
Three Often‑Overlooked Engineering Details
1. Multi‑turn Context Management
Maintain a dialogue‑state window that retains the last 3‑5 turns and resets when the topic changes, respecting LLM token limits.
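A minimal sketch of such a dialogue window, assuming a fixed turn budget; the `DialogueState` class name is illustrative, and topic‑change detection (which would trigger `reset`) is left as a hook since the article does not specify a method.

```python
from collections import deque

class DialogueState:
    """Keep only the last N turns to respect LLM token limits."""

    def __init__(self, max_turns: int = 5):
        # deque with maxlen silently drops the oldest turn when full
        self.turns = deque(maxlen=max_turns)

    def add(self, user: str, assistant: str) -> None:
        self.turns.append((user, assistant))

    def reset(self) -> None:
        """Call when the topic changes (detection strategy not shown)."""
        self.turns.clear()

    def context(self) -> str:
        """Render retained turns for inclusion in the next prompt."""
        return "\n".join(f"User: {u}\nAssistant: {a}" for u, a in self.turns)
```

Using `deque(maxlen=...)` makes the 3‑5 turn window self‑maintaining: no manual trimming logic is needed.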
2. SQL Safety Interception
Three defense layers:
Syntax layer – block DML/DDL; allow only SELECT.
Permission layer – field‑level access control and automatic masking of sensitive columns.
Semantic layer – detect anomalous result sizes or amounts (e.g., billions) and raise warnings.
Provide an explainable SQL output that restates the generated query in natural language for user confirmation before execution.
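The syntax‑layer gate can be sketched with a keyword check. This is a deliberately crude illustration, assuming a single‑statement policy; a production system should parse the SQL with a real parser rather than regexes, since keyword matching can false‑positive on identifiers.

```python
import re

# DML/DDL keywords the syntax layer refuses outright (illustrative list)
BLOCKED = re.compile(
    r"\b(INSERT|UPDATE|DELETE|DROP|ALTER|TRUNCATE|CREATE|GRANT)\b",
    re.IGNORECASE,
)

def check_sql(sql: str) -> str:
    """Syntax-layer gate: allow one SELECT statement, block everything else."""
    stripped = sql.strip().rstrip(";")
    if not stripped.upper().startswith(("SELECT", "WITH")):
        raise ValueError("Only SELECT queries are allowed")
    if BLOCKED.search(stripped):
        raise ValueError("DML/DDL keyword detected")
    if ";" in stripped:
        raise ValueError("Multiple statements are not allowed")
    return stripped
```

The permission and semantic layers would sit behind this gate: field‑level masking before execution, and row‑count/value sanity checks on the result.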
3. Accuracy Boosting Techniques
Few‑Shot Injection: Show the model example SQLs for similar questions. Effect: +10‑15% accuracy.
Self‑Consistency Voting: Generate 3‑5 SQL candidates, pick the majority vote. Effect: +8‑12% accuracy.
Self‑Correction: When SQL errors occur, let the model read the error and rewrite. Effect: +20% coverage.
Combined, these methods raise complex multi‑table query accuracy from ~65% to >85%.
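Self‑consistency voting, the second technique above, can be sketched as follows; the crude `normalize` step is an assumption added so that textually different but equivalent candidates vote together (a real system might compare execution results or parsed ASTs instead).

```python
from collections import Counter

def normalize(sql: str) -> str:
    """Collapse whitespace and case so equivalent candidates match."""
    return " ".join(sql.lower().split()).rstrip(";")

def majority_vote(candidates: list[str]) -> str:
    """Self-consistency: return the most frequent candidate SQL."""
    counts = Counter(normalize(c) for c in candidates)
    winner, _ = counts.most_common(1)[0]
    # hand back an original candidate matching the winning normal form
    return next(c for c in candidates if normalize(c) == winner)

candidates = [
    "SELECT COUNT(*) FROM orders WHERE status IN (2,3)",
    "select count(*) from orders where status in (2,3);",
    "SELECT COUNT(order_id) FROM orders WHERE status != 4",
]
best = majority_vote(candidates)
```

In practice the 3‑5 candidates come from sampling the model at a nonzero temperature; agreement among samples is the signal that the generation is stable.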
Real‑World Case: Sanhua Smart Controls (2025)
Background: Four independent systems (U9C, MES, WMS, finance) with inconsistent metrics; average analysis time 5.8 days.
Solution: Lake‑warehouse base, rule engine, lightweight ML models, and a built business‑semantic layer.
Document return rate: 23.6% → 9.2% (‑61%)
Data processing cycle: 5.8 days → 3.1 days (‑46.6%)
Budget overrun return rate: 34% → 8% (‑76.5%)
Key insight: 80% of effort on data governance and semantic layer enabled the model to perform well.
2026 Product Landscape (IT Home, March 2026)
SmartBI BaiZe – Enterprise‑grade, claimed 99% accuracy, suited for large enterprises with strict compliance.
Volcano Engine Data Agent – Internet/standardized scenario, ecosystem‑focused, accuracy not disclosed.
Alibaba Cloud Quick BI – SMB entry‑level, low‑cost, accuracy not disclosed.
Shuishi SwiftAgent – Agent‑tech exploration, suited for tech‑driven teams, accuracy not disclosed.
Kyligence – Massive data processing, performance‑first back‑ends, accuracy not disclosed.
Self‑Develop vs Purchase Decision Framework
Highly custom business scenarios → build in‑house.
Tight timeline, sufficient budget, existing similar customers → buy.
Lack of technical depth, need quick PoC → buy with secondary development.
Four‑Step Implementation Methodology
Step 1 – Unified Data Foundation (1‑2 months)
Establish data governance: define core field definitions, lineage, permissions, and a unified metric metadata store. This delivers ~80% of project value.
Step 2 – Build Business‑Semantic Layer (1 month)
Encode core terms, metric definitions, table relationships, and enumerations into an M‑Schema; create a RAG vector store as a “business dictionary” for the LLM.
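The retrieval contract of that “business dictionary” can be illustrated with a toy lookup. A real implementation would embed glossary entries into a vector store; token overlap stands in for similarity search here, and the `GLOSSARY` contents are invented examples, not definitions from the article's case study.

```python
# Stand-in for vector retrieval over the business dictionary.
GLOSSARY = {
    "big client": "customers with annual revenue >= threshold (assumed)",
    "churned client": "no effective order in the last 180 days (assumed)",
    "effective order": "status IN (2, 3)",
}

def retrieve(question: str, k: int = 2) -> list[tuple[str, str]]:
    """Return the k glossary entries sharing the most tokens with the question."""
    q_tokens = set(question.lower().split())
    scored = sorted(
        (
            (len(q_tokens & set(term.split())), term, definition)
            for term, definition in GLOSSARY.items()
        ),
        reverse=True,
    )
    return [(term, defn) for score, term, defn in scored[:k] if score > 0]
```

Whatever the retrieval mechanism, the retrieved definitions are then injected into the prompt alongside the M‑Schema, so the model resolves business jargon before writing SQL.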
Step 3 – Connect Model and Run Core Flow (2‑4 weeks)
Integrate a large‑model API (e.g., Qwen 3.5‑Plus or DeepSeek V3) to achieve intent → schema routing → SQL generation → result explanation, and validate with at least 20 core business questions.
Step 4 – Engineering, Testing, and Rollout (2‑4 weeks)
Add permission checks, multi‑turn dialogue handling, caching, monitoring, and alerting; perform a gray‑scale launch, collect feedback, and iterate.
Core Insight
Failures are rarely due to weak LLMs; they stem from inadequate data governance. When the semantic layer is clear, field meanings are annotated, metric definitions are unified, and permissions are bounded, mainstream LLMs achieve >80% accuracy. Conversely, a schema of 200 opaque tables defeats even GPT‑4o.
Lao Guo's Learning Space
AI learning, discussion, and hands‑on practice with self‑reflection