How RAG Transforms Natural Language Queries into Accurate SQL for Business Users

This article explains how Retrieval‑Augmented Generation (RAG) combines large language models with vector databases to let non‑technical staff query massive membership data using plain language, detailing the workflow, technical architecture, optimization challenges, and real‑world impact on data‑driven decision making.


RAG Technology Overview

In the era of digital transformation, data has become a core asset for enterprises. Within Bilibili's Premium Membership Center, the data intelligence platform processes massive volumes of member data and provides insights for service optimization. Traditional SQL queries pose a barrier for non‑technical business users, prompting the adoption of large language models (LLMs) to enable natural‑language‑to‑SQL interaction.

Challenges of Traditional LLM‑Generated SQL

Although LLMs excel at natural language processing, direct SQL generation suffers from hallucination: the model invents field names, joins irrelevant tables, or fabricates schema elements, producing inaccurate or unexecutable queries.

RAG Workflow

Retrieval‑Augmented Generation (RAG) mitigates these problems by integrating a vector database that stores contextual knowledge such as data models, business rules, and historical query examples. When a user submits a natural‑language request, the system retrieves semantically similar context from the vector store and feeds it to the LLM, greatly improving SQL accuracy.

Document Preprocessing and Vector Store Construction

Unstructured Loader: Parses various file formats (.docx, .xlsx, .pdf) into plain text streams.

Data Slicing: Segments text into chunks based on paragraphs or token limits to preserve semantic completeness.

Embedding: Converts each chunk into high‑dimensional vectors using a pretrained model.

Vector Database: Persists embeddings and builds indexes for fast similarity search.
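The original article contains no code, but this pipeline can be sketched as follows. The sentence-transformers model and FAISS index are illustrative stand-ins; the source does not name the embedding model or vector database actually used.

```python
# Minimal sketch of the preprocessing pipeline described above.
# sentence-transformers and FAISS are placeholders, not the platform's
# actual components.
import faiss
from sentence_transformers import SentenceTransformer

def slice_document(text: str, max_chars: int = 500) -> list[str]:
    """Segment text by paragraph, merging short paragraphs so each chunk
    stays within a rough size budget and keeps semantic completeness."""
    chunks, current = [], ""
    for para in text.split("\n\n"):
        if current and len(current) + len(para) > max_chars:
            chunks.append(current.strip())
            current = ""
        current += para + "\n\n"
    if current.strip():
        chunks.append(current.strip())
    return chunks

# 1. Load: assume an Unstructured-style loader has already parsed
#    .docx/.xlsx/.pdf files into plain text.
documents = ["...plain text emitted by the document loader..."]

# 2. Slice: segment into chunks that preserve semantic completeness.
chunks = [c for doc in documents for c in slice_document(doc)]

# 3. Embed: convert each chunk into a high-dimensional vector.
model = SentenceTransformer("all-MiniLM-L6-v2")  # placeholder model
embeddings = model.encode(chunks, normalize_embeddings=True)

# 4. Index: persist embeddings for fast similarity search; inner product
#    on normalized vectors is equivalent to cosine similarity.
index = faiss.IndexFlatIP(embeddings.shape[1])
index.add(embeddings)
```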

Question‑Answer Inference Stage

Question Embedding: Transforms the user's query into a vector.

Context Retrieval: Retrieves the most relevant chunks from the vector store.

Prompt Construction: Combines the retrieved context with a predefined template (Instruction + Context + Question) to guide the LLM.

LLM Reasoning: Generates the answer or SQL statement using in‑context learning.

This grounds SQL generation in accurate, retrieved context, reducing hallucinations and improving correctness.
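A sketch of this inference loop, continuing from the index built above. The prompt wording, model name, and OpenAI client are assumptions; the source specifies only the Instruction + Context + Question structure, not the model or service used.

```python
# Sketch of the question-answer inference stage, reusing the model,
# index, and chunks from the preprocessing sketch above.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set; the real LLM service is unnamed

def answer(question: str, top_k: int = 3) -> str:
    # 1. Question embedding: same model as the chunks, so vectors are comparable.
    q_vec = model.encode([question], normalize_embeddings=True)

    # 2. Context retrieval: nearest chunks from the vector index.
    _, ids = index.search(q_vec, top_k)
    context = "\n---\n".join(chunks[i] for i in ids[0])

    # 3. Prompt construction: Instruction + Context + Question template.
    prompt = (
        "You are a SQL assistant. Answer using ONLY the schema, business "
        "rules, and examples in the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nSQL:"
    )

    # 4. LLM reasoning: in-context learning grounded in retrieved knowledge.
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content
```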

Technical Solution Implementation

The platform’s architecture is layered as follows:

Base Layer

Configuration Center: Manages query parsing rules, recall strategies, and LLM inference parameters.

Logging & Monitoring: Records queries, latency, results, and LLM failure rates.

Permission Management: Controls access to LLM services and database operations.
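The source lists what the Configuration Center manages without showing how it is represented; a hypothetical shape for those parameters, purely for illustration:

```python
# Illustrative only: the article names the managed settings (parsing
# rules, recall strategies, inference parameters) but not their format.
from dataclasses import dataclass, field

@dataclass
class RecallConfig:
    top_k: int = 3              # chunks retrieved per question
    min_score: float = 0.6     # similarity threshold for a chunk to count

@dataclass
class InferenceConfig:
    model: str = "gpt-4o-mini"  # placeholder model name
    temperature: float = 0.0    # low temperature for deterministic SQL
    max_tokens: int = 512

@dataclass
class PlatformConfig:
    recall: RecallConfig = field(default_factory=RecallConfig)
    inference: InferenceConfig = field(default_factory=InferenceConfig)
    allowed_tables: list[str] = field(default_factory=list)  # permission scope
```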

Data Layer

User‑side data: Profiles, historical queries, interaction behavior.

Business data: Core tables, knowledge bases, FAQs.

Metadata: Table schemas, indexes, field constraints for SQL validation.
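As a hedged sketch of how such metadata can gate generated SQL before execution, the open-source sqlglot parser can check table and column references; the article does not name a validation tool, and the schema below is invented for illustration.

```python
# Sketch: validating LLM-generated SQL against table/column metadata
# before execution. sqlglot is an illustrative choice, not the
# platform's confirmed tooling.
import sqlglot
from sqlglot import exp
from sqlglot.errors import ParseError

SCHEMA = {  # hypothetical metadata drawn from the Data Layer
    "members": {"member_id", "level", "signup_date", "region"},
}

def validate(sql: str) -> list[str]:
    """Return a list of schema violations; an empty list means the SQL passes."""
    try:
        tree = sqlglot.parse_one(sql, read="mysql")
    except ParseError as e:
        return [f"unparseable SQL: {e}"]
    errors = []
    known_columns = set().union(*SCHEMA.values())
    for table in tree.find_all(exp.Table):
        if table.name not in SCHEMA:
            errors.append(f"unknown table: {table.name}")
    for column in tree.find_all(exp.Column):
        if column.name not in known_columns:
            errors.append(f"unknown column: {column.name}")
    return errors

print(validate("SELECT member_id, bogus_field FROM members"))
# -> ['unknown column: bogus_field']
```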

Storage Layer

Business DB: Stores primary business data.

Cache: Speeds up responses to frequent queries and avoids repeated LLM inference.

Vector Store: Holds vector indexes for semantic retrieval.
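A minimal sketch of the cache's role in this stack, with an in-process dict standing in for whatever key-value store (e.g., Redis) the platform actually uses; the source does not specify one.

```python
# Frequent questions skip embedding, retrieval, and LLM inference
# entirely. The dict is a stand-in for a real cache backend.
import hashlib

_cache: dict[str, str] = {}

def cached_answer(question: str) -> str:
    key = hashlib.sha256(question.strip().lower().encode()).hexdigest()
    if key in _cache:
        return _cache[key]      # cache hit: no recall or LLM call needed
    result = answer(question)   # falls through to the answer() sketch above
    _cache[key] = result
    return result
```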

Service Layer

Query Parsing Service: NLP intent recognition and entity extraction.

Recall Service: Retrieves relevant knowledge from the vector store.

LLM Inference Service: Generates SQL using retrieved context.

SQL Optimization Service: Refines syntax, applies indexing, and improves execution plans.
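The optimization step is described only at a high level; as one possible approach (not confirmed by the source), sqlglot's optimizer can normalize and simplify a generated query before it reaches the Business DB. The table schema below is again invented for illustration.

```python
# Hedged example of syntax refinement in the SQL Optimization Service,
# using sqlglot's optimizer; the platform's actual optimizer is unknown.
from sqlglot.optimizer import optimize

raw_sql = "SELECT region, COUNT(*) FROM members WHERE level >= 2 GROUP BY region"
optimized = optimize(
    raw_sql,
    schema={"members": {"member_id": "INT", "level": "INT",
                        "signup_date": "DATE", "region": "TEXT"}},
    dialect="mysql",
)
print(optimized.sql(dialect="mysql"))
```

A rewrite pass like this would sit between the LLM Inference Service and query execution.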

Application Layer

End‑user Query Apps: Deliver answers directly through search or chatbot interfaces.

SDK Integration: Enables batch query parsing and report generation.

Intelligent Extensions: Derives insights such as intent analysis and demand mining.

Application Effects

Since deployment, query efficiency has improved dramatically: business users obtain accurate results in minutes instead of hours or days, reducing their reliance on data analysts. Manual verification puts the accuracy of retrieved answers above 85%.

Challenges and Future Outlook

Current bottlenecks include inference latency from multi‑stage processing and the overhead of large‑scale vector search. Planned optimizations include more efficient vector indexing, model distillation, and quantization. Testing workflows are being automated with AI agents to reduce manual review effort, while ongoing performance work aims to deliver faster, more reliable data services.
