Build a Text‑to‑SQL App with LangChain and OpenAI: Step‑by‑Step Guide
This article explains how to build a Text‑to‑SQL application using LangChain, OpenAI LLMs, and SQLDatabaseChain, covering the fundamentals of Text2SQL, LangChain components, code examples, and a practical SQLite case that transforms natural‑language questions into executable SQL queries.
1. Introduction
Text2SQL is a hot application of large language models (LLMs) that converts natural language queries into SQL statements, lowering the barrier for data access and supporting decision‑making. This article introduces four aspects of using LLMs for Text2SQL.
2. Text2SQL Overview
Text‑to‑SQL (also NL2SQL) transforms natural language questions into executable SQL for relational databases. Example input: “查询表t_user的相关信息,结果按id降序排序,只保留前10个数据”. Example output: “SELECT * FROM t_user ORDER BY id DESC LIMIT 10”. The goal is to let ordinary users query databases without writing SQL.
3. LangChain Basics
LangChain is a framework for building LLM‑powered applications, connecting external data sources, tools, and models. Its core components include Models, Indexes, Chains, Memory, Prompt, and Agent.
3.1 Models
LangChain provides generic interfaces to LLMs (e.g., OpenAI text‑davinci‑003) and chat models (e.g., ChatGPT, Claude). Prompts are built with PromptTemplate.
from langchain import PromptTemplate
# Define prompt template
prompt = PromptTemplate(
input_variables=["question"],
template="""简洁和专业的来回答用户的问题。
如果无法从中得到答案,请说 “根据已知信息无法回答该问题” 或 “没有提供足够的相关信息”,不允许在答案中添加编造成分,答案请使用中文。
问题是:{question}"""
)
print(prompt.format_prompt(question="如何进行数据治理"))3.2 Indexes
Indexes integrate external data sources, using Document Loaders, Text Splitters, Vectorstores, and Retrievers to fetch relevant information.
3.3 Chains
Chains link components to simplify complex workflows. Key chains include LLMChain, SQLDatabaseChain, and SequentialChain.
3.3.1 LLMChain
Combines a PromptTemplate, an LLM, and an OutputParser to produce structured outputs.
from langchain.prompts import PromptTemplate
from langchain.chains import LLMChain
from langchain import OpenAI
import os
os.environ["OPENAI_API_KEY"] = "Your openai key"
llm = OpenAI(temperature=0)
prompt = PromptTemplate(
input_variables=["question"],
template="""简洁和专业的来回答用户的问题。
如果无法从中得到答案,请说 “根据已知信息无法回答该问题” 或 “没有提供足够的相关信息”,不允许在答案中添加编造成分,答案请使用中文。
问题是:{question}"""
)
chain = LLMChain(llm=llm, prompt=prompt, verbose=True)
print(chain.run("如何开展数据治理"))3.3.2 SQLDatabaseChain
Automatically generates SQL from a natural‑language question, executes it via SQLAlchemy, and returns the answer.
3.3.3 SequentialChain
Executes a series of chains in a predefined order, allowing each step’s output to feed the next.
3.4 Memory
Memory components store conversation history, enabling stateful interactions across multiple turns.
3.5 Agent
Agents act as proxies for LLMs, invoking external tools (e.g., AutoGPT, BabyAGI) to overcome knowledge gaps and perform complex tasks.
4. Text2SQL Practice with SQLDatabaseChain
4.1 Overview
LangChain’s SQLDatabaseChain uses LLMs to translate natural‑language queries into SQL, runs them against a database (via SQLAlchemy), and formats the final answer.
4.2 Data Preparation
The example uses SQLite with the Chinook sample database, which contains tables such as Customer, Employee, Track, and Invoice.
4.3 Implementation Steps
Test Chinese query “总共有多少员工?” (English: “How many employees are there?”). The code below creates the database connection, initializes the LLM, and runs the query.
from langchain.llms import OpenAI
from langchain.utilities import SQLDatabase
from langchain_experimental.sql import SQLDatabaseChain
import os
os.environ["OPENAI_API_KEY"] = "Your openai key"
db = SQLDatabase.from_uri("sqlite:///.../Chinook.db")
llm = OpenAI(temperature=0, verbose=True)
db_chain = SQLDatabaseChain.from_llm(llm, db, verbose=True)
db_chain.run("总共有多少员工?")The LLM translates the question to SELECT COUNT(*) FROM Employee, executes it, and returns the count in natural language.
Note: For data‑sensitive projects, set return_direct=True when initializing SQLDatabaseChain to return only the raw SQL output.
English queries produce the same result.
5. Future Plans
As LLMs evolve, LangChain remains a leading framework for integrating models with external data sources. While the basic Text2SQL workflow works, more complex queries may suffer from hallucinations or security concerns. Ongoing work will focus on improving stability, reliability, and safety.
Signed-in readers can open the original source through BestHub's protected redirect.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.
Data Thinking Notes
Sharing insights on data architecture, governance, and middle platforms, exploring AI in data, and linking data with business scenarios.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
