Build a Text‑to‑SQL App with LangChain and OpenAI: Step‑by‑Step Guide

This article explains how to build a Text‑to‑SQL application using LangChain, OpenAI LLMs, and SQLDatabaseChain, covering the fundamentals of Text2SQL, LangChain components, code examples, and a practical SQLite case that transforms natural‑language questions into executable SQL queries.

Data Thinking Notes
Data Thinking Notes
Data Thinking Notes
Build a Text‑to‑SQL App with LangChain and OpenAI: Step‑by‑Step Guide

1. Introduction

Text2SQL is a hot application of large language models (LLMs) that converts natural language queries into SQL statements, lowering the barrier for data access and supporting decision‑making. This article introduces four aspects of using LLMs for Text2SQL.

2. Text2SQL Overview

Text‑to‑SQL (also NL2SQL) transforms natural language questions into executable SQL for relational databases. Example input: “查询表t_user的相关信息,结果按id降序排序,只保留前10个数据”. Example output: “SELECT * FROM t_user ORDER BY id DESC LIMIT 10”. The goal is to let ordinary users query databases without writing SQL.

3. LangChain Basics

LangChain is a framework for building LLM‑powered applications, connecting external data sources, tools, and models. Its core components include Models, Indexes, Chains, Memory, Prompt, and Agent.

3.1 Models

LangChain provides generic interfaces to LLMs (e.g., OpenAI text‑davinci‑003) and chat models (e.g., ChatGPT, Claude). Prompts are built with PromptTemplate.

from langchain import PromptTemplate
# Define prompt template
prompt = PromptTemplate(
    input_variables=["question"],
    template="""简洁和专业的来回答用户的问题。
如果无法从中得到答案,请说 “根据已知信息无法回答该问题” 或 “没有提供足够的相关信息”,不允许在答案中添加编造成分,答案请使用中文。
问题是:{question}"""
)
print(prompt.format_prompt(question="如何进行数据治理"))

3.2 Indexes

Indexes integrate external data sources, using Document Loaders, Text Splitters, Vectorstores, and Retrievers to fetch relevant information.

3.3 Chains

Chains link components to simplify complex workflows. Key chains include LLMChain, SQLDatabaseChain, and SequentialChain.

3.3.1 LLMChain

Combines a PromptTemplate, an LLM, and an OutputParser to produce structured outputs.

from langchain.prompts import PromptTemplate
from langchain.chains import LLMChain
from langchain import OpenAI
import os
os.environ["OPENAI_API_KEY"] = "Your openai key"
llm = OpenAI(temperature=0)
prompt = PromptTemplate(
    input_variables=["question"],
    template="""简洁和专业的来回答用户的问题。
如果无法从中得到答案,请说 “根据已知信息无法回答该问题” 或 “没有提供足够的相关信息”,不允许在答案中添加编造成分,答案请使用中文。
问题是:{question}"""
)
chain = LLMChain(llm=llm, prompt=prompt, verbose=True)
print(chain.run("如何开展数据治理"))

3.3.2 SQLDatabaseChain

Automatically generates SQL from a natural‑language question, executes it via SQLAlchemy, and returns the answer.

3.3.3 SequentialChain

Executes a series of chains in a predefined order, allowing each step’s output to feed the next.

3.4 Memory

Memory components store conversation history, enabling stateful interactions across multiple turns.

3.5 Agent

Agents act as proxies for LLMs, invoking external tools (e.g., AutoGPT, BabyAGI) to overcome knowledge gaps and perform complex tasks.

4. Text2SQL Practice with SQLDatabaseChain

4.1 Overview

LangChain’s SQLDatabaseChain uses LLMs to translate natural‑language queries into SQL, runs them against a database (via SQLAlchemy), and formats the final answer.

4.2 Data Preparation

The example uses SQLite with the Chinook sample database, which contains tables such as Customer, Employee, Track, and Invoice.

4.3 Implementation Steps

Test Chinese query “总共有多少员工?” (English: “How many employees are there?”). The code below creates the database connection, initializes the LLM, and runs the query.

from langchain.llms import OpenAI
from langchain.utilities import SQLDatabase
from langchain_experimental.sql import SQLDatabaseChain
import os
os.environ["OPENAI_API_KEY"] = "Your openai key"
db = SQLDatabase.from_uri("sqlite:///.../Chinook.db")
llm = OpenAI(temperature=0, verbose=True)
db_chain = SQLDatabaseChain.from_llm(llm, db, verbose=True)
db_chain.run("总共有多少员工?")

The LLM translates the question to SELECT COUNT(*) FROM Employee, executes it, and returns the count in natural language.

Note: For data‑sensitive projects, set return_direct=True when initializing SQLDatabaseChain to return only the raw SQL output.

English queries produce the same result.

5. Future Plans

As LLMs evolve, LangChain remains a leading framework for integrating models with external data sources. While the basic Text2SQL workflow works, more complex queries may suffer from hallucinations or security concerns. Ongoing work will focus on improving stability, reliability, and safety.

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Sign in to view source
Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactadmin@besthub.devand we will review it promptly.

PythonSQLLLMLangChainSQLiteText2SQL
Data Thinking Notes
Written by

Data Thinking Notes

Sharing insights on data architecture, governance, and middle platforms, exploring AI in data, and linking data with business scenarios.

0 followers
Reader feedback

How this landed with the community

Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.