
Unlocking LangChain: A Deep Dive into LLM‑Powered Application Development

This article explains what LangChain is, outlines its core components such as Models, Indexes, Chains, Memory and Agents, provides practical code examples for building summarization and QA pipelines, and discusses future directions for LLM‑centric development.

JD Cloud Developers

Aug 15, 2023


What is LangChain?

LangChain is a framework for building applications powered by large language models (LLMs). It can be thought of as the "Spring" of the LLM world or an open‑source ChatGPT plugin system. Its two core capabilities are connecting LLMs to external data sources and enabling tool‑based interaction through agents.

Core Components

Models

LangChain does not provide its own LLMs. Instead, it offers a unified interface for accessing many different LLMs, which makes it easy to swap the underlying model or define a custom one. It supports two main model types:

LLM: takes a text string as input and returns a text string (e.g., OpenAI's text‑davinci‑003).

Chat Model: takes a list of chat messages as input and returns a chat message (e.g., ChatGPT, Claude).
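The practical benefit of a unified interface is that application code depends only on the interface, so the model behind it can be swapped. The following is a minimal plain-Python sketch of that idea, not LangChain's class hierarchy. `BaseLLM`, `BaseChatModel`, the `ChatAsLLM` adapter and the two stand-in models are all hypothetical names, and the stand-ins return canned transformations instead of calling a real model.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass
class ChatMessage:
    role: str      # e.g. "system", "user" or "assistant"
    content: str


class BaseLLM(ABC):
    """Text in, text out."""

    @abstractmethod
    def predict(self, text: str) -> str: ...


class BaseChatModel(ABC):
    """List of messages in, one message out."""

    @abstractmethod
    def predict_messages(self, messages: list[ChatMessage]) -> ChatMessage: ...


class EchoLLM(BaseLLM):
    # Hypothetical stand-in for a completion-style model
    def predict(self, text: str) -> str:
        return f"echo: {text}"


class UpperChatModel(BaseChatModel):
    # Hypothetical stand-in for a chat model: "replies" by upper-casing the last message
    def predict_messages(self, messages: list[ChatMessage]) -> ChatMessage:
        return ChatMessage(role="assistant", content=messages[-1].content.upper())


class ChatAsLLM(BaseLLM):
    """Adapter: lets a chat model be used wherever a text-in/text-out model is expected."""

    def __init__(self, chat: BaseChatModel) -> None:
        self.chat = chat

    def predict(self, text: str) -> str:
        return self.chat.predict_messages([ChatMessage("user", text)]).content


def summarize(llm: BaseLLM, text: str) -> str:
    # Application code depends only on the BaseLLM interface
    return llm.predict(f"Summarize: {text}")


print(summarize(EchoLLM(), "hello"))
print(summarize(ChatAsLLM(UpperChatModel()), "hello"))
```

Only the object passed to `summarize` changes between the two calls; the calling code stays the same.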

Interaction with models is usually done via prompts, and LangChain provides PromptTemplate to build and reuse prompts.

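from langchain import PromptTemplate

# Prompt (in Chinese): "As a senior editor, write a summary of the text between >>> and <<<."
prompt_template = '''作为一个资深编辑，请针对 >>> 和 <<< 中间的文本写一段摘要。
>>> {text} <<<'''
prompt = PromptTemplate(template=prompt_template, input_variables=["text"])
# format() fills in {text} and returns the final prompt string ("我爱北京天安门" = "I love Beijing's Tiananmen")
print(prompt.format(text="我爱北京天安门"))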
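A prompt template is essentially string formatting plus a check that the declared variables are present. The sketch below is a simplified, hypothetical re-implementation (`SimplePromptTemplate` is not a LangChain class). It also supports partial variables, which pre-fill some slots in the way the `partial_variables` argument does in the LLMChain example later in this article.

```python
import string
from dataclasses import dataclass, field


@dataclass
class SimplePromptTemplate:
    template: str
    input_variables: list[str]
    partial_variables: dict[str, str] = field(default_factory=dict)

    def __post_init__(self) -> None:
        # Every {placeholder} in the template must be declared as an input or a partial variable
        placeholders = {name for _, name, _, _ in string.Formatter().parse(self.template) if name}
        declared = set(self.input_variables) | set(self.partial_variables)
        if placeholders != declared:
            raise ValueError(f"template uses {placeholders}, but declared {declared}")

    def format(self, **kwargs: str) -> str:
        missing = set(self.input_variables) - set(kwargs)
        if missing:
            raise KeyError(f"missing input variables: {sorted(missing)}")
        # Partial variables are filled first; caller-supplied values take precedence
        return self.template.format(**{**self.partial_variables, **kwargs})


prompt = SimplePromptTemplate(
    template="As a senior editor, summarize the text between >>> and <<<.\n>>> {text} <<<\n{format_instructions}",
    input_variables=["text"],
    partial_variables={"format_instructions": "Reply in one sentence."},
)
print(prompt.format(text="I love Beijing"))
```

Validating placeholders when the template is built, rather than when it is used, surfaces typos in variable names before any model is called.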

Indexes

Indexes connect external data sources so that relevant content can be retrieved from them and used to answer questions. The typical workflow is:

Loading documents with Document Loaders.

Splitting text semantically with Text Splitters.

Storing vectors in a Vectorstore.

Retrieving relevant documents with a Retriever.

Document Loaders

Loaders convert external files into LangChain's standard Document type, which contains page_content and metadata (e.g., file path).
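A minimal sketch of what a loader produces, assuming a plain UTF-8 text file. The `Document` dataclass below is a stand-in that mirrors the two fields described above, and `load_text_file` is a hypothetical helper, not a LangChain loader.

```python
import tempfile
from dataclasses import dataclass, field
from pathlib import Path


@dataclass
class Document:
    page_content: str
    metadata: dict = field(default_factory=dict)


def load_text_file(path: str) -> list[Document]:
    """Read one text file into a single Document, recording where it came from."""
    text = Path(path).read_text(encoding="utf-8")
    return [Document(page_content=text, metadata={"source": path})]


# Write a throwaway file so the example is self-contained
with tempfile.TemporaryDirectory() as tmp:
    file_path = str(Path(tmp) / "notes.txt")
    Path(file_path).write_text("LangChain loaders return Documents.", encoding="utf-8")
    docs = load_text_file(file_path)
    print(docs[0].page_content, docs[0].metadata["source"])
```

Keeping the source in `metadata` is what later lets a QA chain report which document an answer came from.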

Text Splitters

Because LLMs have limited context windows (e.g., 4k or 16k tokens), large texts must be split before they are processed. The most commonly used splitter is RecursiveCharacterTextSplitter, which tries a list of separators in order, splitting on the coarsest one first and falling back to finer ones until every chunk fits within the size limit.

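from langchain.text_splitter import RecursiveCharacterTextSplitter

# Measure chunk size in gpt-3.5-turbo tokens (requires the tiktoken package)
text_splitter = RecursiveCharacterTextSplitter.from_tiktoken_encoder(
    model_name="gpt-3.5-turbo",
    allowed_special="all",
    # Try paragraph breaks first, then line breaks, then Chinese full stops and commas
    separators=["\n\n", "\n", "。", "，"],
    chunk_size=7000,
    chunk_overlap=0,
)

# "文本在这里" is placeholder text ("the text goes here")
docs = text_splitter.create_documents(["文本在这里"])
print(docs)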
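To make the separator-list idea concrete, here is a simplified recursive splitter in plain Python. It is an illustrative assumption rather than LangChain's implementation: it measures length in characters instead of tiktoken tokens, ignores `chunk_overlap`, and drops the separator at each chunk boundary. `recursive_split` is a hypothetical name.

```python
def recursive_split(text: str, separators: list[str], chunk_size: int) -> list[str]:
    """Split text into chunks of at most chunk_size characters,
    trying coarse separators first and falling back to finer ones."""
    if len(text) <= chunk_size:
        return [text] if text else []
    if not separators:
        # No separator left: hard-cut into fixed-size pieces
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
    sep, finer = separators[0], separators[1:]
    chunks: list[str] = []
    current = ""
    for piece in text.split(sep):
        if len(piece) > chunk_size:
            # This piece alone is too big: flush what we have, then recurse with finer separators
            if current:
                chunks.append(current)
                current = ""
            chunks.extend(recursive_split(piece, finer, chunk_size))
        elif not current:
            current = piece
        elif len(current) + len(sep) + len(piece) <= chunk_size:
            # Merge small neighbouring pieces back together, keeping the separator between them
            current = current + sep + piece
        else:
            chunks.append(current)
            current = piece
    if current:
        chunks.append(current)
    return chunks


sample = "Para one is short.\n\nPara two, which is longer, has clauses. It ends here."
chunks = recursive_split(sample, ["\n\n", "\n", ". ", ", "], chunk_size=30)
print(chunks)
```

Small neighbouring pieces are merged back together up to `chunk_size`, so the splitter does not emit one chunk per sentence or clause.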

Vectorstore

Text embeddings convert documents into vectors so that they can be searched by meaning (semantic search). Supported vectorstores include FAISS and Chroma. Embedding models such as OpenAIEmbeddings or HuggingFaceEmbeddings can be used; the latter can load a local model, which avoids paying for embedding API calls.

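from langchain.embeddings import HuggingFaceEmbeddings

# Load a local HuggingFace embedding model; cache_folder is the local model directory (placeholder path)
embeddings_model = HuggingFaceEmbeddings(model_name="text2vec-base-chinese", cache_folder="path/to/local/models")
# embed_documents returns one vector (a list of floats) per input text
embeddings = embeddings_model.embed_documents(["我爱北京天安门!", "Hello world!"])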
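Semantic search then compares vectors. One common measure is cosine similarity (vector stores may use others, such as Euclidean distance). The sketch below uses a deliberately crude, hypothetical embedding (counts of a few vocabulary words) purely so that it runs without a model. A real embedding model such as the one above produces dense vectors with hundreds of dimensions, but the comparison step is the same.

```python
import math

# Hypothetical stand-in for an embedding model: counts of a few vocabulary words
VOCAB = ["register", "account", "shipping", "fee", "refund", "password"]


def toy_embed(text: str) -> list[float]:
    words = text.lower().split()
    return [float(words.count(term)) for term in VOCAB]


def cosine_similarity(a: list[float], b: list[float]) -> float:
    # cos(a, b) = a.b / (|a| |b|): 1 means same direction, 0 means orthogonal (here: no shared words)
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


query = toy_embed("register an account")
for doc in ["how to register a new account", "shipping fees for large items"]:
    print(f"{cosine_similarity(query, toy_embed(doc)):.3f}  {doc}")
```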

Retriever

The Retriever interface fetches relevant documents from a vectorstore based on an unstructured query.

from langchain.document_loaders import WebBaseLoader
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.vectorstores import FAISS

# Load a JD help page (user registration information) as Documents
loader = WebBaseLoader("https://in.m.jd.com/help/app/register_info.html")
data = loader.load()
text_splitter = RecursiveCharacterTextSplitter.from_tiktoken_encoder(
    model_name="gpt-3.5-turbo",
    allowed_special="all",
    separators=["\n\n", "\n", "。", "，"],
    chunk_size=800,
    chunk_overlap=0,
)

docs = text_splitter.split_documents(data)
embeddings = HuggingFaceEmbeddings(model_name="text2vec-base-chinese", cache_folder="models")
# Embed every chunk and index the vectors in FAISS (requires the faiss package)
vectorstore = FAISS.from_documents(docs, embeddings)
# Query: "用户注册资格" = "user registration eligibility"
result = vectorstore.as_retriever().get_relevant_documents("用户注册资格")
print(result)
print(len(result))
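A retriever only has to turn a query into a ranked list of documents. The sketch below shows that contract under simplifying assumptions: brute-force scoring instead of a FAISS index, a hypothetical bag-of-words `embed` function instead of a real embedding model, and plain strings instead of Document objects. `InMemoryRetriever` is an illustrative name; only the method name `get_relevant_documents` mirrors the call used above.

```python
import math
from dataclasses import dataclass, field


def embed(text: str) -> dict[str, float]:
    """Hypothetical stand-in embedding: a sparse bag-of-words vector."""
    vec: dict[str, float] = {}
    for word in text.lower().split():
        vec[word] = vec.get(word, 0.0) + 1.0
    return vec


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    dot = sum(weight * b.get(word, 0.0) for word, weight in a.items())
    norm = math.sqrt(sum(w * w for w in a.values())) * math.sqrt(sum(w * w for w in b.values()))
    return dot / norm if norm else 0.0


@dataclass
class InMemoryRetriever:
    k: int = 2
    entries: list[tuple[dict[str, float], str]] = field(default_factory=list)

    def add_texts(self, texts: list[str]) -> None:
        # Index time: embed each chunk once and store it alongside its text
        self.entries.extend((embed(t), t) for t in texts)

    def get_relevant_documents(self, query: str) -> list[str]:
        # Query time: embed the query, score every stored chunk, keep the best k
        q = embed(query)
        ranked = sorted(self.entries, key=lambda entry: cosine(q, entry[0]), reverse=True)
        return [text for _, text in ranked[: self.k]]


retriever = InMemoryRetriever(k=2)
retriever.add_texts([
    "users must be 18 or older to register",
    "register with a mobile phone number",
    "returns are accepted within seven days",
])
print(retriever.get_relevant_documents("who can register"))
```

The split between indexing (embed once) and querying (embed only the query) is what makes vector search cheap at question time.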

Chains

Chains link components together to simplify complex workflows. Main chain types include:

LLMChain – combines a PromptTemplate, an LLM, and an OutputParser.

SequentialChain – executes a series of chains in a predefined order.

RouterChain – dynamically selects the next chain based on input.

LLMChain Example

This example uses a prompt to extract the keywords and sentiment from a product review, and a StructuredOutputParser to turn the model's reply into a Python dict.

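from langchain.chains import LLMChain
from langchain.prompts import PromptTemplate
from langchain.output_parsers import ResponseSchema, StructuredOutputParser

# Output fields: "keyword" = list of keywords in the review;
# "emotion" = sentiment of the review: 1 positive, 0 neutral, -1 negative
keyword_schema = ResponseSchema(name="keyword", description="评论的关键词列表")
emotion_schema = ResponseSchema(name="emotion", description="评论的情绪，正向为1，中性为0，负向为-1")
output_parser = StructuredOutputParser.from_response_schemas([keyword_schema, emotion_schema])
format_instructions = output_parser.get_format_instructions()

# Prompt: "As a senior customer-service agent, identify the keywords in the text between >>> and <<<,
# and whether its sentiment is positive, negative or neutral."
prompt_template_txt = '''
作为资深客服，请针对 >>> 和 <<< 中间的文本识别其中的关键词，以及包含的情绪是正向、负向还是中性。
>>> {text} <<<
RESPONSE:
{format_instructions}
'''
prompt = PromptTemplate(template=prompt_template_txt, input_variables=["text"], partial_variables={"format_instructions": format_instructions})
# Assumes `llm` is an LLM or chat model created beforehand, e.g. ChatOpenAI(temperature=0)
llm_chain = LLMChain(prompt=prompt, llm=llm)
# Review (slangy Chinese): "JD Logistics is great, fast and friendly! This router looks fantastic, it's just so cool!..."
comment = "京东物流没的说，速度态度都是杠杠滴！这款路由器颜值贼高，怎么说呢，就是泰裤辣！..."
result = llm_chain.run(comment)
data = output_parser.parse(result)
print(f"type={type(data)}, keyword={data['keyword']}, emotion={data['emotion']}")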
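The parser's job is to pull a JSON object out of free-form model text and check that the expected keys are present. A minimal sketch, assuming the reply wraps its JSON in a fenced json block as the format instructions ask, and falling back to the outermost braces otherwise. `parse_structured` is a hypothetical helper, not LangChain's parser, and the model reply is made up.

```python
import json
import re

FENCE = "`" * 3  # built programmatically so this example can itself sit inside a Markdown code block


def parse_structured(text: str, required_keys: list[str]) -> dict:
    """Extract a JSON object from model output and check the expected keys."""
    # Prefer the contents of a fenced (optionally json-tagged) block
    fenced = re.search(FENCE + r"(?:json)?\s*(.*?)" + FENCE, text, re.DOTALL)
    if fenced:
        payload = fenced.group(1)
    else:
        # Otherwise take everything from the first "{" to the last "}"
        start, end = text.find("{"), text.rfind("}")
        if start == -1 or end < start:
            raise ValueError("no JSON object found in model output")
        payload = text[start:end + 1]
    data = json.loads(payload)
    missing = [key for key in required_keys if key not in data]
    if missing:
        raise ValueError(f"missing keys: {missing}")
    return data


# A made-up model reply of the shape the format instructions ask for
reply = f'Sure!\n{FENCE}json\n{{"keyword": ["logistics", "router"], "emotion": 1}}\n{FENCE}'
data = parse_structured(reply, ["keyword", "emotion"])
print(data["keyword"], data["emotion"])
```

Failing loudly on missing keys is deliberate: a reply that silently lacks `emotion` is easier to retry than to debug downstream.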

SequentialChain Example

This example first translates English text into Chinese and then summarizes the translation in one sentence.

from langchain.chains import LLMChain, SimpleSequentialChain
from langchain.prompts import PromptTemplate

# Assumes `llm` is defined as in the earlier examples
# Step 1 prompt: "Translate the following content into Chinese:"
first_prompt = PromptTemplate.from_template("翻译下面的内容到中文:\n\n{content}")
chain_trans = LLMChain(llm=llm, prompt=first_prompt, output_key="content_zh")
# Step 2 prompt: "Summarize the following content in one sentence:"
second_prompt = PromptTemplate.from_template("一句话总结下面的内容:\n\n{content_zh}")
chain_summary = LLMChain(llm=llm, prompt=second_prompt)
# SimpleSequentialChain feeds each chain's single output into the next chain's single input
overall_simple_chain = SimpleSequentialChain(chains=[chain_trans, chain_summary], verbose=True)
content = "... (original English paragraph) ..."
result = overall_simple_chain.run(content)
print(f"result={result}")
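Conceptually, a simple sequential chain is function composition: each step's single string output becomes the next step's single input. The sketch below shows that in plain Python. `simple_sequential_chain` is a hypothetical name, and the two steps are stand-ins for the translation and summary LLMChains (they do not call a model), chosen so that the data flow is easy to check.

```python
from typing import Callable

Step = Callable[[str], str]


def simple_sequential_chain(steps: list[Step], verbose: bool = False) -> Step:
    """Compose single-input/single-output steps into one callable."""
    def run(text: str) -> str:
        for i, step in enumerate(steps):
            text = step(text)
            if verbose:
                print(f"step {i}: {text}")
        return text
    return run


# Hypothetical stand-ins for the LLM-backed steps
def fake_translate(text: str) -> str:
    return f"[zh] {text}"


def fake_summarize(text: str) -> str:
    # "One-sentence summary": keep only the first sentence
    return text.split(". ")[0].rstrip(".") + "."


chain = simple_sequential_chain([fake_translate, fake_summarize], verbose=True)
print(chain("LangChain links components. It also manages memory."))
```

The more general SequentialChain instead passes a dictionary of named variables (such as `content_zh`) between steps, which is what lets a step consume more than one earlier output.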

RouterChain Example

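A RouterChain chooses among sub‑chains using either a zero‑shot LLM router, which asks the LLM to pick a destination from a list of descriptions, or an embedding‑based router, which picks the destination whose description is closest to the input. The snippet below prints the prompt behind LangChain's zero‑shot ReAct agent, which relies on the same idea: the LLM chooses purely from the names and descriptions listed in its prompt.

from langchain.agents import AgentType, initialize_agent, load_tools

# Assumes `llm` is defined as in the earlier examples; "llm-math" is a built-in calculator tool
tools = load_tools(["llm-math"], llm=llm)
agent = initialize_agent(agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION, tools=tools, llm=llm, verbose=True)
# Print the zero-shot prompt template, including the tool list the LLM chooses from
print(agent.agent.llm_chain.prompt.template)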
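Once destinations are described in text, routing reduces to scoring those descriptions against the input. The sketch below mimics the embedding‑based variant under a loud assumption: word overlap (Jaccard similarity) stands in for embedding similarity. `make_router`, the `math`/`physics` destinations and their handlers are all made up for illustration. An LLM‑based router would instead ask the model to name the best destination.

```python
from typing import Callable

Handler = Callable[[str], str]


def words(text: str) -> set[str]:
    # Lower-case bag of words, ignoring question marks
    return set(text.lower().replace("?", " ").split())


def make_router(destinations: dict[str, tuple[str, Handler]], default: Handler) -> Handler:
    """destinations maps a name to (description, handler)."""
    def route(query: str) -> str:
        q = words(query)
        best_name, best_score = None, 0.0
        for name, (description, _) in destinations.items():
            d = words(description)
            union = q | d
            # Jaccard word overlap: a crude stand-in for embedding similarity
            score = len(q & d) / len(union) if union else 0.0
            if score > best_score:
                best_name, best_score = name, score
        # Fall back to the default chain when no description matches at all
        handler = destinations[best_name][1] if best_name else default
        return handler(query)
    return route


router = make_router(
    {
        "math": ("questions about arithmetic numbers and math", lambda q: f"math chain: {q}"),
        "physics": ("questions about physics forces and energy", lambda q: f"physics chain: {q}"),
    },
    default=lambda q: f"default chain: {q}",
)
print(router("what is the energy of a photon?"))
print(router("tell me a joke"))
```

A default destination matters in practice: without it, off-topic inputs would be forced into whichever sub-chain scored marginally highest.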

Memory

Memory stores conversation history so that chains can maintain state across turns. Common memory types include:

ConversationSummaryMemory – stores a summary of the dialogue.

ConversationBufferWindowMemory – keeps the most recent N messages.

ConversationBufferMemory – retains the entire conversation.

from langchain.chains import ConversationChain
from langchain.memory import ConversationBufferMemory

# Assumes `llm` is defined as in the earlier examples
memory = ConversationBufferMemory()
conversation = ConversationChain(llm=llm, memory=memory, verbose=True)
print(conversation.prompt)
print(conversation.predict(input="我的姓名是tiger"))  # "My name is tiger"
print(conversation.predict(input="1+1=?"))
print(conversation.predict(input="我的姓名是什么"))  # "What is my name?", answerable only from memory
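The buffer-style memories differ mainly in how much history they keep before it is inserted into the next prompt. Below is a minimal sketch of the windowed variant in plain Python, assuming a hypothetical `WindowMemory` class and a simple `Human:`/`AI:` transcript format; the model replies are made up. With a large enough `k` it behaves like a full buffer.

```python
from collections import deque


class WindowMemory:
    """Keep only the last k exchanges (one exchange = user input + model reply)."""

    def __init__(self, k: int) -> None:
        # A bounded deque drops the oldest exchange automatically
        self.turns: deque = deque(maxlen=k)

    def save_context(self, user_input: str, ai_output: str) -> None:
        self.turns.append((user_input, ai_output))

    def history(self) -> str:
        # Render the kept exchanges as a transcript to prepend to the next prompt
        lines = []
        for user_input, ai_output in self.turns:
            lines.append(f"Human: {user_input}")
            lines.append(f"AI: {ai_output}")
        return "\n".join(lines)


memory = WindowMemory(k=2)
memory.save_context("My name is tiger", "Nice to meet you, tiger!")
memory.save_context("1+1=?", "2")
memory.save_context("What is my name?", "I don't know.")
print(memory.history())
```

With `k=2`, the exchange that introduced the name has already been dropped, so a model reading this history could not recall it; the full ConversationBufferMemory in the example above keeps every turn at the cost of an ever-growing prompt.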

Agents

Agents act as autonomous executors that can call tools to overcome LLM limitations such as outdated knowledge or weak reasoning. Popular open‑source agents include AutoGPT, BabyAGI, and AgentGPT.

Core Components

Agent – decides the next action and invokes the LLM.

Tools – functions the agent can call; each tool has a description that guides the LLM.

Toolkits – collections of related tools (e.g., Office365, Gmail).

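Agent Executor – runs the agent loop: it calls the agent, executes the tool the agent selects, and feeds the result back until the agent returns a final answer.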
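The interplay between these pieces is a loop: the agent's LLM proposes an action, the executor runs the named tool, and the observation is appended to the prompt until the LLM produces a final answer. The sketch below imitates the ReAct-style `Action:` / `Action Input:` / `Final Answer:` text format, with a scripted fake LLM standing in for a real model. The parsing rules, the hypothetical `calculator` tool and the step limit are simplifying assumptions, not LangChain's implementation.

```python
import re
from typing import Callable


def calculator(expression: str) -> str:
    # Hypothetical tool: evaluates "a * b" or "a + b" for integers only
    a, op, b = expression.split()
    return str(int(a) * int(b) if op == "*" else int(a) + int(b))


TOOLS: dict[str, Callable[[str], str]] = {"calculator": calculator}


def run_agent(llm: Callable[[str], str], question: str, max_steps: int = 5) -> str:
    scratchpad = f"Question: {question}\n"
    for _ in range(max_steps):
        output = llm(scratchpad)
        final = re.search(r"Final Answer:\s*(.*)", output)
        if final:
            return final.group(1).strip()
        action = re.search(r"Action:\s*(\w+)\s*Action Input:\s*(.*)", output)
        if not action:
            raise ValueError(f"could not parse LLM output: {output!r}")
        tool_name, tool_input = action.group(1), action.group(2).strip()
        observation = TOOLS[tool_name](tool_input)
        # Feed the tool result back so the next LLM call can see it
        scratchpad += f"{output}\nObservation: {observation}\n"
    raise RuntimeError("agent stopped: too many steps")


def scripted_llm(prompt: str) -> str:
    # Fake model: asks for the calculator first, then answers once it sees an observation
    if "Observation:" not in prompt:
        return "Thought: I should multiply.\nAction: calculator\nAction Input: 45 * 54"
    observed = prompt.rsplit("Observation:", 1)[1].strip()
    return f"Thought: I now know the answer.\nFinal Answer: {observed}"


print(run_agent(scripted_llm, "What is 45 * 54?"))
```

The step limit guards against a model that never emits a final answer, which is a real failure mode for tool-using agents.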

Agent Types

Agents are instantiated via initialize_agent with an AgentType such as ZERO_SHOT_REACT_DESCRIPTION, CHAT_CONVERSATIONAL_REACT_DESCRIPTION, etc.

Custom Tool Example

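from datetime import date
from langchain.agents import AgentType, initialize_agent, load_tools, tool

# The docstring becomes the tool description the LLM reads ("返回今天的日期。" = "Returns today's date.")
@tool
def time(text: str) -> str:
    """返回今天的日期。"""
    return str(date.today())

# Assumes `llm` is defined as in the earlier examples
tools = load_tools(["llm-math"], llm=llm)
tools.append(time)
agent_math = initialize_agent(agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION, tools=tools, llm=llm, verbose=True)
print(agent_math("计算45 * 54"))  # "Calculate 45 * 54"
print(agent_math("今天是哪天？"))  # "What is today's date?"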
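The `@tool` decorator matters because the function's name and docstring become the tool name and description that the LLM reads when deciding which tool to call. A rough, hypothetical sketch of that idea follows; `SimpleTool`, `simple_tool` and `render_tool_list` are illustrative names, not LangChain APIs.

```python
from dataclasses import dataclass
from datetime import date
from typing import Callable


@dataclass
class SimpleTool:
    name: str
    description: str
    func: Callable[[str], str]


def simple_tool(func: Callable[[str], str]) -> SimpleTool:
    # Name comes from the function, description from its docstring (required, since the LLM relies on it)
    if not func.__doc__:
        raise ValueError(f"tool {func.__name__!r} needs a docstring")
    return SimpleTool(name=func.__name__, description=func.__doc__.strip(), func=func)


@simple_tool
def today(text: str) -> str:
    """Returns today's date. Input is ignored."""
    return str(date.today())


def render_tool_list(tools: list[SimpleTool]) -> str:
    # The kind of "name: description" listing an agent prompt can include
    return "\n".join(f"{t.name}: {t.description}" for t in tools)


print(render_tool_list([today]))
print(today.func(""))
```

Because the description is the LLM's only guide, a vague docstring leads directly to wrong tool choices.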

Practical Implementations

Document Summarization

Load remote documents, split them, and run a refine summarization chain.

from langchain.prompts import PromptTemplate
from langchain.document_loaders import PlaywrightURLLoader
from langchain.chains.summarize import load_summarize_chain
from langchain.text_splitter import RecursiveCharacterTextSplitter

# PlaywrightURLLoader renders the page in a headless browser (requires the playwright package)
loader = PlaywrightURLLoader(urls=["https://content.jr.jd.com/article/index.html?pageId=708258989"])
data = loader.load()
text_splitter = RecursiveCharacterTextSplitter.from_tiktoken_encoder(
    model_name="gpt-3.5-turbo",
    allowed_special="all",
    separators=["\n\n", "\n", "。", "，"],
    chunk_size=7000,
    chunk_overlap=0,
)
# First-chunk prompt: "As a senior editor, write a summary of the text between >>> and <<<."
prompt_template = """作为一个资深编辑，请针对 >>> 和 <<< 中间的文本写一段摘要。
>>> {text} <<<"""
# Refine prompt: "As a senior editor, based on the existing summary {existing_answer},
# improve it using the text between >>> and <<<."
refine_template = """作为一个资深编辑，基于已有的一段摘要：{existing_answer}，针对 >>> 和 <<< 中间的文本完善现有的摘要。
>>> {text} <<<"""
PROMPT = PromptTemplate(template=prompt_template, input_variables=["text"])
REFINE_PROMPT = PromptTemplate(template=refine_template, input_variables=["existing_answer", "text"])
# Assumes `llm` is defined as in the earlier examples
chain = load_summarize_chain(llm, chain_type="refine", question_prompt=PROMPT, refine_prompt=REFINE_PROMPT, verbose=False)

docs = text_splitter.split_documents(data)
result = chain.run(docs)
print(result)
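The `refine` strategy processes chunks in order: the first chunk is summarized with the question prompt, and each later chunk is combined with the running summary through the refine prompt. A minimal sketch of that control flow, assuming a hypothetical `fake_llm` in place of a real model and simplified English paraphrases of the two templates above:

```python
from typing import Callable

QUESTION_PROMPT = "Write a summary of the text between >>> and <<<.\n>>> {text} <<<"
REFINE_PROMPT = (
    "Given the existing summary: {existing_answer}\n"
    "improve it using the text between >>> and <<<.\n>>> {text} <<<"
)


def refine_summarize(llm: Callable[[str], str], chunks: list[str]) -> str:
    if not chunks:
        return ""
    # First chunk: an ordinary summary prompt
    summary = llm(QUESTION_PROMPT.format(text=chunks[0]))
    # Every later chunk: one more LLM call that carries the running summary forward
    for chunk in chunks[1:]:
        summary = llm(REFINE_PROMPT.format(existing_answer=summary, text=chunk))
    return summary


calls: list[str] = []


def fake_llm(prompt: str) -> str:
    # Hypothetical model: logs the prompt, then "summarizes" by appending the new text to the old summary
    calls.append(prompt)
    new_text = prompt.rsplit(">>> ", 1)[1].split(" <<<")[0]
    if "existing summary: " in prompt:
        previous = prompt.split("existing summary: ", 1)[1].split("\n", 1)[0]
        return f"{previous} + {new_text}"
    return new_text


print(refine_summarize(fake_llm, ["part A", "part B", "part C"]))
```

The trade-off is one sequential LLM call per chunk: each step sees everything summarized so far, but cost and latency grow linearly with the number of chunks.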

Retrieval‑Based QA

Load a web page, split it, store embeddings in FAISS, and answer questions with a custom prompt.

from langchain.chains import RetrievalQA
from langchain.document_loaders import WebBaseLoader
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.prompts import PromptTemplate
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.vectorstores import FAISS

loader = WebBaseLoader("https://in.m.jd.com/help/app/register_info.html")
data = loader.load()
text_splitter = RecursiveCharacterTextSplitter.from_tiktoken_encoder(
    model_name="gpt-3.5-turbo",
    allowed_special="all",
    separators=["\n\n", "\n", "。", "，"],
    chunk_size=800,
    chunk_overlap=0,
)

docs = text_splitter.split_documents(data)
embeddings = HuggingFaceEmbeddings(model_name="text2vec-base-chinese", cache_folder="model")
vectorstore = FAISS.from_documents(docs, embeddings)

# Prompt: "Use the context below to answer the final question. If you don't know the answer, say so;
# don't make one up. Use at most three sentences and keep the answer concise.
# Always end with 'Thanks for asking!'" (问题 = Question, 有用的回答 = Helpful answer)
template = """请使用下面提供的背景信息来回答最后的问题。 如果你不知道答案，请直接说不知道，不要试图凭空编造答案。
回答时最多使用三个句子，保持回答尽可能简洁。 回答结束时，请一定要说\"谢谢你的提问！\"
{context}
问题: {question}
有用的回答:"""
QA_CHAIN_PROMPT = PromptTemplate(input_variables=["context", "question"], template=template)
# Assumes `llm` is defined as in the earlier examples
qa_chain = RetrievalQA.from_chain_type(llm, retriever=vectorstore.as_retriever(), return_source_documents=True, chain_type_kwargs={"prompt": QA_CHAIN_PROMPT})
# Query: "用户注册资格" = "user registration eligibility"
result = qa_chain({"query": "用户注册资格"})
print(result["result"])
print(len(result["source_documents"]))
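By default, `RetrievalQA.from_chain_type` uses the "stuff" strategy: the retrieved chunks are concatenated into the `{context}` slot of the prompt and sent to the LLM in a single call. The sketch below shows only that prompt-assembly step in plain Python, with an English paraphrase of the template above. `build_qa_prompt` is a hypothetical helper, the `"\n\n"` separator is an assumption, and the retrieved chunks are made up.

```python
QA_TEMPLATE = (
    "Use the following context to answer the question at the end. "
    "If you don't know the answer, just say so; don't make one up.\n"
    "{context}\n"
    "Question: {question}\n"
    "Helpful answer:"
)


def build_qa_prompt(chunks: list[str], question: str, separator: str = "\n\n") -> str:
    # "Stuff" strategy: every retrieved chunk goes into one prompt, so together they must fit the context window
    return QA_TEMPLATE.format(context=separator.join(chunks), question=question)


retrieved = [
    "Users must be at least 18 years old to register.",
    "Registration requires a mobile phone number.",
]
prompt = build_qa_prompt(retrieved, "Who can register?")
print(prompt)
```

Because all retrieved chunks share one prompt, smaller chunks (800 tokens here versus 7000 in the summarization example) leave room for several of them plus the question.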

Future Directions

LLMs are evolving rapidly, and LangChain keeps pace through frequent releases and a growing contributor community (over 1,200 contributors). Two directions look especially promising: low-code visual orchestration tools such as LangFlow, and more powerful agents that could become the "SQL of LLMs," dramatically expanding the range of possible applications.


Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.

