Build a Soul‑Healing Chatbot with LangChain & Llama 2: A Step‑by‑Step Guide
This article walks through building a domain‑specific, soul‑healing chatbot with LangChain and Llama 2: it compares fine‑tuning with external knowledge bases, then covers environment setup, data loading, text splitting, embedding with a Chinese model, vector store creation, prompt engineering, inference, and optimization strategies.
Background
In the previous article we deployed Llama‑2‑7B‑chat in the cloud. This article uses the “LangChain + Llama 2” architecture to build a customized soul‑healing chatbot. Readers with the relevant background can jump straight to the “Practical Steps” section.
Fine‑tuning vs Knowledge Base
Large models still struggle with vertical domain Q&A, so injecting domain knowledge is a direct solution. Two approaches: domain fine‑tuning and external knowledge base.
Domain fine‑tuning trains the base model on a small set of task‑specific data, adjusting model parameters. It works when the task is well defined and sufficient labeled data exists. Common methods include Freeze, P‑tuning, and LoRA. Drawbacks: high cost of data, compute, and maintenance; risk of degrading performance on other tasks.
External knowledge base keeps the base model unchanged and uses prompt engineering to provide relevant documents as context. Advantages: higher answer precision, stronger adaptability by updating source documents. Limitations include context window size and prompt design.
To build a domain‑specific QA system we leverage LangChain’s knowledge‑base integration.
LangChain Modules
LangChain is a framework for developing LLM‑driven applications. Its two main capabilities are being data‑aware (connecting the model to various data sources) and agentic (letting the model interact with its environment). Core modules include Models, Prompts, Chains, Indexes, and Agents, each offering standardized, extensible interfaces.
LangChain can wrap many LLMs (OpenAI, Cohere, HuggingFace) and integrates vector stores such as Milvus, Pinecone, Chroma for semantic search. It supports unstructured file types like text, PPT, images, HTML, PDF.
Typical pipeline: Load documents → Split text → Retrieve relevant chunks → Build prompt → LLM generates answer.
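As a compact sketch of that pipeline end to end, using LangChain's RetrievalQA chain. The paths, model ids, and the HuggingFacePipeline wrapper here are illustrative placeholders, not the exact setup used in the steps below:
from langchain.document_loaders import TextLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.embeddings.huggingface import HuggingFaceEmbeddings
from langchain.vectorstores import FAISS
from langchain.chains import RetrievalQA
from langchain.llms import HuggingFacePipeline

docs = TextLoader("corpus.txt").load()  # 1. load documents
chunks = RecursiveCharacterTextSplitter(chunk_size=50, chunk_overlap=20).split_documents(docs)  # 2. split text
store = FAISS.from_documents(chunks, HuggingFaceEmbeddings(model_name="text2vec-large-chinese"))  # index the chunks
llm = HuggingFacePipeline.from_model_id(model_id="meta-llama/Llama-2-7b-chat-hf", task="text-generation")
qa = RetrievalQA.from_chain_type(llm=llm, retriever=store.as_retriever())  # 3-5. retrieve + prompt + generate
print(qa.run("your question"))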
Practical Steps
2.1 Environment Setup
a. Install LangChain: pip install langchain
b. Deploy Llama 2 (see the previous article).
c. Download embedding model text2vec‑large‑chinese from HuggingFace (alternatives: m3e, bge).
d. Download dataset: “Soul‑warming soup” from HuggingFace (631 short texts).
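On a remote machine, the embedding model and dataset can be fetched with huggingface_hub; a small sketch, in which the repo ids below are placeholders rather than verified repository names:
from huggingface_hub import snapshot_download
# Placeholder repo ids -- substitute the actual HuggingFace repositories
snapshot_download(repo_id="{org}/text2vec-large-chinese", local_dir="{your_path}/text2vec-large-chinese")
snapshot_download(repo_id="{org}/soul-soup", repo_type="dataset", local_dir="{your_path}/dataset")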
2.2 Document Parsing
a. Load the dataset with a LangChain loader, e.g.:
from langchain.document_loaders import UnstructuredFileLoader
# Parse the raw corpus into LangChain Document objects
loader = UnstructuredFileLoader("path/to/dataset")
docs = loader.load()
b. Split the text:
from langchain.text_splitter import RecursiveCharacterTextSplitter
# Small chunks suit this corpus of short passages; the overlap preserves context across chunk boundaries
text_splitter = RecursiveCharacterTextSplitter(chunk_size=50, chunk_overlap=20)
docs = text_splitter.split_documents(docs)
c. Embed the chunks with HuggingFaceEmbeddings, store them in a FAISS vector store, and save/load the index locally:
from langchain.embeddings.huggingface import HuggingFaceEmbeddings
from langchain.vectorstores import FAISS
import os

# Load the Chinese embedding model on the GPU
embeddings = HuggingFaceEmbeddings(model_name="{your_path}/text2vec-large-chinese", model_kwargs={'device': 'cuda'})
# Build the index on the first run, then reload it from disk afterwards
if not os.path.exists("{your_path}/my_faiss_store.faiss"):
    vector_store = FAISS.from_documents(docs, embeddings)
    vector_store.save_local("{your_path}/my_faiss_store.faiss")
else:
    vector_store = FAISS.load_local("{your_path}/my_faiss_store.faiss", embeddings=embeddings)
2.3 Model Loading
Load the Llama 2 tokenizer and model with transformers:
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

# Load Llama 2 in half precision, sharding across available GPUs
tokenizer = AutoTokenizer.from_pretrained('/opt/Llama-2-7b-chat-hf', trust_remote_code=True)
base_model = AutoModelForCausalLM.from_pretrained('/opt/Llama-2-7b-chat-hf', torch_dtype=torch.float16, device_map='auto', trust_remote_code=True)
llm = base_model.eval()  # switch to inference mode
2.4 Semantic Retrieval
Perform a similarity search against the FAISS store:
query = "面对求职屡屡碰壁的大学生,请说一句话来鼓励他?"  # "Say something to encourage a college student who keeps hitting walls in the job hunt."
docs = vector_store.similarity_search(query)
context = [doc.page_content for doc in docs]
print(context)
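By default, LangChain's FAISS similarity_search returns the top 4 chunks; pass k explicitly to trade recall against prompt length, e.g.:
docs = vector_store.similarity_search(query, k=3)  # retrieve only the 3 closest chunks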
Set a prompt template (example):
qa_template = """Use the following pieces of information to answer the user's question.
If you don't know the answer, just say that you don't know; don't try to make up an answer.
Context: {context}
Question: {question}
Only return the helpful answer below and nothing else.
Helpful answer: """
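As an aside, the same template can also be driven through LangChain's PromptTemplate instead of manual string formatting; a small sketch (the code below sticks with manual formatting):
from langchain.prompts import PromptTemplate
qa_prompt = PromptTemplate(template=qa_template, input_variables=["context", "question"])
# Fill the slots with the retrieved chunks and the user's question
prompt = qa_prompt.format(context="\n".join(context), question=query)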
context = "
".join(context)
prompt = f"基于以上内容:
{context}
请回答:{query}
字数限制在30字以内"2.5 Inference Example
Configure generation parameters and run:
inputs = tokenizer([f"Human:{prompt}\nAssistant:"], return_tensors="pt")
input_ids = inputs["input_ids"].to('cuda')
param_config = {
    "input_ids": input_ids,
    "max_new_tokens": 1024,
    "do_sample": True,           # sample rather than greedy-decode
    "top_k": 5,
    "top_p": 0.95,
    "temperature": 0.1,          # low temperature keeps answers focused
    "repetition_penalty": 1.3    # discourage repeated phrases
}
result = llm.generate(**param_config)
answer = tokenizer.decode(result[0], skip_special_tokens=True)
print(answer)
# Q: 面对求职屡屡碰壁的大学生,请说一句话来鼓励他? ("Say something to encourage a college student who keeps hitting walls in the job hunt.")
# A: 坚持不懈,机会终将降临 ("Persevere, and opportunity will eventually come.")
Knowledge Base Issues and Optimizations
3.1 Limitations of LLM+Embedding Search
Embedding‑based retrieval may miss relevant knowledge when multiple facts need to be combined, leading to lower precision. Simple fixes like lowering similarity thresholds or increasing top_k add noise and increase token cost.
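For instance, filtering on raw similarity scores instead of a fixed top_k is easy to write but hard to tune; a sketch (FAISS returns L2 distances by default, so smaller means more similar, and the 1.0 threshold here is an arbitrary placeholder):
# Retrieve extra candidates, then keep only those under a distance threshold
docs_and_scores = vector_store.similarity_search_with_score(query, k=10)
context = [doc.page_content for doc, score in docs_and_scores if score < 1.0]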
3.2 Optimization Directions
Improve intent recognition and recall via keyword extraction and slot filling; build multi‑level indexes; convert the knowledge base to a knowledge graph; employ multi‑path retrieval that combines semantic search with traditional Elasticsearch and weighted voting.
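A minimal sketch of the multi-path idea, merging vector-store hits with a keyword channel and ranking by weighted vote (es_search here is a hypothetical Elasticsearch wrapper, and the weights are arbitrary):
from collections import defaultdict

def multi_path_retrieve(query, vector_store, es_search, w_vec=0.6, w_kw=0.4, k=5):
    # Weighted reciprocal-rank voting across the two retrieval paths
    votes = defaultdict(float)
    for rank, doc in enumerate(vector_store.similarity_search(query, k=k)):
        votes[doc.page_content] += w_vec / (rank + 1)
    for rank, text in enumerate(es_search(query, size=k)):
        votes[text] += w_kw / (rank + 1)
    return sorted(votes, key=votes.get, reverse=True)[:k]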
Deployed examples such as the “录问” legal LLM (Baichuan‑7B) illustrate these ideas.
Additional optimization areas include refined knowledge partitions, text‑splitting strategies, prompt quality, and model selection.
In conclusion, with LangChain + an LLM we quickly built a knowledge-enhanced soul-healing chatbot and surveyed its potential improvements; the next article will explore mainstream fine-tuning techniques.