Artificial Intelligence · 15 min read

Building a Custom LLM Chatbot with LangChain, ChromaDB, and LLaMA‑2

This tutorial explains how to leverage generative AI tools—including LLMs, embedding models, vector databases, and the LangChain framework—to create a custom chatbot that answers user queries using a knowledge base, with step‑by‑step code examples for Google Colab.


Since the release of ChatGPT, generative AI has rapidly advanced, offering many open‑source models and tools; this article demonstrates how to use these resources to build a custom chatbot powered by a large language model (LLM).

Generative AI differs from predictive AI by creating new content such as text, images, or audio. The focus here is on text generation driven by LLMs such as OpenAI GPT‑4, Meta LLaMA‑2, Google PaLM, and Anthropic Claude 2.

LLMs are deep‑learning models trained on massive text corpora; they can be adapted to specific tasks via fine‑tuning or, more simply, through context injection (prompt engineering) without modifying the model weights.

Context injection typically follows these steps: collect structured or unstructured data, load and split the data into text chunks, embed the chunks into vectors, store the vectors in a vector database (e.g., ChromaDB), retrieve the most similar chunks for a user query, and combine the retrieved context with a prompt template before sending it to the LLM.
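The retrieval steps above can be sketched in miniature. This is a toy illustration only: real pipelines use an embedding model and a vector database such as ChromaDB, whereas here tiny hand-made vectors and a brute-force cosine search stand in for both, and the chunk texts and vector values are invented for the example.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

# 1) "Embedded" knowledge-base chunks (vectors are made up for illustration)
chunks = [
    ("We sell floral, woody, and citrus perfumes.", [0.9, 0.1, 0.0]),
    ("Shipping takes 3-5 business days.",           [0.1, 0.9, 0.2]),
]

# 2) "Embedded" user query
query_vector = [0.8, 0.2, 0.1]

# 3) Retrieve the most similar chunk
best_chunk, _ = max(chunks, key=lambda c: cosine_similarity(c[1], query_vector))

# 4) Inject the retrieved context into a prompt template
prompt = f"Context: {best_chunk}\nQuestion: What perfumes do you sell?\nAnswer:"
```

The same four moves — embed, store, retrieve by similarity, inject into the prompt — are what the LangChain components later in this article perform at scale.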

Embedding models convert tokens into high‑dimensional vectors; the article uses OpenAI’s text‑embedding‑ada‑002 (1536‑dimensional) and stores the vectors in ChromaDB.

LangChain is a framework that orchestrates LLMs, document loaders, splitters, embeddings, vector stores, and prompt templates. The tutorial uses LangChain’s CSVLoader, CharacterTextSplitter, OpenAIEmbeddings, Chroma, PromptTemplate, and RetrievalQA components.

In Google Colab, the required packages are installed, the necessary libraries are imported, and the Hugging Face LLaMA‑2‑7B‑chat‑hf model is loaded via a Transformers pipeline. The pipeline is wrapped with HuggingFacePipeline to create an LLM object.

A prompt template is defined to make the LLM act as a customer‑service chatbot for an online perfume company, and the template is passed to a RetrievalQA chain that connects the LLM, the Chroma vector store, and the prompt.

Finally, a sample query (“What types of perfumes do you sell?”) is run through the chain, and the response demonstrates how the system can provide concise, human‑like answers based on the custom knowledge base.

The tutorial concludes that even users with basic programming experience can build functional LLM applications by following these steps.

!pip install -q transformers einops accelerate langchain bitsandbytes
!pip install -qqq openai
!pip install -Uqqq chromadb

import os
import textwrap

import langchain
import chromadb
import transformers
import openai
import torch
from transformers import AutoTokenizer
from langchain import HuggingFacePipeline
from langchain.text_splitter import CharacterTextSplitter
from langchain.document_loaders.csv_loader import CSVLoader
from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores import Chroma
from langchain.chains import RetrievalQA
from langchain.prompts import PromptTemplate

!huggingface-cli login

os.environ["OPENAI_API_KEY"] = "INSERT_YOUR_API_KEY"

# Set up HuggingFace Pipeline with Llama-2-7b-chat-hf model
model = "meta-llama/Llama-2-7b-chat-hf"
tokenizer = AutoTokenizer.from_pretrained(model)
pipeline = transformers.pipeline(
    "text-generation",  # task
    model=model,
    tokenizer=tokenizer,
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
    device_map="auto",
    max_length=1000,
    do_sample=True,
    top_k=10,
    num_return_sequences=1,
    eos_token_id=tokenizer.eos_token_id,
)

# LLM initialized in HuggingFace Pipeline wrapper
llm = HuggingFacePipeline(pipeline=pipeline, model_kwargs={'temperature': 0})

# Load documents locally as CSV
loader = CSVLoader('YOUR_CSV_FILE_PATH')
docs = loader.load()
docs[0]
# Output:
# Document(page_content='...Question: ...', metadata={'source': '/content/sample_data/Fragrances-Dataset.csv', 'row': 0})

# Split documents into text chunks
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
docs = text_splitter.split_documents(docs)
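To make the `chunk_size` and `chunk_overlap` parameters concrete, here is a minimal fixed-size chunker. It only illustrates the sliding-window idea; LangChain's actual CharacterTextSplitter additionally prefers to split on a separator such as "\n\n" rather than cutting mid-word, and `split_text` here is a hypothetical helper, not a LangChain function.

```python
def split_text(text, chunk_size, chunk_overlap):
    """Slide a chunk_size-character window over text, stepping forward by
    chunk_size - chunk_overlap so consecutive chunks share chunk_overlap chars."""
    step = chunk_size - chunk_overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

# 2500 characters with chunk_size=1000 and no overlap -> three chunks
# of lengths 1000, 1000, and 500
chunks = split_text("a" * 2500, chunk_size=1000, chunk_overlap=0)
```

Overlap trades storage for context: with `chunk_overlap=200`, a sentence cut at a chunk boundary would still appear whole in the neighboring chunk.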

# Initialize the embedding function (defaults to OpenAI's text-embedding-ada-002)
embedding_function = OpenAIEmbeddings()

# Load the chunks into ChromaDB
db = Chroma.from_documents(docs, embedding_function)

# Design prompt template. Note: the RetrievalQA "stuff" chain fills both
# {context} (retrieved chunks) and {question} (the user query), so the
# template must declare both variables.
template = """You are a customer service chatbot for an online perfume company called Fragrances International.

{context}

Answer the customer's questions only using the source data provided.
If you are unsure, say "I don't know, please call our customer support".
Use engaging, courteous, and professional language similar to a customer representative.
Keep your answers concise.

Question: {question}
Answer:"""

# Initialize prompt using PromptTemplate via LangChain
prompt = PromptTemplate(template=template, input_variables=["context", "question"])
print(prompt.format(
    context="A customer is on the perfume company website and wants to chat with the website chatbot.",
    question="What types of perfumes do you sell?",
))

# Chain to tie all components together and query the LLM
chain_type_kwargs = {"prompt": prompt}
chain = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=db.as_retriever(search_kwargs={"k": 1}),
    chain_type_kwargs=chain_type_kwargs,
)
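The `chain_type="stuff"` setting means the retrieved documents are simply concatenated ("stuffed") into the prompt's {context} slot for a single LLM call. The sketch below shows that idea with a hypothetical helper; the names are illustrative, not LangChain internals.

```python
def stuff_documents(docs, template, question, separator="\n\n"):
    """Concatenate retrieved document texts and fill the prompt template,
    mimicking what a 'stuff' chain does before calling the LLM."""
    context = separator.join(docs)
    return template.format(context=context, question=question)

template = "Context:\n{context}\n\nQuestion: {question}\nAnswer:"
filled = stuff_documents(
    ["We sell floral perfumes.", "We also sell woody perfumes."],
    template,
    "What types of perfumes do you sell?",
)
```

Because everything is packed into one prompt, "stuff" only works while the retrieved chunks fit in the model's context window, which is why the retriever above is limited to `k=1`.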

# Formatted printing
def print_response(response: str):
    print("\n".join(textwrap.wrap(response, width=80)))

# Running chain through LLM with query
query = "What types of perfumes do you sell?"
response = chain.run(query)
print_response(response)

Tags: Python · LLM · LangChain · Vector Database · embedding · chatbot · generative AI
Written by Architect

Professional architect sharing high‑quality architecture insights. Topics include high‑availability, high‑performance, high‑stability architectures, big data, machine learning, Java, system and distributed architecture, AI, and practical large‑scale architecture case studies. Open to ideas‑driven architects who enjoy sharing and learning.