Building a Retrieval‑Augmented Generation (RAG) System with JD Cloud Docs, ClickHouse, LangChain, and FastAPI
This guide explains how to build a Retrieval‑Augmented Generation (RAG) system using JD Cloud documentation as a knowledge base, storing document embeddings in ClickHouse, leveraging LangChain for vector retrieval, and exposing query and answer services via FastAPI and a Gradio UI.
RAG combines a retrieval step with a generative model to improve tasks such as text generation and question answering.
The implementation follows four main steps: data collection, knowledge‑base construction, vector retrieval, and prompt‑model integration.
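The four steps above can be sketched as a pipeline of stubs. The function names here are illustrative, not part of LangChain or any other library; each stub marks where the real component (loader, embedding model, vector store, LLM) plugs in.

```python
# Hypothetical outline of the four-step pipeline; all functions are stubs.

def collect_documents(source_dir: str) -> list[dict]:
    """Step 1: gather and clean docs (stubbed with one sample record)."""
    return [{"content": "...", "title": "t", "product": "p", "url": "u"}]

def build_knowledge_base(docs: list[dict]) -> list[tuple[dict, list[float]]]:
    """Step 2: embed each document (a real system calls an embedding model)."""
    return [(doc, [0.0, 0.0]) for doc in docs]

def retrieve(kb: list[tuple[dict, list[float]]], query: str, k: int = 3) -> list[dict]:
    """Step 3: return the k most similar documents (similarity stubbed out)."""
    return [doc for doc, _vec in kb][:k]

def generate_answer(query: str, context_docs: list[dict]) -> str:
    """Step 4: fill a prompt with retrieved context and call an LLM (stubbed)."""
    context = "".join(doc["content"] for doc in context_docs)
    return f"[answer to {query!r} from {len(context)} context chars]"

kb = build_knowledge_base(collect_documents("/root/jd_docs"))
print(generate_answer("how to create an RDS instance", retrieve(kb, "rds")))
```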
Data collection is the most labor‑intensive part; it involves gathering, cleaning, formatting, and splitting documents. In this project the official JD Cloud documentation is used as the source, with each document represented as a JSON object containing content, title, product, and url fields.
{
  "content": "DDoS IP高防结合Web应用防火墙方案说明\n...",
  "title": "DDoS IP高防结合Web应用防火墙方案说明",
  "product": "DDoS IP高防",
  "url": "https://docs.jdcloud.com/cn/anti-ddos-pro/anti-ddos-pro-and-waf"
}

Vector database selection: ClickHouse is chosen as the vector store for three reasons:

- ClickHouse integration in LangChain is mature, making ingestion seamless.
- SQL‑based vector search lowers the learning curve.
- JD Cloud offers a managed ClickHouse product with professional support.
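To make the "SQL‑based vector search" point concrete, the following sketch builds the kind of nearest‑neighbour query ClickHouse runs. `cosineDistance` is a real ClickHouse function; the table and column names (`document`, `metadata`, `embedding`) are assumptions consistent with what LangChain's Clickhouse store creates by default.

```python
# Illustrative only: the SQL shape behind a ClickHouse vector search.
# Column names are assumptions, not guaranteed by any particular schema.

def build_knn_sql(table: str, query_vector: list[float], k: int = 4) -> str:
    """Build a nearest-neighbour query ordered by cosine distance."""
    vec = ", ".join(f"{x:g}" for x in query_vector)
    return (
        f"SELECT document, metadata, cosineDistance(embedding, [{vec}]) AS dist\n"
        f"FROM {table}\n"
        f"ORDER BY dist ASC\n"
        f"LIMIT {k}"
    )

print(build_knn_sql("jd_docs_m3e_with_url", [0.12, -0.03, 0.4], k=2))
```

Because retrieval is plain SQL, the same query can be run and debugged directly in a ClickHouse client, which is a large part of the learning‑curve argument.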
Document vectorization and ingestion uses LangChain’s DirectoryLoader together with a custom loader JD_DOC_Loader to read the JSON files and produce Document objects.
from libs.jd_doc_json_loader import JD_DOC_Loader
from langchain_community.document_loaders import DirectoryLoader
root_dir = "/root/jd_docs"
loader = DirectoryLoader(
    root_dir,
    glob="**/*.json",
    loader_cls=JD_DOC_Loader,
)
docs = loader.load()

The custom loader parses each JSON file, extracts the fields, and yields a Document with page_content and metadata.
import json
import logging
from pathlib import Path
from typing import Iterator, Optional, Union
from langchain_core.documents import Document
from langchain_community.document_loaders.base import BaseLoader
logger = logging.getLogger(__name__)
class JD_DOC_Loader(BaseLoader):
    """Load a JSON document containing content, title, product and url."""

    def __init__(self, file_path: Union[str, Path], encoding: Optional[str] = None, autodetect_encoding: bool = False):
        self.file_path = file_path
        self.encoding = encoding
        self.autodetect_encoding = autodetect_encoding

    def lazy_load(self) -> Iterator[Document]:
        try:
            with open(self.file_path, encoding=self.encoding) as f:
                doc_data = json.load(f)
            text = doc_data["content"]
            title = doc_data["title"]
            product = doc_data["product"]
            from_url = doc_data["url"]
        except Exception as e:
            raise RuntimeError(f"Error loading {self.file_path}") from e
        metadata = {"source": from_url, "title": title, "product": product}
        yield Document(page_content=text, metadata=metadata)

Embedding generation uses a HuggingFace model, and the documents are stored in ClickHouse via LangChain’s Clickhouse vector store.
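Before embedding, long pages are usually split into overlapping chunks; the `jd_docs_m3e_with_url_splited` table queried later hints that this project does so. Below is a minimal pure‑Python sketch of the idea, standing in for LangChain's RecursiveCharacterTextSplitter, which is the usual tool in practice.

```python
# Minimal sketch of fixed-size chunking with overlap. Real splitters also try
# to break on separators (paragraphs, sentences) rather than mid-character.

def split_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into chunk_size-char pieces, each sharing `overlap` chars
    with the previous piece so context is not cut off at a boundary."""
    if chunk_size <= overlap:
        raise ValueError("chunk_size must exceed overlap")
    chunks, start = [], 0
    step = chunk_size - overlap
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += step
    return chunks

sample = "".join(str(i % 10) for i in range(1200))
print(len(split_text(sample)))  # 3 chunks: 0-500, 450-950, 900-1200
```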
import langchain_community.vectorstores.clickhouse as clickhouse
from langchain.embeddings import HuggingFaceEmbeddings
model_kwargs = {"device": "cuda"}
embeddings = HuggingFaceEmbeddings(model_name="/root/models/moka-ai-m3e-large", model_kwargs=model_kwargs)
settings = clickhouse.ClickhouseSettings(table="jd_docs_m3e_with_url", username="default", password="xxxxxx", host="10.0.1.94")
docsearch = clickhouse.Clickhouse.from_documents(docs, embeddings, config=settings)

After ingestion, a quick verification retrieves relevant documents using a similarity‑score threshold.
import langchain_community.vectorstores.clickhouse as clickhouse
from langchain.embeddings import HuggingFaceEmbeddings
model_kwargs = {"device": "cuda"}
embeddings = HuggingFaceEmbeddings(model_name="/root/models/moka-ai-m3e-large", model_kwargs=model_kwargs)
settings = clickhouse.ClickhouseSettings(table="jd_docs_m3e_with_url_splited", username="default", password="xxxx", host="10.0.1.94")
ck_db = clickhouse.Clickhouse(embeddings, config=settings)
ck_retriever = ck_db.as_retriever(search_type="similarity_score_threshold", search_kwargs={'score_threshold': 0.9})
ck_retriever.get_relevant_documents("如何创建mysql rds")  # "how to create a MySQL RDS instance"

A FastAPI service exposes two endpoints: /retriever for vector search and /answer for generating answers with an LLM.
from fastapi import FastAPI
from pydantic import BaseModel
import uvicorn, json, logging
from langchain_community.embeddings import HuggingFaceEmbeddings
import langchain_community.vectorstores.clickhouse as clickhouse
from langchain.prompts import PromptTemplate
from langchain_community.llms import VLLM
from transformers import AutoTokenizer
app = FastAPI(docs_url=None)
logger = logging.getLogger()
logger.setLevel(logging.INFO)
model_name = "/root/models/Qwen1.5-1.8B-Chat"
tokenizer = AutoTokenizer.from_pretrained(model_name)
llm = VLLM(model=model_name, tokenizer=tokenizer, task="text-generation", temperature=0.2, do_sample=True, repetition_penalty=1.1, return_full_text=False, max_new_tokens=900)
# Prompt (in Chinese): "You are a cloud technology expert. Answer the question
# using the retrieved Context. If you don't know the answer, say you don't know.
# Answer in Chinese."
prompt_template = """
你是一个云技术专家
使用以下检索到的Context回答问题。
如果不知道答案,就说不知道。
用中文回答问题。
Question: {question}
Context: {context}
Answer:
"""
prompt = PromptTemplate(input_variables=["context", "question"], template=prompt_template)
# Build the retriever exactly as in the verification step above
embeddings = HuggingFaceEmbeddings(model_name="/root/models/moka-ai-m3e-large", model_kwargs={"device": "cuda"})
settings = clickhouse.ClickhouseSettings(table="jd_docs_m3e_with_url_splited", username="default", password="xxxx", host="10.0.1.94")
ck_db = clickhouse.Clickhouse(embeddings, config=settings)
ck_retriever = ck_db.as_retriever(search_type="similarity_score_threshold", search_kwargs={"score_threshold": 0.9})

class Question(BaseModel):
    content: str

@app.post("/retriever")
async def retriever(question: Question):
    return ck_retriever.invoke(question.content)

@app.post("/answer")
async def answer(q: Question):
    logger.info("invoke!!!")
    # Retrieve once, then join the page contents into the prompt context
    docs = ck_retriever.invoke(q.content)
    context = "".join(doc.page_content for doc in docs)
    source_list = [doc.metadata["source"] for doc in docs]
    p = prompt.format(context=context, question=q.content)
    answer = llm.invoke(p)
    return {"answer": answer, "sources": source_list}

if __name__ == "__main__":
    uvicorn.run(app="retriever_api:app", host="0.0.0.0", port=8888, reload=True)

Finally, a Gradio interface provides a simple web UI for asking questions.
import gradio as gr
import requests

def answer(question):
    url = "http://127.0.0.1:8888/answer"
    payload = {"content": question}
    res = requests.post(url, json=payload)
    res_json = res.json()
    return [res_json["answer"], res_json["sources"]]

demo = gr.Interface(
    fn=answer,
    inputs=gr.Textbox(label="question", lines=5),
    outputs=[gr.Markdown(label="answer"), gr.JSON(label="urls")],
)
demo.launch(server_name="0.0.0.0")

The complete pipeline demonstrates how to collect domain‑specific documents, embed them, store them in a ClickHouse vector store, retrieve relevant passages, and generate answers with a lightweight LLM, all wrapped in RESTful and interactive front‑ends.
JD Tech Talk
Official JD Tech public account delivering best practices and technology innovation.