Build a Retrieval‑Augmented Generation (RAG) System Using JD Cloud Docs and ClickHouse
This guide walks through creating a Retrieval‑Augmented Generation pipeline that harvests JD Cloud documentation, stores vector embeddings in ClickHouse, and serves queries via FastAPI, LangChain, a Qwen LLM, and a Gradio front‑end.
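RAG (Retrieval‑Augmented Generation) combines retrieval with generation: passages relevant to a query are fetched from a knowledge base and supplied to a language model as context, which suits tasks such as question answering. The pipeline has four stages: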
Data collection
Knowledge‑base construction
Vector retrieval
Prompt and model integration
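The four stages can be illustrated end to end with a toy example. This is a minimal sketch, not the pipeline built in this guide: it assumes an in‑memory list as the knowledge base and uses word‑count vectors with cosine similarity in place of a real embedding model. The sample documents and the names `embed`, `retrieve`, and `build_prompt` are hypothetical.

```python
import math
from collections import Counter

# Stages 1 and 2: a tiny in-memory "knowledge base" (illustrative data only).
KNOWLEDGE_BASE = [
    {"title": "Create an RDS instance", "content": "open the console and create a mysql rds instance"},
    {"title": "Configure DDoS protection", "content": "bind the protected ip and add forwarding rules"},
]


def embed(text: str) -> Counter:
    """Stand-in embedding: a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(question: str, k: int = 1) -> list:
    """Stage 3: rank documents by similarity to the question."""
    q = embed(question)
    ranked = sorted(KNOWLEDGE_BASE, key=lambda d: cosine(q, embed(d["content"])), reverse=True)
    return ranked[:k]


def build_prompt(question: str, docs: list) -> str:
    """Stage 4: place the retrieved text in the prompt sent to the model."""
    context = "\n".join(d["content"] for d in docs)
    return f"Context: {context}\nQuestion: {question}\nAnswer:"


top = retrieve("how to create a mysql rds instance")
prompt_text = build_prompt("how to create a mysql rds instance", top)
```

In the real pipeline, the embedding model replaces `embed`, ClickHouse replaces the in‑memory list, and the prompt is passed to the LLM.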
Data Collection
Data collection is the most labor‑intensive step; we use JD Cloud official documentation as the knowledge base. Each document is stored as a JSON object with four fields: content, title, product, and url.
{
  "content": "DDoS IP高防结合Web应用防火墙方案说明\n...",
  "title": "DDoS IP高防结合Web应用防火墙方案说明",
  "product": "DDoS IP高防",
  "url": "https://docs.jdcloud.com/cn/anti-ddos-pro/anti-ddos-pro-and-waf"
}

Choosing a Vector Database and Implementing the Retriever
We selected ClickHouse as the vector store because its LangChain integration is mature, it supports SQL‑based vector queries, and JD Cloud provides dedicated support.
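To make "SQL‑based vector queries" concrete, the sketch below builds the kind of statement a similarity search might run. It relies on ClickHouse's `cosineDistance` function; the column names `document`, `metadata`, and `embedding` are assumptions about the table layout, and `build_vector_query` is a hypothetical helper, not part of LangChain or ClickHouse.

```python
def build_vector_query(table: str, query_vector: list, k: int = 3) -> str:
    """Return a ClickHouse query that ranks rows by cosine distance to query_vector.

    Column names (document, metadata, embedding) are assumed, not taken from the article.
    """
    # Render the query embedding as a ClickHouse array literal.
    vector_literal = "[" + ", ".join(f"{float(x):.6f}" for x in query_vector) + "]"
    return (
        f"SELECT document, metadata, cosineDistance(embedding, {vector_literal}) AS dist "
        f"FROM {table} ORDER BY dist ASC LIMIT {int(k)}"
    )


sql = build_vector_query("jd_docs_m3e_with_url", [0.1, 0.2, 0.3], k=3)
print(sql)
```

The table name is interpolated directly, so it should come from trusted configuration rather than user input. A smaller distance means a closer match, hence the ascending sort.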
Document Vectorization and Ingestion
We use LangChain's document loaders, embedding wrappers, and vector store integration to embed and ingest the documents. First, a custom loader parses the JSON files.
from libs.jd_doc_json_loader import JD_DOC_Loader
from langchain_community.document_loaders import DirectoryLoader

root_dir = "/root/jd_docs"
loader = DirectoryLoader(root_dir, glob="**/*.json", loader_cls=JD_DOC_Loader)
docs = loader.load()

The loader implementation:
import json
import logging
from pathlib import Path
from typing import Iterator, Optional, Union

from langchain_core.documents import Document
from langchain_community.document_loaders.base import BaseLoader
from langchain_community.document_loaders.helpers import detect_file_encodings

logger = logging.getLogger(__name__)


class JD_DOC_Loader(BaseLoader):
    """Load a JSON file containing content, title, product, and url."""

    def __init__(self, file_path: Union[str, Path], encoding: Optional[str] = None, autodetect_encoding: bool = False):
        self.file_path = file_path
        self.encoding = encoding
        self.autodetect_encoding = autodetect_encoding

    def _parse(self, encoding: Optional[str]) -> Document:
        # Read the file with the given encoding and map its fields to a Document.
        with open(self.file_path, encoding=encoding) as f:
            doc_data = json.load(f)
        metadata = {"source": doc_data["url"], "title": doc_data["title"], "product": doc_data["product"]}
        return Document(page_content=doc_data["content"], metadata=metadata)

    def lazy_load(self) -> Iterator[Document]:
        try:
            doc = self._parse(self.encoding)
        except UnicodeDecodeError as e:
            if not self.autodetect_encoding:
                raise RuntimeError(f"Error loading {self.file_path}") from e
            # Retry with each detected encoding until one decodes cleanly.
            # Each attempt parses the JSON again so the metadata fields are always set.
            for enc in detect_file_encodings(self.file_path):
                logger.debug("Trying encoding: %s", enc.encoding)
                try:
                    doc = self._parse(enc.encoding)
                    break
                except UnicodeDecodeError:
                    continue
            else:
                raise RuntimeError(f"Error loading {self.file_path}") from e
        except Exception as e:
            raise RuntimeError(f"Error loading {self.file_path}") from e
        yield doc

Embedding and ClickHouse vector store creation:
import langchain_community.vectorstores.clickhouse as clickhouse
from langchain_community.embeddings import HuggingFaceEmbeddings

# Load the embedding model from a local path and run it on the GPU.
model_kwargs = {"device": "cuda"}
embeddings = HuggingFaceEmbeddings(model_name="/root/models/moka-ai-m3e-large", model_kwargs=model_kwargs)

settings = clickhouse.ClickhouseSettings(table="jd_docs_m3e_with_url", username="default", password="xxxxxx", host="10.0.1.94")
# Embed every document and write the vectors to the ClickHouse table.
docsearch = clickhouse.Clickhouse.from_documents(docs, embeddings, config=settings)

After ingestion we verify retrieval:
settings = clickhouse.ClickhouseSettings(table="jd_docs_m3e_with_url_splited", username="default", password="xxxx", host="10.0.1.94")
ck_db = clickhouse.Clickhouse(embeddings, config=settings)
ck_retriever = ck_db.as_retriever(search_type="similarity_score_threshold", search_kwargs={"score_threshold": 0.9})
# Query: "How do I create a MySQL RDS instance?"
ck_retriever.invoke("如何创建mysql rds")

Building a RESTful Service with FastAPI
A simple FastAPI service exposes the retriever endpoint.
from fastapi import FastAPI
from pydantic import BaseModel
import uvicorn
from langchain_community.vectorstores import clickhouse
from langchain_community.embeddings import HuggingFaceEmbeddings

app = FastAPI(docs_url=None)

model_kwargs = {"device": "cuda"}
embeddings = HuggingFaceEmbeddings(model_name="/root/models/moka-ai-m3e-large", model_kwargs=model_kwargs)
settings = clickhouse.ClickhouseSettings(table="jd_docs_m3e_with_url_splited", username="default", password="xxxx", host="10.0.1.94")
ck_db = clickhouse.Clickhouse(embeddings, config=settings)
# Return the three most similar chunks for each question.
ck_retriever = ck_db.as_retriever(search_type="similarity", search_kwargs={"k": 3})


class Question(BaseModel):
    content: str


@app.post("/retriever")
async def retriever(q: Question):
    return ck_retriever.invoke(q.content)


if __name__ == "__main__":
    # The import string assumes this file is saved as retriever_api.py.
    uvicorn.run(app="retriever_api:app", host="0.0.0.0", port=8000, reload=True)

Combining Model and Prompt to Answer Questions
We use the Qwen1.5‑1.8B‑Chat model, loaded through LangChain's VLLM wrapper, together with a Chinese prompt that asks the model to answer from the retrieved context.
from fastapi import FastAPI
from pydantic import BaseModel
import uvicorn, json, logging, requests
from langchain_community.llms import VLLM
from transformers import AutoTokenizer
from langchain.prompts import PromptTemplate

app = FastAPI(docs_url=None)
logger = logging.getLogger()
logger.setLevel(logging.INFO)

model_name = "/root/models/Qwen1.5-1.8B-Chat"
tokenizer = AutoTokenizer.from_pretrained(model_name)
llm = VLLM(
    model=model_name,
    tokenizer=tokenizer,
    task="text-generation",
    temperature=0.2,
    do_sample=True,
    repetition_penalty=1.1,
    return_full_text=False,
    max_new_tokens=900,
)

# Prompt, in English: "You are a cloud technology expert. Use the retrieved
# Context below to answer the question. If you do not know the answer, say so.
# Answer in Chinese."
prompt_template = """
你是一个云技术专家
使用以下检索到的Context回答问题。
如果不知道答案,就说不知道。
用中文回答问题。
Question: {question}
Context: {context}
Answer:
"""
prompt = PromptTemplate(input_variables=["context", "question"], template=prompt_template)


def get_context_list(q: str):
    # Call the retriever service defined earlier and return its raw JSON body.
    res = requests.post("http://10.0.0.7:8000/retriever", json={"content": q})
    return res.text


class Question(BaseModel):
    content: str


@app.post("/answer")
async def answer(q: Question):
    context_str = get_context_list(q.content)
    context_list = json.loads(context_str)
    # Separate chunks with a blank line so they do not run together in the prompt.
    context = "\n\n".join(item["page_content"] for item in context_list)
    sources = [item["metadata"]["source"] for item in context_list]
    p = prompt.format(context=context, question=q.content)
    ans = llm.invoke(p)
    return {"answer": ans, "sources": sources}


if __name__ == "__main__":
    # Pass the app object itself: the string "retriever_api:app" would start the
    # retriever service instead. reload is omitted because it needs an import string.
    uvicorn.run(app, host="0.0.0.0", port=8888)

The answer service concatenates the retrieved documents into a single context, feeds it to the LLM, and returns both the generated answer and the source URLs.
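Because a small model has a limited context window, the concatenation step may benefit from a length budget and from de‑duplicated source URLs. The following is a minimal sketch under those assumptions; `build_context` and the `max_chars` limit are hypothetical and not part of the service above. The item shape (`page_content`, `metadata["source"]`) matches what the retriever endpoint returns.

```python
def build_context(items: list, max_chars: int = 2000) -> tuple:
    """Join retrieved chunks up to max_chars and collect unique source URLs in order."""
    parts, sources, used = [], [], 0
    for item in items:
        text = item["page_content"].strip()
        if used + len(text) > max_chars:
            break  # stop before the character budget is exceeded
        parts.append(text)
        used += len(text)
        url = item["metadata"]["source"]
        if url not in sources:
            sources.append(url)
    return "\n\n".join(parts), sources


# Illustrative data only; example.com URLs are placeholders.
sample = [
    {"page_content": "Step one.", "metadata": {"source": "https://example.com/a"}},
    {"page_content": "Step two.", "metadata": {"source": "https://example.com/a"}},
    {"page_content": "x" * 50, "metadata": {"source": "https://example.com/b"}},
]
context, sources = build_context(sample, max_chars=30)
```

A character budget is only a rough proxy for tokens; counting with the model's tokenizer would be more precise at the cost of an extra dependency in this helper.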
Interactive Front‑End with Gradio
A lightweight Gradio interface lets users ask questions and see answers with source links.
import gradio as gr
import requests


def answer(question):
    # Forward the question to the answer service and unpack its JSON response.
    res = requests.post("http://127.0.0.1:8888/answer", json={"content": question})
    data = res.json()
    return [data["answer"], data["sources"]]


demo = gr.Interface(
    fn=answer,
    inputs=gr.Textbox(label="question", lines=5),
    outputs=[gr.Markdown(label="answer"), gr.JSON(label="urls")],
)
demo.launch(server_name="0.0.0.0")

With the pipeline deployed, users can query JD Cloud documentation through a conversational interface powered by retrieval‑augmented generation.
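The interface shows the source URLs as raw JSON. One possible refinement, sketched here as an assumption rather than as part of the original front‑end, is to append the sources to the answer as a Markdown link list so they are clickable in the `gr.Markdown` output; `format_answer` is a hypothetical helper.

```python
def format_answer(answer: str, sources: list) -> str:
    """Render the answer followed by a Markdown list of unique source links."""
    unique = list(dict.fromkeys(sources))  # drop duplicates, keep first-seen order
    if not unique:
        return answer
    links = "\n".join(f"- [{url}]({url})" for url in unique)
    return f"{answer}\n\nSources:\n{links}"


# Illustrative input; the URL is a placeholder.
rendered = format_answer("Use the console.", ["https://example.com/a", "https://example.com/a"])
print(rendered)
```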
JD Cloud Developers
JD Cloud Developers (Developer of JD Technology) is a JD Technology Group platform offering technical sharing and communication for AI, cloud computing, IoT and related developers. It publishes JD product technical information, industry content, and tech event news. Embrace technology and partner with developers to envision the future.
