Build a Retrieval‑Augmented Generation (RAG) System Using JD Cloud Docs and ClickHouse

This guide walks through creating a Retrieval‑Augmented Generation pipeline that harvests JD Cloud documentation, stores vector embeddings in ClickHouse, and serves queries via FastAPI, LangChain, a Qwen LLM, and a Gradio front‑end.

JD Cloud Developers

Jun 14, 2024

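RAG (Retrieval‑Augmented Generation) combines retrieval with generation: relevant passages are first retrieved from a knowledge base and then passed to a language model as context for tasks such as text generation and question answering. Building such a system involves four steps:

1. Data collection
2. Knowledge‑base construction
3. Vector retrieval
4. Prompt and model integration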

Data Collection

Data collection is the most labor‑intensive step; we use JD Cloud official documentation as the knowledge base. Each document is stored as a JSON object with four fields: content, title, product, and url.

{
    "content": "DDoS IP高防结合Web应用防火墙方案说明\n...",
    "title": "DDoS IP高防结合Web应用防火墙方案说明",
    "product": "DDoS IP高防",
    "url": "https://docs.jdcloud.com/cn/anti-ddos-pro/anti-ddos-pro-and-waf"
}
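Because every later step relies on these four fields, it can help to check each record before ingestion. The following is a minimal sketch using only the standard library; the function name `parse_record` and the sample values are hypothetical and not part of the original pipeline.

```python
import json

# The four fields each document is expected to carry.
REQUIRED_FIELDS = ("content", "title", "product", "url")


def parse_record(raw: str) -> dict:
    """Parse one JSON document and check that all four fields are present and non-empty."""
    record = json.loads(raw)
    missing = [name for name in REQUIRED_FIELDS if name not in record]
    if missing:
        raise ValueError(f"missing fields: {', '.join(missing)}")
    empty = [name for name in REQUIRED_FIELDS if not str(record[name]).strip()]
    if empty:
        raise ValueError(f"empty fields: {', '.join(empty)}")
    return record


# Hypothetical sample record; only the URL is taken from the example above.
sample = json.dumps({
    "content": "example body text",
    "title": "Example title",
    "product": "Example product",
    "url": "https://docs.jdcloud.com/cn/anti-ddos-pro/anti-ddos-pro-and-waf",
})
record = parse_record(sample)
print(record["title"])
```

Rejecting incomplete records at this stage keeps a missing url or title from surfacing later as a loader error or a broken source link.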

Choosing a Vector Database and Implementing the Retriever

We selected ClickHouse as the vector store because its LangChain integration is mature, it supports SQL‑based vector queries, and JD Cloud provides dedicated support.

Document Vectorization and Ingestion

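We use LangChain's document loaders to read the files and its ClickHouse vector store integration to embed and store them. First, a custom loader, applied to the whole directory through DirectoryLoader, parses the JSON files.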

from libs.jd_doc_json_loader import JD_DOC_Loader
from langchain_community.document_loaders import DirectoryLoader
root_dir = "/root/jd_docs"
loader = DirectoryLoader(root_dir, glob="**/*.json", loader_cls=JD_DOC_Loader)
docs = loader.load()

The loader implementation:

import json
import logging
from pathlib import Path
from typing import Iterator, Optional, Union

from langchain_core.documents import Document
from langchain_community.document_loaders.base import BaseLoader
from langchain_community.document_loaders.helpers import detect_file_encodings

logger = logging.getLogger(__name__)


class JD_DOC_Loader(BaseLoader):
    """Load a JSON file containing content, title, product, and url."""

    def __init__(self, file_path: Union[str, Path], encoding: Optional[str] = None, autodetect_encoding: bool = False):
        self.file_path = file_path
        self.encoding = encoding
        self.autodetect_encoding = autodetect_encoding

    def _load_with(self, encoding: Optional[str]) -> Document:
        """Read the file with the given encoding and build one Document."""
        with open(self.file_path, encoding=encoding) as f:
            doc_data = json.load(f)
        metadata = {"source": doc_data["url"], "title": doc_data["title"], "product": doc_data["product"]}
        return Document(page_content=doc_data["content"], metadata=metadata)

    def lazy_load(self) -> Iterator[Document]:
        try:
            doc = self._load_with(self.encoding)
        except UnicodeDecodeError as e:
            if not self.autodetect_encoding:
                raise RuntimeError(f"Error loading {self.file_path}") from e
            # Retry with each detected encoding; every attempt parses the JSON
            # again so that the content and all metadata fields are populated.
            doc = None
            for enc in detect_file_encodings(self.file_path):
                logger.debug("Trying encoding: %s", enc.encoding)
                try:
                    doc = self._load_with(enc.encoding)
                    break
                except UnicodeDecodeError:
                    continue
            if doc is None:
                raise RuntimeError(f"Error loading {self.file_path}") from e
        except Exception as e:
            raise RuntimeError(f"Error loading {self.file_path}") from e
        yield doc

Embedding and ClickHouse vector store creation:

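import langchain_community.vectorstores.clickhouse as clickhouse
from langchain_community.embeddings import HuggingFaceEmbeddings

# Load the m3e-large embedding model from a local path and run it on the GPU.
model_kwargs = {"device": "cuda"}
embeddings = HuggingFaceEmbeddings(model_name="/root/models/moka-ai-m3e-large", model_kwargs=model_kwargs)
# Connection settings for the target table; replace the password placeholder with your own.
settings = clickhouse.ClickhouseSettings(table="jd_docs_m3e_with_url", username="default", password="xxxxxx", host="10.0.1.94")

# Embed every document and write the vectors to ClickHouse.
docsearch = clickhouse.Clickhouse.from_documents(docs, embeddings, config=settings)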
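The `_splited` suffix on the table name used in the later snippets suggests that the documents were split into chunks before ingestion, a step the article does not show. The sketch below illustrates one common approach, fixed-size character chunks with overlap, in plain Python. The function names, chunk size, and overlap are assumptions for illustration; LangChain's own text splitters could be used instead.

```python
from typing import Dict, List


def split_text(text: str, chunk_size: int = 500, overlap: int = 50) -> List[str]:
    """Split text into chunks of at most chunk_size characters that overlap by `overlap`."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current chunk already reaches the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks


def chunk_record(record: Dict[str, str], chunk_size: int = 500, overlap: int = 50) -> List[dict]:
    """Turn one JSON record into chunk dicts that all carry the same metadata."""
    metadata = {"source": record["url"], "title": record["title"], "product": record["product"]}
    return [
        {"page_content": chunk, "metadata": dict(metadata)}
        for chunk in split_text(record["content"], chunk_size, overlap)
    ]


# Hypothetical record used only to show the shape of the output.
example = {"content": "abcdefghij", "title": "t", "product": "p", "url": "u"}
for piece in chunk_record(example, chunk_size=4, overlap=1):
    print(piece["page_content"])
```

Each chunk keeps the url of its parent document, so the source link can still be returned with an answer after splitting.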

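After ingestion we verify retrieval. Note that this snippet reads from the table jd_docs_m3e_with_url_splited rather than the jd_docs_m3e_with_url table written above; set table to whichever table holds the ingested vectors.

settings = clickhouse.ClickhouseSettings(table="jd_docs_m3e_with_url_splited", username="default", password="xxxx", host="10.0.1.94")
ck_db = clickhouse.Clickhouse(embeddings, config=settings)
# Keep only results whose relevance score is at least 0.9.
ck_retriever = ck_db.as_retriever(search_type="similarity_score_threshold", search_kwargs={'score_threshold': 0.9})
# Query: "How to create a MySQL RDS instance"
ck_retriever.invoke("如何创建mysql rds")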
Building a RESTful Service with FastAPI

A simple FastAPI service exposes the retriever endpoint.

from fastapi import FastAPI
from pydantic import BaseModel
import uvicorn
from langchain_community.vectorstores import clickhouse
from langchain_community.embeddings import HuggingFaceEmbeddings

app = FastAPI(docs_url=None)

# Load the embedding model and connect to ClickHouse once, at startup.
model_kwargs = {"device": "cuda"}
embeddings = HuggingFaceEmbeddings(model_name="/root/models/moka-ai-m3e-large", model_kwargs=model_kwargs)
settings = clickhouse.ClickhouseSettings(table="jd_docs_m3e_with_url_splited", username="default", password="xxxx", host="10.0.1.94")
ck_db = clickhouse.Clickhouse(embeddings, config=settings)
# Return the three most similar chunks for each query.
ck_retriever = ck_db.as_retriever(search_type="similarity", search_kwargs={"k": 3})

class Question(BaseModel):
    content: str

@app.post("/retriever")
def retriever(q: Question):
    # A plain `def` lets FastAPI run the blocking embedding and database calls in its thread pool.
    return ck_retriever.invoke(q.content)

if __name__ == "__main__":
    # The import string assumes this file is saved as retriever_api.py.
    uvicorn.run(app="retriever_api:app", host="0.0.0.0", port=8000, reload=True)
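The answer service in the next section reads `page_content` and `metadata["source"]` from each item this endpoint returns. A client can defensively check that shape before using it. The sketch below is a hypothetical helper built on the standard library; the sample body is invented, and a real response may contain additional keys.

```python
import json
from typing import Any, Dict, List


def parse_retriever_response(body: str) -> List[Dict[str, Any]]:
    """Decode a /retriever response body and keep only usable items."""
    items = json.loads(body)
    if not isinstance(items, list):
        raise ValueError("expected a JSON array of documents")
    docs = []
    for item in items:
        if not isinstance(item, dict):
            continue
        metadata = item.get("metadata")
        # The answer step needs both the text and the source URL; skip anything else.
        if "page_content" in item and isinstance(metadata, dict) and "source" in metadata:
            docs.append({"page_content": item["page_content"], "metadata": metadata})
    return docs


# Hypothetical response body: one usable item and one without metadata.
sample_body = json.dumps([
    {
        "page_content": "example chunk text",
        "metadata": {
            "source": "https://docs.jdcloud.com/cn/anti-ddos-pro/anti-ddos-pro-and-waf",
            "title": "Example title",
            "product": "Example product",
        },
    },
    {"page_content": "chunk without metadata"},
])
docs = parse_retriever_response(sample_body)
print(len(docs), docs[0]["metadata"]["source"])
```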

Combining Model and Prompt to Answer Questions

We use the Qwen1.5‑1.8B‑Chat model (served through vLLM) together with a Chinese prompt that asks the model to answer using the retrieved context.

from fastapi import FastAPI
from pydantic import BaseModel
import logging

import requests
import uvicorn
from langchain_community.llms import VLLM
from langchain_core.prompts import PromptTemplate

app = FastAPI(docs_url=None)
logger = logging.getLogger()
logger.setLevel(logging.INFO)

model_name = "/root/models/Qwen1.5-1.8B-Chat"
# vLLM loads the tokenizer from the model path, so none is passed explicitly.
llm = VLLM(model=model_name, temperature=0.2, max_new_tokens=900)

# English gloss of the prompt: "You are a cloud technology expert. Use the
# retrieved Context below to answer the question. If you do not know the
# answer, say so. Answer in Chinese."
prompt_template = """
你是一个云技术专家
使用以下检索到的Context回答问题。
如果不知道答案，就说不知道。
用中文回答问题。
Question: {question}
Context: {context}
Answer: 
"""
prompt = PromptTemplate(input_variables=["context", "question"], template=prompt_template)

def get_context_list(q: str):
    """Fetch the retrieved documents from the retriever service as a list of dicts."""
    # The 30-second timeout is an assumed value; tune it for your deployment.
    res = requests.post("http://10.0.0.7:8000/retriever", json={"content": q}, timeout=30)
    res.raise_for_status()
    return res.json()

class Question(BaseModel):
    content: str

@app.post("/answer")
def answer(q: Question):
    logger.info("question: %s", q.content)
    context_list = get_context_list(q.content)
    # Separate documents with a blank line so their text does not run together.
    context = "\n\n".join(item["page_content"] for item in context_list)
    sources = [item["metadata"]["source"] for item in context_list]
    p = prompt.format(context=context, question=q.content)
    ans = llm.invoke(p)
    return {"answer": ans, "sources": sources}

if __name__ == "__main__":
    # Pass the app object directly; no module import string is needed without reload.
    uvicorn.run(app, host="0.0.0.0", port=8888)

The answer service concatenates retrieved documents as context, feeds them to the LLM, and returns both the generated answer and the source URLs.
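A small model has a limited context window, so in practice it can be useful to cap how much retrieved text goes into the prompt and to drop duplicate source URLs when several chunks come from the same page. The sketch below shows one way to do both in plain Python. The character budget, the shortened template, and the function name `build_prompt` are assumptions for illustration, not part of the original service.

```python
from typing import Dict, List, Tuple

# Shortened stand-in for the prompt template used by the service.
PROMPT_TEMPLATE = "Question: {question}\nContext: {context}\nAnswer: "


def build_prompt(question: str, docs: List[Dict], max_context_chars: int = 2000) -> Tuple[str, List[str]]:
    """Join retrieved chunks until the character budget is reached; return the prompt and unique sources."""
    parts: List[str] = []
    sources: List[str] = []
    used = 0
    for doc in docs:
        text = doc["page_content"]
        # Stop before the budget is exceeded; later, lower-ranked chunks are dropped.
        if used + len(text) > max_context_chars:
            break
        parts.append(text)
        used += len(text)
        source = doc["metadata"]["source"]
        if source not in sources:
            sources.append(source)
    prompt = PROMPT_TEMPLATE.format(question=question, context="\n\n".join(parts))
    return prompt, sources


# Hypothetical retrieved chunks: two from the same page, one from another.
retrieved = [
    {"page_content": "a" * 10, "metadata": {"source": "https://example.com/page-1"}},
    {"page_content": "b" * 10, "metadata": {"source": "https://example.com/page-1"}},
    {"page_content": "c" * 10, "metadata": {"source": "https://example.com/page-2"}},
]
prompt_text, source_urls = build_prompt("example question", retrieved, max_context_chars=25)
print(source_urls)
```

Because the retriever returns chunks in ranked order, truncating from the end discards the least relevant text first.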

Interactive Front‑End with Gradio

A lightweight Gradio interface lets users ask questions and see answers with source links.

import json, gradio as gr, requests

def answer(question):
    res = requests.post("http://127.0.0.1:8888/answer", json={"content": question})
    data = json.loads(res.text)
    return [data["answer"], data["sources"]]

demo = gr.Interface(fn=answer, inputs=gr.Textbox(label="question", lines=5), outputs=[gr.Markdown(label="answer"), gr.JSON(label="urls")])

demo.launch(server_name="0.0.0.0")
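If clickable links are preferred over raw JSON, the source list could be rendered as a Markdown bullet list and shown in a second gr.Markdown output instead. The helper below is a hypothetical sketch of that formatting step; it does not depend on Gradio.

```python
from typing import Iterable


def sources_to_markdown(sources: Iterable[str]) -> str:
    """Render source URLs as a Markdown bullet list, dropping duplicates but keeping order."""
    seen = []
    for url in sources:
        if url not in seen:
            seen.append(url)
    return "\n".join(f"- [{url}]({url})" for url in seen)


# Hypothetical list of sources as returned by the answer service.
links = sources_to_markdown([
    "https://docs.jdcloud.com/cn/anti-ddos-pro/anti-ddos-pro-and-waf",
    "https://docs.jdcloud.com/cn/anti-ddos-pro/anti-ddos-pro-and-waf",
])
print(links)
```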

With the pipeline deployed, users can query JD Cloud documentation through a conversational interface powered by retrieval‑augmented generation.
