Building a Retrieval‑Augmented Generation (RAG) System with JD Cloud Docs, ClickHouse, LangChain, and FastAPI
This guide explains how to build a Retrieval‑Augmented Generation (RAG) system using JD Cloud documentation as a knowledge base, storing document embeddings in ClickHouse, leveraging LangChain for vector retrieval, and exposing query and answer services via FastAPI and a Gradio UI.
RAG combines a retrieval step with a generative model to improve tasks such as text generation and question answering.
The implementation follows four main steps: data collection, knowledge‑base construction, vector retrieval, and prompt‑model integration.
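The four steps above can be sketched as a pipeline of stubs. The function names here are illustrative, not part of LangChain or any other library; each stub marks where the real component (loader, embedding model, vector store, LLM) plugs in.

```python
# Hypothetical outline of the four-step pipeline; all functions are stubs.

def collect_documents(source_dir: str) -> list[dict]:
    """Step 1: gather and clean docs (stubbed with one sample record)."""
    return [{"content": "...", "title": "t", "product": "p", "url": "u"}]

def build_knowledge_base(docs: list[dict]) -> list[tuple[dict, list[float]]]:
    """Step 2: embed each document (a real system calls an embedding model)."""
    return [(doc, [0.0, 0.0]) for doc in docs]

def retrieve(kb: list[tuple[dict, list[float]]], query: str, k: int = 3) -> list[dict]:
    """Step 3: return the k most similar documents (similarity stubbed out)."""
    return [doc for doc, _vec in kb][:k]

def generate_answer(query: str, context_docs: list[dict]) -> str:
    """Step 4: fill a prompt with retrieved context and call an LLM (stubbed)."""
    context = "".join(doc["content"] for doc in context_docs)
    return f"[answer to {query!r} from {len(context)} context chars]"

kb = build_knowledge_base(collect_documents("/root/jd_docs"))
print(generate_answer("how to create an RDS instance", retrieve(kb, "rds")))
```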
Data collection is the most labor‑intensive part; it involves gathering, cleaning, formatting, and splitting documents. In this project the official JD Cloud documentation is used as the source, with each document represented as a JSON object containing content, title, product, and url fields.
{
  "content": "DDoS IP高防结合Web应用防火墙方案说明\n...",
  "title": "DDoS IP高防结合Web应用防火墙方案说明",
  "product": "DDoS IP高防",
  "url": "https://docs.jdcloud.com/cn/anti-ddos-pro/anti-ddos-pro-and-waf"
}

Vector database selection: ClickHouse is chosen as the vector store for three reasons:

- ClickHouse integration in LangChain is mature, making ingestion seamless.
- SQL‑based vector search lowers the learning curve.
- JD Cloud offers a managed ClickHouse product with professional support.
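To make the "SQL‑based vector search" point concrete, the following sketch builds the kind of nearest‑neighbour query ClickHouse runs. `cosineDistance` is a real ClickHouse function; the table and column names (`document`, `metadata`, `embedding`) are assumptions consistent with what LangChain's Clickhouse store creates by default.

```python
# Illustrative only: the SQL shape behind a ClickHouse vector search.
# Column names are assumptions, not guaranteed by any particular schema.

def build_knn_sql(table: str, query_vector: list[float], k: int = 4) -> str:
    """Build a nearest-neighbour query ordered by cosine distance."""
    vec = ", ".join(f"{x:g}" for x in query_vector)
    return (
        f"SELECT document, metadata, cosineDistance(embedding, [{vec}]) AS dist\n"
        f"FROM {table}\n"
        f"ORDER BY dist ASC\n"
        f"LIMIT {k}"
    )

print(build_knn_sql("jd_docs_m3e_with_url", [0.12, -0.03, 0.4], k=2))
```

Because retrieval is plain SQL, the same query can be run and debugged directly in a ClickHouse client, which is a large part of the learning‑curve argument.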
Document vectorization and ingestion uses LangChain’s DirectoryLoader together with a custom loader JD_DOC_Loader to read the JSON files and produce Document objects.
from libs.jd_doc_json_loader import JD_DOC_Loader
from langchain_community.document_loaders import DirectoryLoader
root_dir = "/root/jd_docs"
loader = DirectoryLoader(
    root_dir,
    glob="**/*.json",
    loader_cls=JD_DOC_Loader,
)
docs = loader.load()

The custom loader parses each JSON file, extracts the fields, and yields a Document with page_content and metadata.
import json
import logging
from pathlib import Path
from typing import Iterator, Optional, Union
from langchain_core.documents import Document
from langchain_community.document_loaders.base import BaseLoader
logger = logging.getLogger(__name__)
class JD_DOC_Loader(BaseLoader):
    """Load a JSON document containing content, title, product and url."""

    def __init__(self, file_path: Union[str, Path], encoding: Optional[str] = None, autodetect_encoding: bool = False):
        self.file_path = file_path
        self.encoding = encoding
        self.autodetect_encoding = autodetect_encoding

    def lazy_load(self) -> Iterator[Document]:
        try:
            with open(self.file_path, encoding=self.encoding) as f:
                doc_data = json.load(f)
            text = doc_data["content"]
            title = doc_data["title"]
            product = doc_data["product"]
            from_url = doc_data["url"]
        except Exception as e:
            raise RuntimeError(f"Error loading {self.file_path}") from e
        metadata = {"source": from_url, "title": title, "product": product}
        yield Document(page_content=text, metadata=metadata)

Embedding generation uses a HuggingFace model, and the documents are stored in ClickHouse via LangChain’s Clickhouse vector store.
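Before embedding, long pages are usually split into overlapping chunks; the `jd_docs_m3e_with_url_splited` table queried later hints that this project does so. Below is a minimal pure‑Python sketch of the idea, standing in for LangChain's RecursiveCharacterTextSplitter, which is the usual tool in practice.

```python
# Minimal sketch of fixed-size chunking with overlap. Real splitters also try
# to break on separators (paragraphs, sentences) rather than mid-character.

def split_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into chunk_size-char pieces, each sharing `overlap` chars
    with the previous piece so context is not cut off at a boundary."""
    if chunk_size <= overlap:
        raise ValueError("chunk_size must exceed overlap")
    chunks, start = [], 0
    step = chunk_size - overlap
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += step
    return chunks

sample = "".join(str(i % 10) for i in range(1200))
print(len(split_text(sample)))  # 3 chunks: 0-500, 450-950, 900-1200
```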
import langchain_community.vectorstores.clickhouse as clickhouse
from langchain.embeddings import HuggingFaceEmbeddings
model_kwargs = {"device": "cuda"}
embeddings = HuggingFaceEmbeddings(model_name="/root/models/moka-ai-m3e-large", model_kwargs=model_kwargs)
settings = clickhouse.ClickhouseSettings(table="jd_docs_m3e_with_url", username="default", password="xxxxxx", host="10.0.1.94")
docsearch = clickhouse.Clickhouse.from_documents(docs, embeddings, config=settings)

After ingestion, a quick verification retrieves relevant documents using a similarity‑score threshold.
import langchain_community.vectorstores.clickhouse as clickhouse
from langchain.embeddings import HuggingFaceEmbeddings
model_kwargs = {"device": "cuda"}
embeddings = HuggingFaceEmbeddings(model_name="/root/models/moka-ai-m3e-large", model_kwargs=model_kwargs)
settings = clickhouse.ClickhouseSettings(table="jd_docs_m3e_with_url_splited", username="default", password="xxxx", host="10.0.1.94")
ck_db = clickhouse.Clickhouse(embeddings, config=settings)
ck_retriever = ck_db.as_retriever(search_type="similarity_score_threshold", search_kwargs={'score_threshold': 0.9})
ck_retriever.get_relevant_documents("如何创建mysql rds")  # "how to create a MySQL RDS instance"

A FastAPI service exposes two endpoints: /retriever for vector search and /answer for generating answers with an LLM.
from fastapi import FastAPI
from pydantic import BaseModel
import uvicorn, json, logging
from langchain_community.embeddings import HuggingFaceEmbeddings
import langchain_community.vectorstores.clickhouse as clickhouse
from langchain.prompts import PromptTemplate
from langchain_community.llms import VLLM
from transformers import AutoTokenizer
app = FastAPI(docs_url=None)
logger = logging.getLogger()
logger.setLevel(logging.INFO)
model_name = "/root/models/Qwen1.5-1.8B-Chat"
tokenizer = AutoTokenizer.from_pretrained(model_name)
llm = VLLM(model=model_name, tokenizer=tokenizer, task="text-generation", temperature=0.2, do_sample=True, repetition_penalty=1.1, return_full_text=False, max_new_tokens=900)
# Prompt (in Chinese): "You are a cloud technology expert. Answer the question
# using the retrieved Context. If you don't know the answer, say you don't know.
# Answer in Chinese."
prompt_template = """
你是一个云技术专家
使用以下检索到的Context回答问题。
如果不知道答案,就说不知道。
用中文回答问题。
Question: {question}
Context: {context}
Answer:
"""
prompt = PromptTemplate(input_variables=["context", "question"], template=prompt_template)
# Build the retriever exactly as in the verification step above
embeddings = HuggingFaceEmbeddings(model_name="/root/models/moka-ai-m3e-large", model_kwargs={"device": "cuda"})
settings = clickhouse.ClickhouseSettings(table="jd_docs_m3e_with_url_splited", username="default", password="xxxx", host="10.0.1.94")
ck_db = clickhouse.Clickhouse(embeddings, config=settings)
ck_retriever = ck_db.as_retriever(search_type="similarity_score_threshold", search_kwargs={"score_threshold": 0.9})

class Question(BaseModel):
    content: str

@app.post("/retriever")
async def retriever(question: Question):
    return ck_retriever.invoke(question.content)

@app.post("/answer")
async def answer(q: Question):
    logger.info("invoke!!!")
    # Retrieve once, then join the page contents into the prompt context
    docs = ck_retriever.invoke(q.content)
    context = "".join(doc.page_content for doc in docs)
    source_list = [doc.metadata["source"] for doc in docs]
    p = prompt.format(context=context, question=q.content)
    answer = llm.invoke(p)
    return {"answer": answer, "sources": source_list}

if __name__ == "__main__":
    uvicorn.run(app="retriever_api:app", host="0.0.0.0", port=8888, reload=True)

Finally, a Gradio interface provides a simple web UI for asking questions.
import gradio as gr
import requests

def answer(question):
    url = "http://127.0.0.1:8888/answer"
    payload = {"content": question}
    res = requests.post(url, json=payload)
    res_json = res.json()
    return [res_json["answer"], res_json["sources"]]

demo = gr.Interface(
    fn=answer,
    inputs=gr.Textbox(label="question", lines=5),
    outputs=[gr.Markdown(label="answer"), gr.JSON(label="urls")],
)
demo.launch(server_name="0.0.0.0")

The complete pipeline demonstrates how to collect domain‑specific documents, embed them, store them in a ClickHouse vector store, retrieve relevant passages, and generate answers with a lightweight LLM, all wrapped in RESTful and interactive front‑ends.
JD Tech Talk
Official JD Tech public account delivering best practices and technology innovation.