Build a Custom AI Q&A System Using Volcano Engine Cloud Search & LangChain

This guide explains how to create a domain‑specific intelligent Q&A system by leveraging prompt‑tuning, Volcano Engine Cloud Search vector store, and LangChain, covering model selection, data embedding, vector indexing, retrieval, and LLM integration with full code examples.


Introduction

With the rise of large language models (LLMs), generative AI has proven valuable for tasks such as image generation, document writing, and information retrieval. Applying LLMs in vertical domains, however, requires incorporating a domain knowledge base at training or inference time.

Two common approaches are fine‑tuning (high cost, slow to update) and prompt‑tuning (flexible and low cost). This article uses prompt‑tuning to build a custom intelligent Q&A system with Volcano Engine Cloud Search and the Ark platform.

Setup

1. Log in to Volcano Engine Cloud Search, create an instance cluster, and select version 7.10.

2. Choose an appropriate model from the Ark platform model marketplace and review its API documentation.

Mapping Preparation

Create an index whose knn_vector dimension matches the embedding model; the default HuggingFaceEmbeddings model (sentence-transformers/all-mpnet-base-v2) produces 768‑dimensional vectors. The metadata field name must match the metadata_field passed to the LangChain vector store later on.

PUT langchain_faq
{
  "mappings": {
    "properties": {
      "message": { "type": "text" },
      "message_embedding": { "type": "knn_vector", "dimension": 768 },
      "metadata": { "type": "text" }
    }
  },
  "settings": {
    "index": {
      "refresh_interval": "10s",
      "number_of_shards": "3",
      "knn": true,
      "knn.space_type": "cosinesimil",
      "number_of_replicas": "1"
    }
  }
}
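
If you prefer to apply the mapping from Python rather than the console or a REST client, here is a minimal sketch using the opensearch-py client (the endpoint and credentials are placeholders for your instance's values):

from opensearchpy import OpenSearch

# Placeholders: replace with your Cloud Search instance endpoint and credentials.
es_client = OpenSearch(
    hosts=["https://URL"],
    http_auth=("user", "password"),
    verify_certs=False,
    ssl_show_warn=False,
)

# Same mapping and settings as the PUT request above.
mapping = {
    "mappings": {
        "properties": {
            "message": {"type": "text"},
            "message_embedding": {"type": "knn_vector", "dimension": 768},
            "message_metadata": {"type": "text"},
        }
    },
    "settings": {
        "index": {
            "refresh_interval": "10s",
            "number_of_shards": "3",
            "knn": True,
            "knn.space_type": "cosinesimil",
            "number_of_replicas": "1",
        }
    },
}

es_client.indices.create(index="langchain_faq", body=mapping)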

Client Preparation

Install the SDK and LangChain, plus the packages the examples below rely on (the embedding model needs sentence-transformers, the vector store needs opensearch-py, and the web loader needs beautifulsoup4):

pip install volcengine --user
pip install langchain --user
pip install sentence-transformers opensearch-py beautifulsoup4 --user

Initialize components:

import langchain

# Embedding model (defaults to sentence-transformers/all-mpnet-base-v2, 768 dims)
from langchain.embeddings import HuggingFaceEmbeddings
embeddings = HuggingFaceEmbeddings()
# Vector store backed by the Cloud Search (OpenSearch-compatible) instance
from langchain.vectorstores import OpenSearchVectorSearch
# Base class for wrapping the MaaS model as a LangChain LLM
from langchain.llms.base import LLM
# Document loader for web pages
from langchain.document_loaders import WebBaseLoader
# LLM cache; register it globally so repeated prompts are answered from memory
from langchain.cache import InMemoryCache
langchain.llm_cache = InMemoryCache()

MaaS Preparation

Wrap the Ark (MaaS) chat endpoint as a custom LangChain LLM. The signing client comes from the volcengine SDK; export VOLC_ACCESSKEY and VOLC_SECRETKEY before running.

import json
import os

from volcengine.ApiInfo import ApiInfo
from volcengine.Credentials import Credentials
from volcengine.ServiceInfo import ServiceInfo
from volcengine.base.Service import Service

maas_host = "maas-api.ml-platform-cn-beijing.volces.com"
api_chat = "chat"
API_INFOS = {api_chat: ApiInfo("POST", "/api/v1/" + api_chat, {}, {}, {})}

class MaaSClient(Service):
    def __init__(self, ak, sk):
        credentials = Credentials(ak=ak, sk=sk, service="ml_maas", region="cn-beijing")
        self.service_info = ServiceInfo(maas_host, {"Accept": "application/json"}, credentials, 60, 60, "https")
        self.api_info = API_INFOS
        super().__init__(self.service_info, self.api_info)

client = MaaSClient(os.getenv("VOLC_ACCESSKEY"), os.getenv("VOLC_SECRETKEY"))

class ChatGLM(LLM):
    @property
    def _llm_type(self) -> str:
        return "chatglm"

    def _construct_query(self, prompt: str) -> str:
        return "human_input is: " + prompt

    @classmethod
    def _post(cls, query: str) -> str:
        # One chat turn against chatglm-130b with basic sampling parameters.
        request = {
            "model": {"name": "chatglm-130b"},
            "parameters": {"max_tokens": 2000, "temperature": 0.8},
            "messages": [{"role": "user", "content": query}],
        }
        return client.json(api=api_chat, params={}, body=json.dumps(request))

    def _call(self, prompt: str, stop: list = None) -> str:
        query = self._construct_query(prompt)
        resp = self._post(query=query)
        # client.json returns the response body as a JSON string; pull out
        # the assistant reply (field layout per the MaaS chat API).
        return json.loads(resp)["choice"]["message"]["content"]
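
Before wiring the wrapper into a chain, a quick smoke test (the prompt is just an example):

llm = ChatGLM()
# LangChain LLM instances are callable; this sends a single chat turn to MaaS.
print(llm("Briefly introduce Volcano Engine Cloud Search."))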

Data Ingestion

Load a web dataset with LangChain, generate 768‑dimensional embeddings, and write them to the ESCloud vector index.

from langchain.document_loaders import WebBaseLoader
loader = WebBaseLoader("https://lilianweng.github.io/posts/2023-06-23-agent/")
data = loader.load()

from langchain.text_splitter import RecursiveCharacterTextSplitter
text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=0)
all_splits = text_splitter.split_documents(data)

# Reuse the embedding model initialized earlier; embed each chunk and
# bulk-write the vectors into the langchain_faq index created above.
vectorstore = OpenSearchVectorSearch.from_documents(
    documents=all_splits,
    embedding=embeddings,
    opensearch_url="URL",  # your Cloud Search instance endpoint
    http_auth=("user", "password"),  # instance credentials
    verify_certs=False,
    ssl_assert_hostname=False,
    index_name="langchain_faq",
    vector_field="message_embedding",
    text_field="message",
    metadata_field="message_metadata",
    ssl_show_warn=False
)
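
To verify the write, a small sketch, assuming the vector store exposes its underlying opensearch-py client as vectorstore.client:

# Refresh the index so freshly written documents are searchable, then count them.
vectorstore.client.indices.refresh(index="langchain_faq")
print(vectorstore.client.count(index="langchain_faq"))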

Query and Retriever

Run a semantic query against the index, then wrap the store as a retriever for the QA chain; because the index uses custom field names, they must be passed through explicitly:

query = "What are the approaches to Task Decomposition?"
docs = vectorstore.similarity_search(
    query,
    vector_field="message_embedding",
    text_field="message",
    metadata_field="message_metadata",
)
retriever = vectorstore.as_retriever(
    search_kwargs={
        "vector_field": "message_embedding",
        "text_field": "message",
        "metadata_field": "message_metadata",
    }
)
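
To inspect what the search returned (a quick sketch using standard LangChain Document fields):

# Preview the first 200 characters of each retrieved chunk.
for doc in docs:
    print(doc.page_content[:200], "...")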

LLM Chat

from langchain.chains import RetrievalQA

llm = ChatGLM()
qa_chain = RetrievalQA.from_chain_type(llm, retriever=retriever)
result = qa_chain({"query": query})
# The chain returns a dict; the generated answer is under "result".
print(result["result"])

The prompt shown during debugging combines the retrieved context with the user query and is sent to the LLM to generate the final answer.
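
If you want to control that prompt rather than rely on LangChain's default, here is a minimal sketch that passes a custom template into the chain (the template wording is illustrative):

from langchain.prompts import PromptTemplate

# {context} and {question} are filled in by the RetrievalQA "stuff" chain.
template = """Use the following context to answer the question at the end.
If you don't know the answer, say so instead of guessing.

{context}

Question: {question}
Answer:"""

qa_chain = RetrievalQA.from_chain_type(
    llm,
    retriever=retriever,
    chain_type_kwargs={"prompt": PromptTemplate.from_template(template)},
)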

Conclusion

This walkthrough demonstrates how to build a dedicated intelligent Q&A system with Volcano Engine Cloud Search and the Ark platform, combining embeddings, vector search, and LangChain to bring LLMs to domain‑specific knowledge retrieval.

Tags: LLM, LangChain, Embedding, prompt tuning, Vector Store, AI Q&A
Written by

Volcano Engine Developer Services

The Volcano Engine Developer Community, Volcano Engine's TOD community, connects the platform with developers, offering cutting-edge tech content and diverse events, nurturing a vibrant developer culture, and co-building an open-source ecosystem.
