Build a Custom AI Q&A System Using Volcano Engine Cloud Search & LangChain
This guide explains how to create a domain‑specific intelligent Q&A system by leveraging prompt‑tuning, Volcano Engine Cloud Search vector store, and LangChain, covering model selection, data embedding, vector indexing, retrieval, and LLM integration with full code examples.
Introduction
With the rise of large language models (LLMs), generative AI has proven valuable for tasks such as image generation, document writing, and information retrieval. To apply LLMs in vertical domains, however, domain knowledge bases must be incorporated at training or inference time.
Two common approaches are fine‑tuning (high cost, slow to refresh) and prompt‑tuning (flexible and low cost). This article uses prompt‑tuning to build a custom intelligent Q&A system with Volcano Engine Cloud Search and the Ark platform.
Setup
1. Log in to Volcano Engine Cloud Search, create an instance cluster, and select version 7.10.
2. Choose an appropriate model from the Ark platform model marketplace and review its API documentation.
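Before creating any index, it is worth confirming that the new cluster is reachable. A minimal connectivity check, assuming the standard OpenSearch REST port; the endpoint and credentials below are placeholders:

```python
def cluster_health_url(endpoint: str, port: int = 9200) -> str:
    """Build the cluster-health URL for a Cloud Search instance."""
    return "https://{}:{}/_cluster/health".format(endpoint, port)

def check_cluster(endpoint: str, user: str, password: str) -> dict:
    """Ping the cluster and return its health status as a dict."""
    import requests  # third-party; pip install requests
    resp = requests.get(cluster_health_url(endpoint),
                        auth=(user, password), verify=False)
    resp.raise_for_status()
    return resp.json()

# Example (replace with your own instance endpoint and credentials):
# check_cluster("your-instance.volces.com", "admin", "password")
```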
Mapping Preparation
PUT langchain_faq
{
  "mappings": {
    "properties": {
      "message": { "type": "text" },
      "message_embedding": { "type": "knn_vector", "dimension": 768 },
      "message_metadata": { "type": "text" }
    }
  },
  "settings": {
    "index": {
      "refresh_interval": "10s",
      "number_of_shards": "3",
      "knn": true,
      "knn.space_type": "cosinesimil",
      "number_of_replicas": "1"
    }
  }
}

Client Preparation
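Alternatively, the index can be created from Python rather than the REST console. A sketch using the opensearch-py client; the host and credentials are placeholders, and the metadata field is named message_metadata to match the field names used in the ingestion code later:

```python
# Index body for langchain_faq: a 768-dim knn_vector field with cosine
# similarity ("cosinesimil") for vector retrieval, plus text fields.
LANGCHAIN_FAQ_BODY = {
    "mappings": {
        "properties": {
            "message": {"type": "text"},
            "message_embedding": {"type": "knn_vector", "dimension": 768},
            "message_metadata": {"type": "text"},
        }
    },
    "settings": {
        "index": {
            "refresh_interval": "10s",
            "number_of_shards": "3",
            "knn": True,
            "knn.space_type": "cosinesimil",
            "number_of_replicas": "1",
        }
    },
}

def create_faq_index(host: str, user: str, password: str) -> None:
    """Create the langchain_faq index on a Cloud Search instance."""
    from opensearchpy import OpenSearch  # pip install opensearch-py
    client = OpenSearch(hosts=[host], http_auth=(user, password),
                        use_ssl=True, verify_certs=False)
    client.indices.create(index="langchain_faq", body=LANGCHAIN_FAQ_BODY)
```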
Install dependencies:
pip install volcengine --user
pip install langchain --user

Initialize components:
# Embedding
from langchain.embeddings import HuggingFaceEmbeddings
embeddings = HuggingFaceEmbeddings()
# VectorStore
from langchain.vectorstores import OpenSearchVectorSearch
# LLM Base
from langchain.llms.base import LLM
# Document loader
from langchain.document_loaders import WebBaseLoader
# LLM Cache
# LLM cache: assign to langchain.llm_cache so caching actually takes effect
import langchain
from langchain.cache import InMemoryCache
langchain.llm_cache = InMemoryCache()

MaaS Preparation
import json
import os

from volcengine.ApiInfo import ApiInfo
from volcengine.Credentials import Credentials
from volcengine.ServiceInfo import ServiceInfo
from volcengine.base.Service import Service

maas_host = "maas-api.ml-platform-cn-beijing.volces.com"
api_chat = "chat"
API_INFOS = {api_chat: ApiInfo("POST", "/api/v1/" + api_chat, {}, {}, {})}

class MaaSClient(Service):
    def __init__(self, ak, sk):
        credentials = Credentials(ak=ak, sk=sk, service="ml_maas", region="cn-beijing")
        self.service_info = ServiceInfo(maas_host, {"Accept": "application/json"}, credentials, 60, 60, "https")
        self.api_info = API_INFOS
        super().__init__(self.service_info, self.api_info)

client = MaaSClient(os.getenv("VOLC_ACCESSKEY"), os.getenv("VOLC_SECRETKEY"))
class ChatGLM(LLM):
    @property
    def _llm_type(self) -> str:
        return "chatglm"

    def _construct_query(self, prompt: str) -> str:
        return "human_input is: " + prompt

    @classmethod
    def _post(cls, query: str) -> dict:
        request = {
            "model": {"name": "chatglm-130b"},
            "parameters": {"max_tokens": 2000, "temperature": 0.8},
            "messages": [{"role": "user", "content": query}],
        }
        resp = client.json(api=api_chat, params={}, body=json.dumps(request))
        return json.loads(resp) if isinstance(resp, str) else resp

    def _call(self, prompt: str, stop: list = None) -> str:
        query = self._construct_query(prompt)
        resp = self._post(query=query)
        # Extract the generated text; the response shape follows the Ark
        # chat API reference and may need adjusting for other models.
        return resp.get("choice", {}).get("message", {}).get("content", "")

Data Ingestion
Load a web dataset with LangChain, generate 768‑dimensional embeddings, and write them to the Cloud Search vector index.
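Each chunk is stored with the three fields defined in the mapping. As a reference for what the write path produces, a minimal sketch of that document shape (the helper name is mine, not part of LangChain):

```python
from typing import List

def to_index_doc(text: str, vector: List[float], metadata: str) -> dict:
    """Shape one chunk the way the langchain_faq mapping expects it:
    raw text, its embedding, and serialized metadata."""
    # The vector length must match the knn_vector dimension in the mapping.
    assert len(vector) == 768, "dimension mismatch with the index mapping"
    return {
        "message": text,
        "message_embedding": vector,
        "message_metadata": metadata,
    }

doc = to_index_doc("Task decomposition can be done by LLM prompting.",
                   [0.0] * 768, '{"source": "blog"}')
```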
from langchain.document_loaders import WebBaseLoader
loader = WebBaseLoader("https://lilianweng.github.io/posts/2023-06-23-agent/")
data = loader.load()
from langchain.text_splitter import RecursiveCharacterTextSplitter
text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=0)
all_splits = text_splitter.split_documents(data)
embeddings = HuggingFaceEmbeddings()  # default model produces 768-dim vectors
vectorstore = OpenSearchVectorSearch.from_documents(
    documents=all_splits,
    embedding=embeddings,
    opensearch_url="URL",  # your Cloud Search instance endpoint
    http_auth=("user", "password"),
    verify_certs=False,
    ssl_assert_hostname=False,
    index_name="langchain_faq",
    vector_field="message_embedding",
    text_field="message",
    metadata_field="message_metadata",
    ssl_show_warn=False,
)

Query and Retriever
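Under the hood, Cloud Search serves vector retrieval as an OpenSearch k‑NN query against the vector field. A sketch of the approximate request body; the exact DSL that LangChain emits may differ:

```python
from typing import List

def knn_query(query_vector: List[float], k: int = 4) -> dict:
    """Approximate shape of a k-NN search body against message_embedding."""
    return {
        "size": k,
        "query": {
            "knn": {
                "message_embedding": {
                    "vector": query_vector,  # embedding of the user query
                    "k": k,                  # number of nearest neighbors
                }
            }
        },
    }
```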
query = "What are the approaches to Task Decomposition?"
docs = vectorstore.similarity_search(query, vector_field="message_embedding", text_field="message", metadata_field="message_metadata")
retriever = vectorstore.as_retriever(search_kwargs={"vector_field": "message_embedding", "text_field": "message", "metadata_field": "message_metadata"})

LLM Chat
from langchain.chains import RetrievalQA
llm = ChatGLM()
qa_chain = RetrievalQA.from_chain_type(llm, retriever=retriever)
result = qa_chain({"query": query})

The prompt shown during debugging combines the retrieved context with the user query and is sent to the LLM to generate the final answer.
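For reference, the prompt that RetrievalQA assembles looks approximately like this. Paraphrased from LangChain's default "stuff" QA template, not guaranteed verbatim:

```python
# Approximate shape of the combined prompt: retrieved chunks are stuffed
# into {context}, the user query into {question}.
QA_TEMPLATE = (
    "Use the following pieces of context to answer the question at the end. "
    "If you don't know the answer, just say that you don't know, "
    "don't try to make up an answer.\n\n"
    "{context}\n\n"
    "Question: {question}\n"
    "Helpful Answer:"
)

prompt = QA_TEMPLATE.format(
    context="Task decomposition can be done by LLM with simple prompting.",
    question="What are the approaches to Task Decomposition?",
)
```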
Conclusion
This practice demonstrates how to build a dedicated intelligent Q&A system using Volcano Engine Cloud Search and the Ark platform, leveraging embeddings, vector search, and LangChain to integrate LLMs for domain‑specific knowledge retrieval.
Volcano Engine Developer Services
The Volcano Engine Developer Community, Volcano Engine's TOD community, connects the platform with developers, offering cutting-edge tech content and diverse events, nurturing a vibrant developer culture, and co-building an open-source ecosystem.