Design and Implementation of a Retrieval‑Augmented Generation (RAG) Answering Assistant for the Dewu Open Platform
This article describes the design and implementation of a Retrieval‑Augmented Generation (RAG) answering assistant for the Dewu Open Platform. The assistant combines GPT‑4o‑mini, OpenAI embeddings, a Milvus vector store, and LangChain.js to semantically retrieve API documentation, structure user queries, and generate accurate, JSON‑formatted answers, reducing both hallucinations and the load on manual support.
Background
The Dewu Open Platform provides developers with APIs, solution documents, permission packages, and business documentation. The existing search only matches API paths or names, so users struggle to find answers scattered across many pages, which frustrates them and drives heavy demand for manual support.
Introduction to RAG
RAG (Retrieval‑Augmented Generation) enhances LLM accuracy by retrieving relevant external knowledge before generation, reducing hallucinations and enabling practical Q&A applications.
RAG Core Components
External knowledge base
Embedding model
Vector database
Retriever
Generator (LLM)
Prompt engineering
Standard RAG Workflow
Convert the query to an embedding.
Perform semantic search in the document collection.
Pass retrieved documents to the LLM.
Extract the final answer from the generated text.
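The four steps above can be sketched end to end with a toy in‑memory index. Everything here (the character‑frequency "embedding", the two‑document corpus, the template "generation") is a deliberately simplified stand‑in; the article's real pipeline uses OpenAI embeddings, Milvus, and GPT‑4o‑mini:

```typescript
// Toy end-to-end RAG workflow. The "embedding" is a character-frequency
// vector and the "LLM" is a template function -- both stand-ins that only
// illustrate the data flow of the four steps.
function embed(text: string): number[] {
  const v = new Array(26).fill(0);
  for (const ch of text.toLowerCase()) {
    const i = ch.charCodeAt(0) - 97;
    if (i >= 0 && i < 26) v[i] += 1;
  }
  return v;
}

function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb) || 1);
}

const docs = [
  'order query API returns order status',
  'refund API creates a refund request',
];
const index = docs.map((text) => ({ text, vector: embed(text) }));

function answer(question: string, k = 1): string {
  // 1. Convert the query to an embedding.
  const qv = embed(question);
  // 2. Semantic search over the document collection.
  const top = [...index]
    .sort((a, b) => cosine(qv, b.vector) - cosine(qv, a.vector))
    .slice(0, k);
  // 3. Pass the retrieved documents to the "LLM".
  const context = top.map((d) => d.text).join('\n');
  // 4. Extract the final answer from the generated text.
  return `Based on the docs: ${context}`;
}
```

Asking `answer('how do I query an order status?')` retrieves the order document rather than the refund one, because its vector is closer to the query's.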
Implementation Goal
Reduce manual support by building a RAG‑based assistant that answers questions using the platform’s documentation.
Technical Stack
LLM: GPT‑4o‑mini
Embedding model: OpenAI embeddings
Vector store: Milvus
Framework: LangChain.js (Runnable design)
```typescript
import { ChatOpenAI } from '@langchain/openai';
import { StringOutputParser } from '@langchain/core/output_parsers';
import { RunnableSequence, RunnableMap } from '@langchain/core/runnables';
import { $getPrompt } from './$prompt';
import { zSchema, StructuredInputType } from './schema';
import { n } from 'src/utils/llm/gen-runnable-name';
import { getLLMConfig } from 'src/utils/llm/get-llm-config';
import { getStringifiedJsonSchema } from 'src/utils/llm/get-stringified-json-schema';

const b = n('$structured-input');

const $getStructuredInput = () => {
  // Force the model to emit JSON so the output can be parsed reliably.
  const $model = new ChatOpenAI(getLLMConfig().ChatOpenAIConfig).bind({
    response_format: { type: 'json_object' },
  });

  // Expose the zod schema (stringified as JSON Schema) alongside the question.
  const $input = RunnableMap.from<{ question: string }>({
    schema: () => getStringifiedJsonSchema(zSchema),
    question: (input) => input.question,
  });

  const $prompt = $getPrompt();
  const $parser = new StringOutputParser();

  return RunnableSequence.from<{ question: string }, string>([
    $input.bind({ runName: b('map') }),
    $prompt.bind({ runName: b('prompt') }),
    $model,
    $parser.bind({ runName: b('parser') }),
  ]).bind({ runName: b('chain') });
};

export { $getStructuredInput, type StructuredInputType };
```
Accuracy Considerations
Two main points: (1) block questions unrelated to the platform; (2) identify and mitigate situations that cause the model to answer incorrectly, such as vague prompts, fragmented context, or insufficient connectivity between retrieved chunks.
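A simplified sketch of the first point: reject clearly off‑topic questions before retrieval even starts. The keyword heuristic below is only a stand‑in for illustration; in the article's implementation, this classification is delegated to the LLM in the structured‑input step:

```typescript
// Simplified off-topic guard. This keyword check is illustrative only --
// the real classification is done by the LLM -- but it shows the idea of
// short-circuiting before retrieval instead of letting the model guess.
const PLATFORM_TERMS = ['api', 'interface', 'permission', 'dewu', 'open platform', 'error code'];

function isPlatformQuestion(question: string): boolean {
  const q = question.toLowerCase();
  return PLATFORM_TERMS.some((t) => q.includes(t));
}

// Returns a refusal message for off-topic questions, or null to signal
// that the pipeline should continue with retrieval + generation.
function guard(question: string): string | null {
  if (!isPlatformQuestion(question)) {
    return 'Sorry, I can only answer questions about the Dewu Open Platform.';
  }
  return null;
}
```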
User Question Structuring
Use a Runnable to classify and extract precise information from user queries before passing them to the LLM.
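The structured shape that this classification targets might look like the following. The field names and the example API path are assumptions for illustration; the article defines the real schema with zod (`zSchema` in `./schema`) and has GPT‑4o‑mini emit the JSON via `response_format: { type: 'json_object' }`:

```typescript
// Hypothetical structured form of a user question. The real zSchema lives
// in ./schema; these fields are illustrative guesses at what it extracts.
interface StructuredInput {
  intent: 'api_usage' | 'error_code' | 'permission' | 'other';
  apiUrl?: string; // an API path mentioned in the question, if any
  keywords: string[];
}

// Naive extractor standing in for the LLM call: pull out an API-like path
// and split the remaining words into keywords. The production chain asks
// the model to emit this JSON instead of using regexes.
function extractStructuredInput(question: string): StructuredInput {
  const pathMatch = question.match(/\/[\w/.-]+/);
  return {
    intent: pathMatch ? 'api_usage' : 'other',
    apiUrl: pathMatch?.[0],
    keywords: question
      .replace(/\/[\w/.-]+/g, ' ')
      .split(/\s+/)
      .filter((w) => w.length > 2),
  };
}
```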
Documents are chunked per section (request parameters, response parameters, error codes, and so on) and rendered through a Handlebars template that prefixes each chunk with its API metadata:

```typescript
import { Document } from '@langchain/core/documents';
import { flattenDeep } from 'lodash';
// baseTemplate, requestHeader, requestUrl, publicRequestParam, requestParam,
// responseParam, errorCodes and authPackage are local helpers (their imports
// are omitted in the original) that render each documentation section.

// Handlebars template: every chunk is prefixed with its API metadata, so a
// retrieved fragment always stays attributable to a concrete API and page.
const hbsTemplate = `---
Service ID (serviceId): {{ service.id }}
API ID (apiId): {{ apiId }}
API name (apiName): {{ apiName }}
API path (apiUrl): {{ apiUrl }}
Page URL (pageUrl): {{ pageUrl }}
---

# {{ title }}

{{ paragraph }}`;

export const processIntoEmbeddings = (data: CombinedApiDoc) => {
  const template = baseTemplate(data);
  // Render each documentation section into its own chunk.
  const texts = [
    template(requestHeader(data)),
    template(requestUrl(data)),
    template(publicRequestParam(data)),
    template(requestParam(data)),
    template(responseParam(data)),
    template(errorCodes(data)),
    template(authPackage(data)),
  ].filter(Boolean) as string[][];

  // One Document per chunk, carrying the metadata for later filtering.
  return flattenDeep(texts).map((content) => {
    return new Document({
      metadata: {
        serviceId: data.service.id,
        apiId: data.apiId!,
        apiName: data.apiName!,
        apiUrl: data.apiUrl!,
        pageUrl: data.pageUrl!,
      },
      pageContent: content!,
    });
  });
};
```
CO‑STAR Prompt Structure
CO‑STAR (Context, Objective, Style, Tone, Audience, Response) guides the LLM to produce relevant, well‑formatted answers.
# CONTEXT
The Dewu Open Platform hosts API documentation and solution documents. Through it, merchants can access Dewu's APIs and solutions to make better use of Dewu's services.

# OBJECTIVE
Answer the user's question based on their input and the supplied Dewu Open Platform documentation context.

# STYLE
Answer concisely and clearly; keep the information easy to understand.

# TONE
Gentle and warm, yet rigorous; introduce yourself first.

# AUDIENCE
Developers using the Dewu Open Platform.

# RESPONSE
Return structured data that conforms to the provided JSON Schema.
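Assembling such a prompt can be sketched with a small helper. The section contents come from the prompt above; the helper itself is illustrative and is not the article's `$getPrompt` implementation:

```typescript
// Sketch of assembling a CO-STAR prompt from its six sections. Purely
// illustrative; the production code builds prompts via LangChain.js.
type CoStar = {
  context: string;
  objective: string;
  style: string;
  tone: string;
  audience: string;
  response: string;
};

function buildCoStarPrompt(p: CoStar): string {
  return [
    `# CONTEXT\n${p.context}`,
    `# OBJECTIVE\n${p.objective}`,
    `# STYLE\n${p.style}`,
    `# TONE\n${p.tone}`,
    `# AUDIENCE\n${p.audience}`,
    `# RESPONSE\n${p.response}`,
  ].join('\n\n');
}
```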
Similarity Search
Retrieve top‑K (K=5) most similar documents using cosine similarity in Milvus.
```typescript
import { Milvus } from '@langchain/community/vectorstores/milvus';
import { OpenAIEmbeddings } from '@langchain/openai';
import { RunnableSequence } from '@langchain/core/runnables';
import { getLLMConfig } from 'src/utils/llm/get-llm-config';

export const $getContext = async () => {
  const embeddings = new OpenAIEmbeddings(getLLMConfig().OpenAIEmbeddingsConfig);
  // Reuse the existing Milvus collection built by the ingestion step.
  const vectorStore = await Milvus.fromExistingCollection(embeddings, {
    collectionName: 'open_rag',
  });
  return RunnableSequence.from([
    (input: { question: string }) => input.question,
    // Top-K semantic search, K = 5.
    vectorStore.asRetriever(5),
  ]);
};
```
Answer Generation
Combine user question, structured input, retrieved context, and a prompt to generate a JSON‑formatted answer.
```typescript
import { ChatOpenAI } from '@langchain/openai';
import { JsonOutputParser } from '@langchain/core/output_parsers';
import { RunnableSequence, RunnableMap } from '@langchain/core/runnables';
import { $getPrompt } from './prompt/index';
import { zOutputSchema } from './schema';
import { $getContext } from './retriever/index';
import { getLLMConfig } from 'src/utils/llm/get-llm-config';
import { getStringifiedJsonSchema } from 'src/utils/llm/get-stringified-json-schema';
import { n } from 'src/utils/llm/gen-runnable-name';

const b = n('$open-rag');

type OpenRagInput = { structuredInput: string; question: string };

export const $getOpenRag = async () => {
  const $model = new ChatOpenAI(getLLMConfig().ChatOpenAIConfig).bind({
    response_format: { type: 'json_object' },
  });

  const chain = RunnableSequence.from([
    // Fan the input out into everything the prompt needs: retrieved context,
    // the structured question, the raw question, and the output schema.
    RunnableMap.from<OpenRagInput>({
      context: await $getContext(),
      structuredInput: (i) => i.structuredInput,
      question: (i) => i.question,
      structuredOutputSchema: () => getStringifiedJsonSchema(zOutputSchema),
    }).bind({ runName: b('runnable-map') }),
    $getPrompt().bind({ runName: b('prompt') }),
    $model,
    new JsonOutputParser(),
  ]).bind({ runName: b('chain') });

  return chain;
};
```
Future Outlook
RAG reduces hallucinations and provides up‑to‑date content without retraining. Deploying a RAG assistant on the Dewu Open Platform demonstrates the feasibility of knowledge‑base‑driven Q&A and offers insights for internal systems to lower support costs.
DeWu Technology
A platform for sharing and discussing technical knowledge.