
Design and Implementation of a Retrieval‑Augmented Generation (RAG) Answering Assistant for the Dewu Open Platform

This article describes building a Retrieval‑Augmented Generation assistant for the Dewu Open Platform that leverages GPT‑4o‑mini, OpenAI embeddings, the Milvus vector store, and LangChain.js to semantically retrieve API documentation, structure user queries, and generate accurate, JSON‑formatted answers, thereby reducing manual support load and hallucinations.

DeWu Technology

Background

The Dewu Open Platform provides developers with APIs, solution documents, permission packages, and business documentation. The existing search only matches API paths or names, so users struggle to find answers scattered across many pages, which causes frustration and a heavy manual-support load.

Introduction to RAG

RAG (Retrieval‑Augmented Generation) enhances LLM accuracy by retrieving relevant external knowledge before generation, reducing hallucinations and enabling practical Q&A applications.

RAG Core Components

External knowledge base

Embedding model

Vector database

Retriever

Generator (LLM)

Prompt engineering

Standard RAG Workflow

Convert the query to an embedding.

Perform semantic search in the document collection.

Pass retrieved documents to the LLM.

Extract the final answer from the generated text.
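The four steps above can be sketched as a generic pipeline. The embedding, search, and generation functions below are injected placeholders standing in for the real OpenAI and Milvus calls, not the platform's actual implementations:

```typescript
// A minimal, dependency-free sketch of the standard RAG workflow.
// The three stage functions are injected so the pipeline itself stays pure.
type Embedding = number[];

interface RagStages {
  embed: (text: string) => Embedding;                     // step 1: query -> vector
  search: (vector: Embedding, topK: number) => string[];  // step 2: semantic search
  generate: (question: string, context: string[]) => string; // steps 3-4: answer
}

function runRag(question: string, stages: RagStages, topK = 5): string {
  const queryVector = stages.embed(question);
  const documents = stages.search(queryVector, topK);
  return stages.generate(question, documents);
}
```

In the real system, `embed` is the OpenAI embedding call, `search` is a Milvus similarity query, and `generate` is a prompted GPT‑4o‑mini invocation.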

Implementation Goal

Reduce manual support by building a RAG‑based assistant that answers questions using the platform’s documentation.

Technical Stack

LLM: GPT‑4o‑mini

Embedding model: OpenAI embeddings

Vector store: Milvus

Framework: LangChain.js (Runnable design)

```typescript
import { ChatOpenAI } from '@langchain/openai';
import { StringOutputParser } from '@langchain/core/output_parsers';
import { RunnableSequence, RunnableMap } from '@langchain/core/runnables';
import { $getPrompt } from './$prompt';
import { zSchema, StructuredInputType } from './schema';
import { n } from 'src/utils/llm/gen-runnable-name';
import { getLLMConfig } from 'src/utils/llm/get-llm-config';
import { getStringifiedJsonSchema } from 'src/utils/llm/get-stringified-json-schema';

const b = n('$structured-input');

const $getStructuredInput = () => {
  // Force the model to emit a JSON object so the output can be parsed reliably.
  const $model = new ChatOpenAI(getLLMConfig().ChatOpenAIConfig).bind({
    response_format: { type: 'json_object' },
  });

  // Map the raw question into the prompt variables: the stringified
  // JSON schema plus the question itself.
  const $input = RunnableMap.from<{ question: string }>({
    schema: () => getStringifiedJsonSchema(zSchema),
    question: (input) => input.question,
  }).bind({ runName: b('map') });

  const $prompt = $getPrompt();
  const $parser = new StringOutputParser();

  return RunnableSequence.from<{ question: string }, string>([
    $input,
    $prompt.bind({ runName: b('prompt') }),
    $model,
    $parser.bind({ runName: b('parser') }),
  ]).bind({ runName: b('chain') });
};

export { $getStructuredInput, type StructuredInputType };
```

Accuracy Considerations

Accuracy hinges on two things: (1) blocking questions unrelated to the platform; and (2) identifying and mitigating the situations that lead the model to answer incorrectly, such as vague prompts, fragmented context, or poorly connected context.
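For point (1), a cheap pre-filter can reject questions with no platform-related terms before they ever reach the LLM. The keyword list below is purely illustrative, not the platform's real classifier:

```typescript
// Illustrative pre-filter: block questions unrelated to the open platform.
// A production system would use an LLM classifier or the structured-input
// chain for this decision rather than a keyword list.
const PLATFORM_TERMS = ['api', 'interface', 'permission', 'solution', 'dewu', 'open platform'];

function isPlatformQuestion(question: string): boolean {
  const normalized = question.toLowerCase();
  return PLATFORM_TERMS.some((term) => normalized.includes(term));
}
```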

User Question Structuring

Use a Runnable to classify and extract precise information from user queries before passing them to the LLM.
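The article does not show the fields behind `zSchema`. Purely as an illustration, a structured query might carry an intent label plus any API identifiers the model could extract, validated before use (the field names here are assumptions):

```typescript
// Hypothetical shape for the structured query; the real zSchema may differ.
interface StructuredInput {
  intent: 'api_usage' | 'error_code' | 'auth' | 'other';
  apiName?: string;
  keywords: string[];
}

// Hand-rolled validation standing in for zod's zSchema.parse().
function parseStructuredInput(raw: string): StructuredInput {
  const value = JSON.parse(raw) as Partial<StructuredInput>;
  const intents = ['api_usage', 'error_code', 'auth', 'other'];
  if (!intents.includes(value.intent as string) || !Array.isArray(value.keywords)) {
    throw new Error('LLM output does not match the expected schema');
  }
  return value as StructuredInput;
}
```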

```typescript
import { Document } from '@langchain/core/documents';
import { flattenDeep } from 'lodash';
// baseTemplate, requestHeader, requestUrl, etc. are project-local helpers
// that render each documentation section against hbsTemplate.

// Every chunk is prefixed with a metadata header so retrieved text stays
// self-describing (IDs, names, and URLs travel with the content).
const hbsTemplate = `---
服务ID (serviceId): {{ service.id }}
接口ID (apiId): {{ apiId }}
接口名称 (apiName): {{ apiName }}
接口地址 (apiUrl): {{ apiUrl }}
页面地址 (pageUrl): {{ pageUrl }}
---

# {{ title }}

{{ paragraph }}`;

export const processIntoEmbeddings = (data: CombinedApiDoc) => {
  const template = baseTemplate(data);

  // Render each documentation section into one or more text chunks.
  const texts = [
    template(requestHeader(data)),
    template(requestUrl(data)),
    template(publicRequestParam(data)),
    template(requestParam(data)),
    template(responseParam(data)),
    template(errorCodes(data)),
    template(authPackage(data)),
  ].filter(Boolean) as string[][];

  // Flatten the nested chunks and wrap each one in a LangChain Document,
  // carrying the API metadata alongside the page content.
  return flattenDeep(texts).map((content) => {
    return new Document({
      metadata: {
        serviceId: data.service.id,
        apiId: data.apiId!,
        apiName: data.apiName!,
        apiUrl: data.apiUrl!,
        pageUrl: data.pageUrl!,
      },
      pageContent: content!,
    });
  });
};
```
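The key idea above is that each chunk carries a metadata header so retrieved text identifies its own API. A dependency-free stand-in for that Handlebars rendering might look like:

```typescript
// Minimal stand-in for the hbsTemplate rendering: prefix each chunk with
// its API metadata so the retrieved text is self-describing.
interface ChunkMeta {
  serviceId: string;
  apiId: string;
  apiName: string;
  apiUrl: string;
  pageUrl: string;
}

function renderChunk(meta: ChunkMeta, title: string, paragraph: string): string {
  return [
    '---',
    `serviceId: ${meta.serviceId}`,
    `apiId: ${meta.apiId}`,
    `apiName: ${meta.apiName}`,
    `apiUrl: ${meta.apiUrl}`,
    `pageUrl: ${meta.pageUrl}`,
    '---',
    `# ${title}`,
    paragraph,
  ].join('\n');
}
```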

CO‑STAR Prompt Structure

CO‑STAR (Context, Objective, Style, Tone, Audience, Response) guides the LLM to produce relevant, well‑formatted answers.

# CONTEXT
The Dewu Open Platform hosts API documentation and solution documents. Merchants use it to find Dewu's APIs and solutions and to make better use of Dewu's services.

# OBJECTIVE
Answer the user's question based on their input and the provided Dewu Open Platform documentation context.

# STYLE
Answer concisely and clearly, keeping the information easy to understand.

# TONE
Gentle and friendly but rigorous; introduce yourself first.

# AUDIENCE
Developers on the Dewu Open Platform.

# RESPONSE
Return structured data conforming to the provided JSON Schema.
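The same prompt can be assembled programmatically. This helper simply concatenates the six CO‑STAR sections; it is a sketch, not the project's actual prompt code:

```typescript
// Assemble a CO-STAR prompt from its six named sections.
interface CoStarPrompt {
  context: string;
  objective: string;
  style: string;
  tone: string;
  audience: string;
  response: string;
}

function buildCoStarPrompt(p: CoStarPrompt): string {
  return [
    `# CONTEXT\n${p.context}`,
    `# OBJECTIVE\n${p.objective}`,
    `# STYLE\n${p.style}`,
    `# TONE\n${p.tone}`,
    `# AUDIENCE\n${p.audience}`,
    `# RESPONSE\n${p.response}`,
  ].join('\n\n');
}
```

Keeping the sections as separate fields makes it easy to vary one dimension (say, tone) without touching the rest of the prompt.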

Similarity Search

Retrieve top‑K (K=5) most similar documents using cosine similarity in Milvus.

```typescript
import { Milvus } from '@langchain/community/vectorstores/milvus';
import { OpenAIEmbeddings } from '@langchain/openai';
import { RunnableSequence } from '@langchain/core/runnables';
import { getLLMConfig } from 'src/utils/llm/get-llm-config';

export const $getContext = async () => {
  const embeddings = new OpenAIEmbeddings(getLLMConfig().OpenAIEmbeddingsConfig);

  // Connect to the existing Milvus collection holding the document embeddings.
  const vectorStore = await Milvus.fromExistingCollection(embeddings, {
    collectionName: 'open_rag',
  });

  // Extract the question, then retrieve the 5 most similar documents.
  return RunnableSequence.from([
    (input) => input.question,
    vectorStore.asRetriever(5),
  ]);
};
```
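Conceptually, the retriever ranks stored vectors by cosine similarity against the query vector and keeps the top K. Milvus does this server-side over an index, far more efficiently, but the ranking itself can be sketched in plain TypeScript:

```typescript
// What "top-K by cosine similarity" means, in plain TypeScript.
// This is conceptual only; Milvus performs the ranking server-side.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

function topK<T>(query: number[], docs: { vector: number[]; payload: T }[], k: number): T[] {
  return docs
    .map((d) => ({ payload: d.payload, score: cosineSimilarity(query, d.vector) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k)
    .map((d) => d.payload);
}
```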

Answer Generation

Combine user question, structured input, retrieved context, and a prompt to generate a JSON‑formatted answer.

```typescript
import { ChatOpenAI } from '@langchain/openai';
import { JsonOutputParser } from '@langchain/core/output_parsers';
import { RunnableSequence, RunnableMap } from '@langchain/core/runnables';
import { $getPrompt } from './prompt/index';
import { zOutputSchema } from './schema';
import { $getContext } from './retriever/index';
import { getLLMConfig } from 'src/utils/llm/get-llm-config';
import { getStringifiedJsonSchema } from 'src/utils/llm/get-stringified-json-schema';
import { n } from 'src/utils/llm/gen-runnable-name';

const b = n('$open-rag');

type OpenRagInput = { structuredInput: string; question: string };

export const $getOpenRag = async () => {
  // The model must return a JSON object matching zOutputSchema.
  const $model = new ChatOpenAI(getLLMConfig().ChatOpenAIConfig).bind({
    response_format: { type: 'json_object' },
  });

  const chain = RunnableSequence.from([
    // Assemble the prompt variables: retrieved context, the structured form
    // of the question, the raw question, and the expected output schema.
    RunnableMap.from<OpenRagInput>({
      context: await $getContext(),
      structuredInput: (i) => i.structuredInput,
      question: (i) => i.question,
      structuredOutputSchema: () => getStringifiedJsonSchema(zOutputSchema),
    }).bind({ runName: b('runnable-map') }),
    $getPrompt().bind({ runName: b('prompt') }),
    $model,
    new JsonOutputParser(),
  ]).bind({ runName: b('chain') });

  return chain;
};
```
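Because the model is bound with `response_format: { type: 'json_object' }`, the final parsing step expects valid JSON. A defensive variant of that step could check the required keys before trusting the output; the `answer` and `references` field names below are assumptions, not the real `zOutputSchema`:

```typescript
// Defensive JSON parsing for the final LLM output. The `answer` and
// `references` keys are hypothetical; the real zOutputSchema may differ.
interface RagAnswer {
  answer: string;
  references: string[];
}

function parseRagAnswer(raw: string): RagAnswer {
  let value: unknown;
  try {
    value = JSON.parse(raw);
  } catch {
    throw new Error('model did not return valid JSON');
  }
  const candidate = value as Partial<RagAnswer>;
  if (typeof candidate.answer !== 'string' || !Array.isArray(candidate.references)) {
    throw new Error('JSON is missing required fields');
  }
  return candidate as RagAnswer;
}
```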

Future Outlook

RAG reduces hallucinations and provides up‑to‑date content without retraining. Deploying a RAG assistant on the Dewu Open Platform demonstrates the feasibility of knowledge‑base‑driven Q&A and offers insights for internal systems to lower support costs.

Tags: AI, LLM, prompt engineering, LangChain, RAG, vector database
Written by DeWu Technology

A platform for sharing and discussing tech knowledge, guiding you toward the cloud of technology.