
Design and Implementation of a Retrieval‑Augmented Generation (RAG) Answering Assistant for the Dewu Open Platform

This article describes building a Retrieval‑Augmented Generation assistant for the Dewu Open Platform that leverages GPT‑4o‑mini, OpenAI embeddings, the Milvus vector store, and LangChain.js to semantically retrieve API documentation, structure user queries, and generate accurate, JSON‑formatted answers, thereby reducing manual support load and hallucinations.

DeWu Technology

Background

The Dewu Open Platform provides developers with APIs, solution documents, permission packages, and business documentation. The existing search only matches API paths or names, so users struggle to find answers scattered across many pages, which causes frustration and a heavy manual-support load.

Introduction to RAG

RAG (Retrieval‑Augmented Generation) enhances LLM accuracy by retrieving relevant external knowledge before generation, reducing hallucinations and enabling practical Q&A applications.

RAG Core Components

External knowledge base

Embedding model

Vector database

Retriever

Generator (LLM)

Prompt engineering

Standard RAG Workflow

Convert the query to an embedding.

Perform semantic search in the document collection.

Pass retrieved documents to the LLM.

Extract the final answer from the generated text.
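The four steps above can be sketched as a generic pipeline. The embedding, search, and generation functions below are injected placeholders standing in for the real OpenAI and Milvus calls, not the platform's actual implementations:

```typescript
// A minimal, dependency-free sketch of the standard RAG workflow.
// The three stage functions are injected so the pipeline itself stays pure.
type Embedding = number[];

interface RagStages {
  embed: (text: string) => Embedding;                     // step 1: query -> vector
  search: (vector: Embedding, topK: number) => string[];  // step 2: semantic search
  generate: (question: string, context: string[]) => string; // steps 3-4: answer
}

function runRag(question: string, stages: RagStages, topK = 5): string {
  const queryVector = stages.embed(question);
  const documents = stages.search(queryVector, topK);
  return stages.generate(question, documents);
}
```

In the real system, `embed` is the OpenAI embedding call, `search` is a Milvus similarity query, and `generate` is a prompted GPT‑4o‑mini invocation.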

Implementation Goal

Reduce manual support by building a RAG‑based assistant that answers questions using the platform’s documentation.

Technical Stack

LLM: GPT‑4o‑mini

Embedding model: OpenAI embeddings

Vector store: Milvus

Framework: LangChain.js (Runnable design)

```typescript
import { ChatOpenAI } from '@langchain/openai';
import { StringOutputParser } from '@langchain/core/output_parsers';
import { RunnableSequence, RunnableMap } from '@langchain/core/runnables';
import { $getPrompt } from './$prompt';
import { zSchema, StructuredInputType } from './schema';
import { n } from 'src/utils/llm/gen-runnable-name';
import { getLLMConfig } from 'src/utils/llm/get-llm-config';
import { getStringifiedJsonSchema } from 'src/utils/llm/get-stringified-json-schema';

const b = n('$structured-input');

const $getStructuredInput = () => {
  // Force the model to emit a JSON object so the output can be parsed reliably.
  const $model = new ChatOpenAI(getLLMConfig().ChatOpenAIConfig).bind({
    response_format: { type: 'json_object' },
  });

  // Map the raw question into the prompt variables: the stringified
  // JSON schema plus the question itself.
  const $input = RunnableMap.from<{ question: string }>({
    schema: () => getStringifiedJsonSchema(zSchema),
    question: (input) => input.question,
  }).bind({ runName: b('map') });

  const $prompt = $getPrompt();
  const $parser = new StringOutputParser();

  return RunnableSequence.from<{ question: string }, string>([
    $input,
    $prompt.bind({ runName: b('prompt') }),
    $model,
    $parser.bind({ runName: b('parser') }),
  ]).bind({ runName: b('chain') });
};

export { $getStructuredInput, type StructuredInputType };
```

Accuracy Considerations

Accuracy hinges on two things: (1) blocking questions unrelated to the platform; and (2) identifying and mitigating the situations that lead the model to answer incorrectly, such as vague prompts, fragmented context, or poorly connected context.
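For point (1), a cheap pre-filter can reject questions with no platform-related terms before they ever reach the LLM. The keyword list below is purely illustrative, not the platform's real classifier:

```typescript
// Illustrative pre-filter: block questions unrelated to the open platform.
// A production system would use an LLM classifier or the structured-input
// chain for this decision rather than a keyword list.
const PLATFORM_TERMS = ['api', 'interface', 'permission', 'solution', 'dewu', 'open platform'];

function isPlatformQuestion(question: string): boolean {
  const normalized = question.toLowerCase();
  return PLATFORM_TERMS.some((term) => normalized.includes(term));
}
```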

User Question Structuring

Use a Runnable to classify and extract precise information from user queries before passing them to the LLM.
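The article does not show the fields behind `zSchema`. Purely as an illustration, a structured query might carry an intent label plus any API identifiers the model could extract, validated before use (the field names here are assumptions):

```typescript
// Hypothetical shape for the structured query; the real zSchema may differ.
interface StructuredInput {
  intent: 'api_usage' | 'error_code' | 'auth' | 'other';
  apiName?: string;
  keywords: string[];
}

// Hand-rolled validation standing in for zod's zSchema.parse().
function parseStructuredInput(raw: string): StructuredInput {
  const value = JSON.parse(raw) as Partial<StructuredInput>;
  const intents = ['api_usage', 'error_code', 'auth', 'other'];
  if (!intents.includes(value.intent as string) || !Array.isArray(value.keywords)) {
    throw new Error('LLM output does not match the expected schema');
  }
  return value as StructuredInput;
}
```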

```typescript
import { Document } from '@langchain/core/documents';
import { flattenDeep } from 'lodash';
// baseTemplate, requestHeader, requestUrl, etc. are project-local helpers
// that render each documentation section against hbsTemplate.

// Every chunk is prefixed with a metadata header so retrieved text stays
// self-describing (IDs, names, and URLs travel with the content).
const hbsTemplate = `---
服务ID (serviceId): {{ service.id }}
接口ID (apiId): {{ apiId }}
接口名称 (apiName): {{ apiName }}
接口地址 (apiUrl): {{ apiUrl }}
页面地址 (pageUrl): {{ pageUrl }}
---

# {{ title }}

{{ paragraph }}`;

export const processIntoEmbeddings = (data: CombinedApiDoc) => {
  const template = baseTemplate(data);

  // Render each documentation section into one or more text chunks.
  const texts = [
    template(requestHeader(data)),
    template(requestUrl(data)),
    template(publicRequestParam(data)),
    template(requestParam(data)),
    template(responseParam(data)),
    template(errorCodes(data)),
    template(authPackage(data)),
  ].filter(Boolean) as string[][];

  // Flatten the nested chunks and wrap each one in a LangChain Document,
  // carrying the API metadata alongside the page content.
  return flattenDeep(texts).map((content) => {
    return new Document({
      metadata: {
        serviceId: data.service.id,
        apiId: data.apiId!,
        apiName: data.apiName!,
        apiUrl: data.apiUrl!,
        pageUrl: data.pageUrl!,
      },
      pageContent: content!,
    });
  });
};
```
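The key idea above is that each chunk carries a metadata header so retrieved text identifies its own API. A dependency-free stand-in for that Handlebars rendering might look like:

```typescript
// Minimal stand-in for the hbsTemplate rendering: prefix each chunk with
// its API metadata so the retrieved text is self-describing.
interface ChunkMeta {
  serviceId: string;
  apiId: string;
  apiName: string;
  apiUrl: string;
  pageUrl: string;
}

function renderChunk(meta: ChunkMeta, title: string, paragraph: string): string {
  return [
    '---',
    `serviceId: ${meta.serviceId}`,
    `apiId: ${meta.apiId}`,
    `apiName: ${meta.apiName}`,
    `apiUrl: ${meta.apiUrl}`,
    `pageUrl: ${meta.pageUrl}`,
    '---',
    `# ${title}`,
    paragraph,
  ].join('\n');
}
```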

CO‑STAR Prompt Structure

CO‑STAR (Context, Objective, Style, Tone, Audience, Response) guides the LLM to produce relevant, well‑formatted answers.

# CONTEXT
The Dewu Open Platform hosts API documentation and solution documents. Merchants use it to find Dewu's APIs and solutions and to make better use of Dewu's services.

# OBJECTIVE
Answer the user's question based on their input and the provided Dewu Open Platform documentation context.

# STYLE
Answer concisely and clearly, keeping the information easy to understand.

# TONE
Gentle and friendly but rigorous; introduce yourself first.

# AUDIENCE
Developers on the Dewu Open Platform.

# RESPONSE
Return structured data conforming to the provided JSON Schema.
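The same prompt can be assembled programmatically. This helper simply concatenates the six CO‑STAR sections; it is a sketch, not the project's actual prompt code:

```typescript
// Assemble a CO-STAR prompt from its six named sections.
interface CoStarPrompt {
  context: string;
  objective: string;
  style: string;
  tone: string;
  audience: string;
  response: string;
}

function buildCoStarPrompt(p: CoStarPrompt): string {
  return [
    `# CONTEXT\n${p.context}`,
    `# OBJECTIVE\n${p.objective}`,
    `# STYLE\n${p.style}`,
    `# TONE\n${p.tone}`,
    `# AUDIENCE\n${p.audience}`,
    `# RESPONSE\n${p.response}`,
  ].join('\n\n');
}
```

Keeping the sections as separate fields makes it easy to vary one dimension (say, tone) without touching the rest of the prompt.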

Similarity Search

Retrieve top‑K (K=5) most similar documents using cosine similarity in Milvus.

```typescript
import { Milvus } from '@langchain/community/vectorstores/milvus';
import { OpenAIEmbeddings } from '@langchain/openai';
import { RunnableSequence } from '@langchain/core/runnables';
import { getLLMConfig } from 'src/utils/llm/get-llm-config';

export const $getContext = async () => {
  const embeddings = new OpenAIEmbeddings(getLLMConfig().OpenAIEmbeddingsConfig);

  // Connect to the existing Milvus collection holding the document embeddings.
  const vectorStore = await Milvus.fromExistingCollection(embeddings, {
    collectionName: 'open_rag',
  });

  // Extract the question, then retrieve the 5 most similar documents.
  return RunnableSequence.from([
    (input) => input.question,
    vectorStore.asRetriever(5),
  ]);
};
```
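Conceptually, the retriever ranks stored vectors by cosine similarity against the query vector and keeps the top K. Milvus does this server-side over an index, far more efficiently, but the ranking itself can be sketched in plain TypeScript:

```typescript
// What "top-K by cosine similarity" means, in plain TypeScript.
// This is conceptual only; Milvus performs the ranking server-side.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

function topK<T>(query: number[], docs: { vector: number[]; payload: T }[], k: number): T[] {
  return docs
    .map((d) => ({ payload: d.payload, score: cosineSimilarity(query, d.vector) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k)
    .map((d) => d.payload);
}
```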

Answer Generation

Combine user question, structured input, retrieved context, and a prompt to generate a JSON‑formatted answer.

```typescript
import { ChatOpenAI } from '@langchain/openai';
import { JsonOutputParser } from '@langchain/core/output_parsers';
import { RunnableSequence, RunnableMap } from '@langchain/core/runnables';
import { $getPrompt } from './prompt/index';
import { zOutputSchema } from './schema';
import { $getContext } from './retriever/index';
import { getLLMConfig } from 'src/utils/llm/get-llm-config';
import { getStringifiedJsonSchema } from 'src/utils/llm/get-stringified-json-schema';
import { n } from 'src/utils/llm/gen-runnable-name';

const b = n('$open-rag');

type OpenRagInput = { structuredInput: string; question: string };

export const $getOpenRag = async () => {
  // The model must return a JSON object matching zOutputSchema.
  const $model = new ChatOpenAI(getLLMConfig().ChatOpenAIConfig).bind({
    response_format: { type: 'json_object' },
  });

  const chain = RunnableSequence.from([
    // Assemble the prompt variables: retrieved context, the structured form
    // of the question, the raw question, and the expected output schema.
    RunnableMap.from<OpenRagInput>({
      context: await $getContext(),
      structuredInput: (i) => i.structuredInput,
      question: (i) => i.question,
      structuredOutputSchema: () => getStringifiedJsonSchema(zOutputSchema),
    }).bind({ runName: b('runnable-map') }),
    $getPrompt().bind({ runName: b('prompt') }),
    $model,
    new JsonOutputParser(),
  ]).bind({ runName: b('chain') });

  return chain;
};
```
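Because the model is bound with `response_format: { type: 'json_object' }`, the final parsing step expects valid JSON. A defensive variant of that step could check the required keys before trusting the output; the `answer` and `references` field names below are assumptions, not the real `zOutputSchema`:

```typescript
// Defensive JSON parsing for the final LLM output. The `answer` and
// `references` keys are hypothetical; the real zOutputSchema may differ.
interface RagAnswer {
  answer: string;
  references: string[];
}

function parseRagAnswer(raw: string): RagAnswer {
  let value: unknown;
  try {
    value = JSON.parse(raw);
  } catch {
    throw new Error('model did not return valid JSON');
  }
  const candidate = value as Partial<RagAnswer>;
  if (typeof candidate.answer !== 'string' || !Array.isArray(candidate.references)) {
    throw new Error('JSON is missing required fields');
  }
  return candidate as RagAnswer;
}
```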

Future Outlook

RAG reduces hallucinations and provides up‑to‑date content without retraining. Deploying a RAG assistant on the Dewu Open Platform demonstrates the feasibility of knowledge‑base‑driven Q&A and offers insights for internal systems to lower support costs.

Tags: AI, LLM, prompt engineering, LangChain, RAG, vector database
Written by DeWu Technology

A platform for sharing and discussing tech knowledge, guiding you toward the cloud of technology.