Build a Knowledge‑Enhanced LLM Chatbot with Alibaba Cloud PAI: A Step‑by‑Step RAG Guide

This comprehensive guide walks AI developers through building a Retrieval‑Augmented Generation (RAG) chatbot on Alibaba Cloud PAI, covering architecture, vector store setup, model deployment, knowledge ingestion, multi‑modal retrieval, fusion, re‑ranking, prompt design, and end‑to‑end configuration with code examples.

Alibaba Cloud Big Data AI Platform

Background

Large language models (LLMs) often struggle to give accurate, up‑to‑date answers because their knowledge is frozen at training time. Retrieval‑Augmented Generation (RAG) addresses this by combining LLM inference with retrieval from an external vector store, which improves reliability for question answering, summarization, and other NLP tasks.

Solution Architecture

Key Modules

Knowledge ingestion

Query rewriting

Multi‑path retrieval

Recall fusion

Result re‑ranking

Prompt engineering

Step 1: Prepare Vector Store

Select one of Faiss, Hologres, AnalyticDB for PostgreSQL, or Elasticsearch as the vector database. For the three cloud‑hosted options, create the instance, configure its VPC, and record the connection credentials.

Hologres

Activate a Hologres instance, create a database, and record the host, port, database name, username, and password.

AnalyticDB for PostgreSQL

Create an AnalyticDB for PostgreSQL instance, enable the vector engine, and obtain the internal and external connection addresses.

Elasticsearch

Create an Elasticsearch instance (general commercial edition) and note the private address, port, username, and password.

Faiss

Build a local Faiss index without purchasing cloud resources.
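
To make the idea of a local index concrete, here is a minimal pure‑Python sketch of exact (brute‑force) similarity search, which is the simplest kind of search a vector index can offer. It is not Faiss code: the `FlatIndex` class and its methods are hypothetical names chosen for illustration, cosine similarity is used only for readability, and a real setup would use the Faiss library itself.

```python
import math


class FlatIndex:
    """Hypothetical stand-in for an exact (flat) vector index."""

    def __init__(self, dimension):
        self.dimension = dimension
        self.vectors = []

    def add(self, vector):
        # Reject vectors that do not match the index dimension.
        if len(vector) != self.dimension:
            raise ValueError("vector has wrong dimension")
        self.vectors.append(list(vector))

    def search(self, query, k):
        """Return (position, cosine similarity) for the k closest vectors."""
        scored = [(i, _cosine(query, v)) for i, v in enumerate(self.vectors)]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return scored[:k]


def _cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0
```

Every query is compared against every stored vector, so this approach is only practical for small knowledge bases.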

Step 2: Deploy Model Service

Use PAI‑EAS to deploy a model‑serving image (e.g., chatglm2‑6b, Qwen‑7B‑Chat, Llama2‑7B, Llama2‑13B). Set the service name, choose the image, configure port 8000, select a public resource group, and choose the GPU instance type ml.gu7i.c16m60.1‑gu30, which the guide recommends for cost‑performance.
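
Once the service is running, it is called over HTTP with the service token in the `Authorization` header. The sketch below only builds such a request and does not send it. The function name `build_eas_request`, the placeholder URL and token, and the assumption that the service accepts the raw prompt as a UTF‑8 body are all illustrative; check the invocation details shown for your own service.

```python
import urllib.request


def build_eas_request(url, token, prompt):
    """Build (but do not send) a POST request for an EAS-hosted model.

    Assumption: the service accepts the raw prompt as a UTF-8 body and
    authenticates with the service token in the Authorization header.
    """
    return urllib.request.Request(
        url,
        data=prompt.encode("utf-8"),
        headers={"Authorization": token},
        method="POST",
    )


# Placeholder endpoint and token; substitute the values for your service.
request = build_eas_request(
    "http://example.invalid/api/predict/my_llm_service",
    "YOUR_EAS_TOKEN",
    "What is PAI-EAS?",
)
# To send it: urllib.request.urlopen(request, timeout=60).read()
```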

Step 3: Deploy RAG Service

Deploy the chatbot‑langchain image (latest version) on EAS. Configure the service name, image version, port 8000, and a GPU resource group, and place the service in the same VPC as the chosen vector store so that it can reach the database over the private network.

Step 4: Knowledge Upload & Processing

In the WebUI Settings tab, select an embedding model (SGPT‑125M‑weightedmean‑nli‑bitfit is recommended). Upload HTML or TEXT documents; the system performs data cleaning, hyperlink replacement, and semantic chunking, splitting either at a heading level (rank label h2 by default) or by a configurable chunk size and overlap. Optional QA extraction (e.g., RefGPT) can generate question‑answer pairs for higher signal‑to‑noise retrieval.
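
The effect of the chunk size and overlap settings can be illustrated with a small sliding‑window splitter. This is a sketch under simplifying assumptions, not the service's implementation: `chunk_text` is a hypothetical helper, and it counts characters, whereas a real pipeline may count tokens or respect sentence boundaries.

```python
def chunk_text(text, chunk_size, overlap):
    """Split text into windows of chunk_size characters that overlap."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    step = chunk_size - overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
    return chunks
```

A larger overlap repeats more context between neighbouring chunks, which helps when an answer straddles a chunk boundary but increases the number of vectors to store.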

Step 5: Retrieval and Fusion

Choose a retrieval strategy:

Vector Store: direct similarity search in the vector database.

Keyword Retrieval: BM25/TF‑IDF sparse search for domains with scarce data.

Keyword Ensembled: enable both; the system applies Reciprocal Rank Fusion (RRF) to merge results.
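
Reciprocal Rank Fusion scores each document by summing 1 / (k + rank) over every ranked list in which it appears, so documents that rank well in both the vector and the keyword results rise to the top. The sketch below shows the idea; the function name is illustrative, and k = 60 is the constant commonly used for RRF, not a value confirmed for this service.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse ranked lists of document IDs; rank 1 is the best hit."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by fused score, breaking ties by document ID for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


vector_hits = ["d1", "d2", "d3"]
keyword_hits = ["d3", "d1", "d4"]
fused = reciprocal_rank_fusion([vector_hits, keyword_hits])
```

Because only ranks are used, the two retrievers do not need comparable score scales.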

Optionally enable Re‑Rank with cross‑encoder models such as Cohere‑rerank, BAAI/bge‑reranker‑base, or BAAI/bge‑reranker‑large to improve top‑K relevance.
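
Re‑ranking takes the fused candidates, scores each one against the query, and keeps the best few. The sketch below shows that control flow with a pluggable scoring function. The `token_overlap` scorer is a toy stand‑in chosen so the example runs without a model; a real cross‑encoder would replace it and score each query‑document pair jointly.

```python
def rerank(query, documents, score_fn, top_k):
    """Re-order candidates by score_fn(query, document), best first."""
    scored = [(score_fn(query, doc), doc) for doc in documents]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored[:top_k]]


def token_overlap(query, document):
    """Toy relevance score: Jaccard overlap of lowercase word sets."""
    q, d = set(query.lower().split()), set(document.lower().split())
    return len(q & d) / len(q | d) if q | d else 0.0


candidates = [
    "deploy a model on EAS",
    "configure the vector store",
    "EAS model deployment guide",
]
top = rerank("deploy model EAS", candidates, token_overlap, top_k=2)
```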

Step 6: Prompt Construction and Answer Generation

Assemble retrieved documents, a prompt template, and the user query. The default “Simple” prompt can be customized; best practice is to order reference documents → prompt template → user query and explicitly forbid fabricated content.
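
The recommended ordering can be sketched as a small prompt builder. The function name and the instruction wording are hypothetical and are not the service's built‑in “Simple” template; the point is only the order (references, then instructions, then the question) and the explicit ban on fabricated content.

```python
def build_prompt(documents, query):
    """Assemble references, then instructions, then the user question."""
    references = "\n\n".join(
        f"[{i}] {doc}" for i, doc in enumerate(documents, start=1)
    )
    instructions = (
        "Answer the question using only the reference documents above. "
        "If they do not contain the answer, say that you do not know; "
        "do not make anything up."
    )
    return (
        f"Reference documents:\n{references}\n\n"
        f"{instructions}\n\n"
        f"Question: {query}"
    )
```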

Configuration JSON Example

{
  "embedding": {
    "model_dir": "embedding_model/",
    "embedding_model": "SGPT-125M-weightedmean-nli-bitfit",
    "embedding_dimension": 768
  },
  "EASCfg": {
    "url": "http://xx.pai-eas.aliyuncs.com/api/predict/chatllm_demo_glm2",
    "token": "xxxxxxx=="
  },
  "vector_store": "Hologres",
  "HOLOCfg": {
    "PG_HOST": "hgpostcn-cn.xxxxxx.vpc.hologres.aliyuncs.com",
    "PG_PORT": "80",
    "PG_DATABASE": "langchain",
    "PG_USER": "user",
    "PG_PASSWORD": "password"
  }
}
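
A configuration like the one above can be checked before it is handed to the service. The sketch below parses the JSON and verifies the Hologres fields that appear in the sample. The `load_config` helper and the choice of which keys to treat as required are assumptions based on that sample, not a documented schema.

```python
import json

REQUIRED_HOLO_KEYS = ("PG_HOST", "PG_PORT", "PG_DATABASE", "PG_USER", "PG_PASSWORD")


def load_config(text):
    """Parse the JSON config and check the fields used for Hologres."""
    config = json.loads(text)
    if config.get("vector_store") == "Hologres":
        holo = config.get("HOLOCfg", {})
        missing = [key for key in REQUIRED_HOLO_KEYS if key not in holo]
        if missing:
            raise ValueError(f"HOLOCfg is missing: {', '.join(missing)}")
        # The sample stores the port as a string; fail early if it is not numeric.
        int(holo["PG_PORT"])
    return config
```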

The original guide also links to the official best‑practice documentation and the open‑source PAI‑RAG GitHub repository, and provides step‑by‑step screenshots for each operation.
