How to Build a Retrieval‑Augmented Generation (RAG) System with Alibaba Cloud Milvus and PAI
This guide walks you through setting up Alibaba Cloud Milvus with public access, deploying a RAG system via PAI, uploading a knowledge base, interacting with the model through the Web UI, and inspecting vector collections with Attu. Each step includes the relevant configuration details.
Background
Alibaba Cloud Milvus is a fully managed vector retrieval service that is 100% compatible with open‑source Milvus. On top of the open‑source version it adds scalability, ease of use, security, lower cost, and ecosystem integration. It is well suited to large‑scale vector similarity search in scenarios such as multimodal search, RAG, recommendation, and content risk detection.
Prerequisites
A Milvus instance with public network access enabled.
PAI (EAS) activated, with a default workspace created.
Usage Limits
Milvus instances and PAI (EAS) must reside in the same region.
Operation Process
Step 1: Deploy RAG System via PAI
Log in to the PAI console, select the workspace, and navigate to Model Deployment → Model Online Service (EAS).
Click “Deploy Service” and choose the large‑model RAG dialogue system.
Configure key parameters (service name, model source, model type, instance count, GPU resources, Milvus connection details, VPC, subnet, security group). Use default values where appropriate.
Parameter: Description
Service Name: Customizable name.
Model Source: Default open‑source public model.
Model Type: Example uses Qwen1.5‑1.8b.
Instance Count: Default 1.
GPU Resource: Select as needed, e.g., ml.gu7i.c16m30.1‑gu30.
Milvus Version: Choose Milvus.
Milvus Address: Internal address of the Milvus instance.
Proxy Port: Proxy port of the Milvus instance.
Account: Set to root.
Password: Root password set during Milvus creation.
Database Name: Usually "default"; can create others.
Collection Name: New or existing collection matching RAG requirements.
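The Milvus connection fields above correspond closely to the arguments a Milvus client takes. As a rough sketch, assuming the pymilvus package and its MilvusClient class, the settings could be assembled as shown here. The host name, password, and collection name are placeholders, port 19530 is the Milvus default rather than a value given in this guide, and MilvusSettings is a hypothetical helper that is not part of PAI or Milvus.

```python
from dataclasses import dataclass


@dataclass
class MilvusSettings:
    """Hypothetical container mirroring the Milvus fields on the PAI deployment form."""
    address: str                  # internal address of the Milvus instance
    port: int = 19530             # proxy port; 19530 is the Milvus default (assumption)
    account: str = "root"
    password: str = ""
    database: str = "default"
    collection: str = "rag_demo"  # placeholder collection name

    def client_kwargs(self) -> dict:
        """Build keyword arguments in the shape pymilvus.MilvusClient accepts."""
        if not self.address:
            raise ValueError("Milvus address must not be empty")
        if not 1 <= self.port <= 65535:
            raise ValueError(f"invalid port: {self.port}")
        return {
            "uri": f"http://{self.address}:{self.port}",
            "token": f"{self.account}:{self.password}",
            "db_name": self.database,
        }


def connect(settings: MilvusSettings):
    """Open a client connection; needs `pip install pymilvus` and network access."""
    from pymilvus import MilvusClient  # imported lazily so the sketch loads without it
    return MilvusClient(**settings.client_kwargs())


# Placeholder values; replace them with the details of your own instance.
settings = MilvusSettings(address="c-xxxx.milvus.example.internal",
                          password="<root-password>")
print(settings.client_kwargs()["uri"])
```

Calling connect(settings) from a machine in the same VPC as the instance is one way to confirm the address, port, and credentials before entering them in the deployment form.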
Step 2: Upload Knowledge Base via RAG WebUI
Open the WebUI from the Model Online Service page and configure the embedding model (model name and dimension are auto‑filled).
Test the Milvus connection by clicking “Connect Milvus”.
In the Upload tab, set semantic chunk parameters:
Parameter: Description
Chunk Size: Size of each chunk in bytes (default 500).
Chunk Overlap: Overlap between adjacent chunks (default 10).
Process with QA Extraction Model: Enable to automatically extract QA pairs from uploaded documents.
Upload files (e.g., poems.txt) in the Files tab, then click Upload. The system cleans the data, performs semantic chunking, and stores it.
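To make the Chunk Size and Chunk Overlap settings concrete, here is a minimal sketch of fixed‑size chunking with overlap. It is not the semantic chunker the RAG service runs: it simply slides a window over the text, and it counts characters rather than bytes for simplicity. The function name chunk_text is hypothetical.

```python
from typing import List


def chunk_text(text: str, chunk_size: int = 500, chunk_overlap: int = 10) -> List[str]:
    """Split text into windows of chunk_size that share chunk_overlap characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the window already reached the end of the text
    return chunks


# Synthetic 1000-character input, used only for illustration.
sample = "".join(chr(ord("a") + i % 26) for i in range(1000))
pieces = chunk_text(sample)
print([len(p) for p in pieces])
```

With the defaults, each chunk begins with the last 10 characters of the previous one, so a sentence that straddles a boundary still appears intact in at least part of a neighboring chunk.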
Step 3: Interact with the RAG System via WebUI Chat
Select a Prompt strategy in the Chat tab: LLM only, Retrieval only, or combined Retrieval‑plus‑LLM (RAG). Choose the desired LLM and start a conversation to receive model answers.
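The difference between the three strategies can be sketched in a few lines. This is an illustration of the general pattern, not the WebUI's actual implementation: the mode names, the prompt template, and the retrieve and generate callables are all hypothetical stand‑ins for the vector search and the LLM call.

```python
from typing import Callable, List


def answer(query: str,
           mode: str,
           retrieve: Callable[[str], List[str]],
           generate: Callable[[str], str]) -> str:
    """Route a query through one of three strategies: 'llm', 'retrieval', or 'rag'."""
    if mode == "llm":
        # LLM only: the model answers from its own parameters.
        return generate(query)
    if mode == "retrieval":
        # Retrieval only: return the matching knowledge-base chunks as-is.
        return "\n".join(retrieve(query))
    if mode == "rag":
        # RAG: retrieved chunks are placed in the prompt as context for the LLM.
        context = "\n".join(retrieve(query))
        prompt = f"Context:\n{context}\n\nQuestion: {query}"
        return generate(prompt)
    raise ValueError(f"unknown mode: {mode}")


# Stub components so the sketch runs without a model or a Milvus instance.
def fake_retrieve(query: str) -> List[str]:
    return ["Milvus stores vectors."]


def fake_generate(prompt: str) -> str:
    return f"LLM<{prompt}>"


for mode in ("llm", "retrieval", "rag"):
    print(mode, "->", answer("What does Milvus store?", mode, fake_retrieve, fake_generate))
```

Comparing the three outputs for the same question is a quick way to see whether the uploaded knowledge base is actually influencing the model's answer.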
Step 4: View Knowledge Base Chunks with Attu
Use the Attu graphical tool (open via the Milvus console) to inspect the automatically created collection, view stored vectors, and verify chunking.
Related Information
For more details about Milvus, see the official documentation at https://x.sm.cn/FNXU3m7.
Alibaba Cloud Big Data AI Platform
The Alibaba Cloud Big Data AI Platform builds on Alibaba’s leading cloud infrastructure, big‑data and AI engineering capabilities, scenario algorithms, and extensive industry experience to offer enterprises and developers a one‑stop, cloud‑native big‑data and AI capability suite. It boosts AI development efficiency, enables large‑scale AI deployment across industries, and drives business value.
