Deploy a High‑Performance RAG Service with Hologres, DeepSeek, and PAI‑EAS
This guide walks you through building a Retrieval‑Augmented Generation (RAG) system by integrating Alibaba Cloud's Hologres vector store, the Proxima high‑performance vector engine, and DeepSeek large language models via PAI‑EAS, covering prerequisites, deployment steps, configuration, and inference verification.
Background
Hologres is Alibaba Cloud's real‑time data warehouse, supporting large‑scale OLAP analytics and low‑latency serving, with deep integration of the Proxima high‑performance vector computation library for fast, simple vector operations.
PAI‑EAS (Elastic Algorithm Service), part of Alibaba Cloud's Platform for AI (PAI), provides one‑click deployment of large language model (LLM) and Retrieval‑Augmented Generation (RAG) services, dramatically shortening deployment time and improving answer quality for QA, summarization, and other NLP tasks.
DeepSeek is a Mixture‑of‑Experts (MoE) LLM designed for efficient inference, now available for one‑click deployment through PAI‑EAS.
RAG Overview
RAG combines external knowledge bases with LLMs to overcome LLM limitations such as domain knowledge gaps, outdated information, and hallucinations, delivering more accurate and up‑to‑date responses.
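The retrieve‑then‑generate pattern described above can be sketched in a few lines. The `retrieve` and `generate` callables below are stand‑ins for a real vector store lookup and LLM call, not part of any Alibaba Cloud SDK:

```python
# Minimal sketch of the RAG pattern: fetch relevant chunks from a knowledge
# base, then prepend them to the user's question before calling the LLM.

def build_prompt(question: str, chunks: list[str]) -> str:
    """Augment the user question with retrieved context, numbered for citation."""
    context = "\n\n".join(f"[{i + 1}] {c}" for i, c in enumerate(chunks))
    return (
        "Answer using only the context below. Cite sources as [n].\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )

def answer(question: str, retrieve, generate, top_k: int = 3) -> str:
    chunks = retrieve(question, top_k)       # e.g. similarity search in Hologres
    return generate(build_prompt(question, chunks))  # e.g. DeepSeek via PAI-EAS
```

Because the LLM only sees context retrieved at query time, updating the knowledge base immediately updates the answers, with no retraining.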
Prerequisites
Create a VPC, vSwitch, and security group; ensure the Hologres instance and the RAG service reside in the same VPC.
Deployment Steps
Step 1 – Prepare Hologres Vector Store
Create a Hologres instance.
Create a database and user account, grant appropriate permissions (developer or higher), and verify via HoloWeb.
Configure the database connection endpoint (host:port) from the instance details page.
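Hologres speaks the PostgreSQL wire protocol, so a quick way to verify the endpoint, account, and permissions from this step is a plain `psycopg2` connection. The host, port, database, and credentials below are placeholders for the values from your instance details page:

```python
def make_dsn(host: str, port: int, dbname: str, user: str, password: str) -> str:
    """Build a libpq-style DSN for a Hologres endpoint (PostgreSQL-compatible)."""
    return f"host={host} port={port} dbname={dbname} user={user} password={password}"

def check_connection(dsn: str) -> str:
    """Open a connection and return the server version string."""
    import psycopg2  # pip install psycopg2-binary

    with psycopg2.connect(dsn) as conn:
        with conn.cursor() as cur:
            cur.execute("SELECT version();")
            return cur.fetchone()[0]

if __name__ == "__main__":
    # Placeholder endpoint copied from the Hologres instance details page.
    dsn = make_dsn("hgpostcn-cn-xxxx-vpc-st.hologres.aliyuncs.com", 80,
                   "rag_db", "rag_user", "your-password")
    print(check_connection(dsn))
```

If this succeeds with the account you created, the RAG service deployed in the same VPC should be able to reach the instance with the same credentials.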
Step 2 – Deploy DeepSeek‑Based RAG Service
Log in to the PAI console, select the workspace, and navigate to Model Deployment > Elastic Algorithm Service (EAS).
Choose the deployment mode (LLM‑integrated or LLM‑separate) and select DeepSeek as the model.
Configure basic information, version selection (LLM‑integrated or LLM‑separate), model category, and resource specifications.
Set vector store type to Hologres and provide VPC host, database name, user, password, and table name (new or existing).
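If you point the service at an existing table, it needs an ID column, a text column, and a fixed‑dimension `float4[]` embedding column. The exact schema PAI‑EAS generates for a new table may differ; the DDL below is only a sketch of a typical Hologres vector table, and the table and column names are assumptions:

```python
def vector_table_ddl(table: str, dim: int) -> str:
    """Sketch of a Hologres vector table with a fixed-dimension embedding column.

    A Proxima index would then be attached to the embedding column via
    Hologres's set_table_property call (see the Hologres vector docs for the
    exact property JSON); that part is omitted here.
    """
    return (
        f"CREATE TABLE IF NOT EXISTS {table} (\n"
        f"    id text PRIMARY KEY,\n"
        f"    content text,\n"
        f"    embedding float4[] CHECK (\n"
        f"        array_ndims(embedding) = 1 AND array_length(embedding, 1) = {dim}\n"
        f"    )\n"
        f");"
    )
```

The dimension in the CHECK constraint must match the embedding dimension you configure in the WebUI in Step 3.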
Step 3 – Verify Model Inference via WebUI
Open the WebUI from the service list and adjust settings such as embedding type, dimension, batch size, and multimodal options.
Upload business data files (txt, pdf, excel, docx, markdown, html) and configure chunk size, overlap, OCR, and multimodal processing.
Configure inference parameters (streaming output, citation requirement, temperature, retrieval mode, etc.) in the Chat tab.
Step 4 – API‑Based Inference Validation
Obtain the RAG service’s public endpoint and token from the service details page.
Refer to the API documentation to call the service programmatically.
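A minimal programmatic call might look like the following. The `/service/query` path and the request fields are assumptions for illustration, so check the service's API documentation for the exact contract; only the endpoint and `Authorization` token come from the service details page:

```python
import json
from urllib import request

def build_rag_request(endpoint: str, token: str, question: str,
                      stream: bool = False) -> request.Request:
    """Assemble an HTTP request to the RAG service (path and fields assumed)."""
    payload = json.dumps({"question": question, "stream": stream}).encode("utf-8")
    return request.Request(
        url=endpoint.rstrip("/") + "/service/query",
        data=payload,
        headers={"Authorization": token, "Content-Type": "application/json"},
        method="POST",
    )

if __name__ == "__main__":
    # Placeholder endpoint and token from the EAS service details page.
    req = build_rag_request("http://<service>.<region>.pai-eas.aliyuncs.com",
                            "<your-token>", "What does the user guide say about quotas?")
    with request.urlopen(req, timeout=60) as resp:
        print(json.loads(resp.read()))
```

The same pattern works with any HTTP client; the token goes in the `Authorization` header as shown on the service details page.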
Key Features of Hologres Vector Store
Backed by Proxima, Hologres delivers low‑latency, high‑throughput vector computation and efficient similarity search, and plugs directly into the RAG service as its vector store.
Alibaba Cloud Big Data AI Platform
The Alibaba Cloud Big Data AI Platform builds on Alibaba’s leading cloud infrastructure, big‑data and AI engineering capabilities, scenario algorithms, and extensive industry experience to offer enterprises and developers a one‑stop, cloud‑native big‑data and AI capability suite. It boosts AI development efficiency, enables large‑scale AI deployment across industries, and drives business value.