Deploy a High‑Performance RAG Service with Hologres, DeepSeek, and PAI‑EAS
This guide walks you through building a Retrieval‑Augmented Generation (RAG) system by integrating Alibaba Cloud's Hologres vector store, the Proxima high‑performance vector engine, and DeepSeek large language models via PAI‑EAS, covering prerequisites, deployment steps, configuration, and inference verification.
Background
Hologres is Alibaba Cloud's real‑time data warehouse, supporting large‑scale OLAP analytics and low‑latency serving, with deep integration of the Proxima high‑performance vector computation library for fast, simple vector operations.
PAI‑EAS (Elastic Algorithm Service), part of Alibaba Cloud's Platform for AI (PAI), provides one‑click deployment of large language model (LLM) and Retrieval‑Augmented Generation (RAG) services, dramatically shortening deployment time and improving answer quality for QA, summarization, and other NLP tasks.
DeepSeek is a Mixture‑of‑Experts (MoE) LLM designed for efficient inference, now available for one‑click deployment through PAI‑EAS.
RAG Overview
RAG combines external knowledge bases with LLMs to overcome LLM limitations such as domain knowledge gaps, outdated information, and hallucinations, delivering more accurate and up‑to‑date responses.
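The retrieve‑then‑generate pattern described above can be sketched in a few lines. The `retrieve` and `generate` callables below are stand‑ins for a real vector store lookup and LLM call, not part of any Alibaba Cloud SDK:

```python
# Minimal sketch of the RAG pattern: fetch relevant chunks from a knowledge
# base, then prepend them to the user's question before calling the LLM.

def build_prompt(question: str, chunks: list[str]) -> str:
    """Augment the user question with retrieved context, numbered for citation."""
    context = "\n\n".join(f"[{i + 1}] {c}" for i, c in enumerate(chunks))
    return (
        "Answer using only the context below. Cite sources as [n].\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )

def answer(question: str, retrieve, generate, top_k: int = 3) -> str:
    chunks = retrieve(question, top_k)       # e.g. similarity search in Hologres
    return generate(build_prompt(question, chunks))  # e.g. DeepSeek via PAI-EAS
```

Because the LLM only sees context retrieved at query time, updating the knowledge base immediately updates the answers, with no retraining.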
Prerequisites
Create a VPC, vSwitch, and security group; ensure the Hologres instance and the RAG service reside in the same VPC.
Deployment Steps
Step 1 – Prepare Hologres Vector Store
Create a Hologres instance.
Create a database and user account, grant appropriate permissions (developer or higher), and verify via HoloWeb.
Configure the database connection endpoint (host:port) from the instance details page.
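Hologres speaks the PostgreSQL wire protocol, so a quick way to verify the endpoint, account, and permissions from this step is a plain `psycopg2` connection. The host, port, database, and credentials below are placeholders for the values from your instance details page:

```python
def make_dsn(host: str, port: int, dbname: str, user: str, password: str) -> str:
    """Build a libpq-style DSN for a Hologres endpoint (PostgreSQL-compatible)."""
    return f"host={host} port={port} dbname={dbname} user={user} password={password}"

def check_connection(dsn: str) -> str:
    """Open a connection and return the server version string."""
    import psycopg2  # pip install psycopg2-binary

    with psycopg2.connect(dsn) as conn:
        with conn.cursor() as cur:
            cur.execute("SELECT version();")
            return cur.fetchone()[0]

if __name__ == "__main__":
    # Placeholder endpoint copied from the Hologres instance details page.
    dsn = make_dsn("hgpostcn-cn-xxxx-vpc-st.hologres.aliyuncs.com", 80,
                   "rag_db", "rag_user", "your-password")
    print(check_connection(dsn))
```

If this succeeds with the account you created, the RAG service deployed in the same VPC should be able to reach the instance with the same credentials.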
Step 2 – Deploy DeepSeek‑Based RAG Service
Log in to the PAI console, select the workspace, and navigate to Model Deployment > Elastic Algorithm Service (EAS).
Choose the deployment mode (LLM‑integrated or LLM‑separate) and select DeepSeek as the model.
Configure basic information, version selection (LLM‑integrated or LLM‑separate), model category, and resource specifications.
Set vector store type to Hologres and provide VPC host, database name, user, password, and table name (new or existing).
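If you point the service at an existing table, it needs an ID column, a text column, and a fixed‑dimension `float4[]` embedding column. The exact schema PAI‑EAS generates for a new table may differ; the DDL below is only a sketch of a typical Hologres vector table, and the table and column names are assumptions:

```python
def vector_table_ddl(table: str, dim: int) -> str:
    """Sketch of a Hologres vector table with a fixed-dimension embedding column.

    A Proxima index would then be attached to the embedding column via
    Hologres's set_table_property call (see the Hologres vector docs for the
    exact property JSON); that part is omitted here.
    """
    return (
        f"CREATE TABLE IF NOT EXISTS {table} (\n"
        f"    id text PRIMARY KEY,\n"
        f"    content text,\n"
        f"    embedding float4[] CHECK (\n"
        f"        array_ndims(embedding) = 1 AND array_length(embedding, 1) = {dim}\n"
        f"    )\n"
        f");"
    )
```

The dimension in the CHECK constraint must match the embedding dimension you configure in the WebUI in Step 3.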
Step 3 – Verify Model Inference via WebUI
Open the WebUI from the service list and adjust settings such as embedding type, dimension, batch size, and multimodal options.
Upload business data files (txt, pdf, excel, docx, markdown, html) and configure chunk size, overlap, OCR, and multimodal processing.
Configure inference parameters (streaming output, citation requirement, temperature, retrieval mode, etc.) in the Chat tab.
Step 4 – API‑Based Inference Validation
Obtain the RAG service’s public endpoint and token from the service details page.
Refer to the API documentation to call the service programmatically.
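A minimal programmatic call might look like the following. The `/service/query` path and the request fields are assumptions for illustration, so check the service's API documentation for the exact contract; only the endpoint and `Authorization` token come from the service details page:

```python
import json
from urllib import request

def build_rag_request(endpoint: str, token: str, question: str,
                      stream: bool = False) -> request.Request:
    """Assemble an HTTP request to the RAG service (path and fields assumed)."""
    payload = json.dumps({"question": question, "stream": stream}).encode("utf-8")
    return request.Request(
        url=endpoint.rstrip("/") + "/service/query",
        data=payload,
        headers={"Authorization": token, "Content-Type": "application/json"},
        method="POST",
    )

if __name__ == "__main__":
    # Placeholder endpoint and token from the EAS service details page.
    req = build_rag_request("http://<service>.<region>.pai-eas.aliyuncs.com",
                            "<your-token>", "What does the user guide say about quotas?")
    with request.urlopen(req, timeout=60) as resp:
        print(json.loads(resp.read()))
```

The same pattern works with any HTTP client; the token goes in the `Authorization` header as shown on the service details page.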
Key Features of Hologres Vector Store
Backed by Proxima, Hologres delivers low‑latency, high‑throughput vector computation and efficient similarity search, and plugs directly into the RAG service as its vector store.
Alibaba Cloud Big Data AI Platform
The Alibaba Cloud Big Data AI Platform builds on Alibaba’s leading cloud infrastructure, big‑data and AI engineering capabilities, scenario algorithms, and extensive industry experience to offer enterprises and developers a one‑stop, cloud‑native big‑data and AI capability suite. It boosts AI development efficiency, enables large‑scale AI deployment across industries, and drives business value.