How to Build a Fast, Accurate AI‑Powered Knowledge Base with Amazon OpenSearch and DeepSeek
This article walks through using Amazon OpenSearch Service’s vector search and ML connector together with the DeepSeek large language model to create a low‑cost, high‑efficiency enterprise knowledge base, covering architecture, step‑by‑step deployment, RAG pipeline configuration, and conversational search extensions.
Problem Statement
Enterprise knowledge bases often suffer from information silos and low retrieval efficiency, limiting the usefulness of natural‑language queries.
Technical Solution
Combine Amazon OpenSearch Service vector indexing with the open‑source DeepSeek large language model (LLM) to implement Retrieval‑Augmented Generation (RAG). OpenSearch 2.11 introduces a built‑in RAG processor that can invoke external models via the ML Connector, perform vector similarity search, and pass retrieved snippets to an LLM for answer generation.
Key Components
ML Connector : registers a DeepSeek model and an embedding model as remote SageMaker endpoints.
Ingestion Pipeline : loads documents into an OpenSearch index and vectorizes them using the embedding model.
Search Pipeline (RAG processor) : defined as my-conversation-search-pipeline-deepseek-zh, executes a neural query, retrieves top‑k documents, builds a prompt, and calls the DeepSeek LLM.
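As a rough illustration of the ML Connector component, the sketch below builds a request body for POST /_plugins/_ml/connectors/_create that points at a SageMaker endpoint. The function name, the placeholder ARN, region, and endpoint name, and the {"inputs": ...} payload shape are assumptions for illustration; the payload must match whatever the deployed endpoint actually expects.

```python
import json


def build_sagemaker_connector(name: str, role_arn: str, region: str, endpoint_name: str) -> dict:
    """Build a body for POST /_plugins/_ml/connectors/_create (hypothetical helper).

    Assumes the SageMaker endpoint accepts a JSON payload of the form {"inputs": "..."}.
    """
    url = (
        f"https://runtime.sagemaker.{region}.amazonaws.com"
        f"/endpoints/{endpoint_name}/invocations"
    )
    return {
        "name": name,
        "description": "Remote model hosted on a SageMaker endpoint",
        "version": 1,
        "protocol": "aws_sigv4",
        # IAM role that OpenSearch assumes when signing the call to SageMaker.
        "credential": {"roleArn": role_arn},
        "parameters": {"region": region, "service_name": "sagemaker"},
        "actions": [
            {
                "action_type": "predict",
                "method": "POST",
                "headers": {"content-type": "application/json"},
                "url": url,
                # OpenSearch substitutes ${parameters.inputs} at predict time.
                "request_body": '{ "inputs": "${parameters.inputs}" }',
            }
        ],
    }


if __name__ == "__main__":
    body = build_sagemaker_connector(
        name="deepseek-connector",  # placeholder values, not from the article
        role_arn="arn:aws:iam::123456789012:role/example-role",
        region="us-east-1",
        endpoint_name="example-endpoint",
    )
    print(json.dumps(body, indent=2))
```

The same helper can be reused for the embedding model by passing a different endpoint name, provided the request_body template is adjusted to that endpoint's input format.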
Implementation Steps
Register the DeepSeek LLM and the embedding model with OpenSearch using the ML Connector and link them to SageMaker endpoints.
Create an Ingestion Pipeline that reads source documents, invokes the embedding connector, and stores the vectors in an index (e.g., opensearch_kl_index).
Configure the Search Pipeline with the RAG processor and the LLM parameters; queries sent through it use the neural query type and specify the embedding model ID.
Clients send queries to the index via the Search Pipeline; the pipeline vectorizes the query, retrieves relevant documents, and forwards the combined context to DeepSeek.
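The steps above can be sketched as three request bodies: a k-NN index, an ingest pipeline with a text_embedding processor, and a search pipeline with a retrieval_augmented_generation response processor. This is a minimal sketch, not the demo's actual configuration: the field names text and text_embedding are taken from the query example in this article, while the function names, the pipeline name, the vector dimension, and the prompt strings are placeholders.

```python
import json


def knn_index_body(ingest_pipeline: str, dimension: int) -> dict:
    """Body for PUT /<index>: k-NN enabled, documents pass through the ingest pipeline."""
    return {
        "settings": {"index.knn": True, "default_pipeline": ingest_pipeline},
        "mappings": {
            "properties": {
                "text": {"type": "text"},
                # dimension must equal the embedding model's output size (assumed by caller)
                "text_embedding": {"type": "knn_vector", "dimension": dimension},
            }
        },
    }


def ingest_pipeline_body(embedding_model_id: str) -> dict:
    """Body for PUT /_ingest/pipeline/<name>: embed the `text` field at index time."""
    return {
        "description": "Vectorize documents with the remote embedding model",
        "processors": [
            {
                "text_embedding": {
                    "model_id": embedding_model_id,
                    # source field -> vector field
                    "field_map": {"text": "text_embedding"},
                }
            }
        ],
    }


def rag_search_pipeline_body(llm_model_id: str) -> dict:
    """Body for PUT /_search/pipeline/<name>: attach the RAG response processor."""
    return {
        "response_processors": [
            {
                "retrieval_augmented_generation": {
                    "tag": "deepseek_rag_demo",  # placeholder tag
                    "description": "RAG pipeline backed by the DeepSeek connector",
                    "model_id": llm_model_id,
                    # fields from each hit that are passed to the LLM as context
                    "context_field_list": ["text"],
                    "system_prompt": "You are a helpful assistant.",  # placeholder
                    "user_instructions": "Answer using only the provided context.",  # placeholder
                }
            }
        ]
    }


if __name__ == "__main__":
    print(json.dumps(knn_index_body("example-ingest-pipeline", 1024), indent=2))
    print(json.dumps(ingest_pipeline_body("<embedding-model-id>"), indent=2))
    print(json.dumps(rag_search_pipeline_body("<DeepSeekModelId>"), indent=2))
```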
RAG Processor Example
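Direct call to the DeepSeek model, without retrieval. The question in these examples asks: "What is OpenSearch Serverless, how does it differ from OpenSearch cluster mode, and do you still need to manage server resources when using OpenSearch Serverless?"
POST _plugins/_ml/models/<DeepSeekModelId>/_predict
{
  "parameters": {
    "inputs": "OpenSearch Serverless 是什么,和 OpenSearch 集群模式有什么区别,使用 OpenSearch Serverless,还需要管理服务器资源么?"
  }
}
Knowledge‑base query using the RAG processor: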
GET opensearch_kl_index/_search?search_pipeline=my-conversation-search-pipeline-deepseek-zh
{
  "query": {
    "neural": {
      "text_embedding": {
        "query_text": "OpenSearch Serverless 是什么,和 OpenSearch 集群模式有什么区别,使用 OpenSearch Serverless,还需要管理服务器资源么?",
        "model_id": "<Embedding Model ID>",
        "k": 5
      }
    }
  },
  "size": 2,
  "_source": ["text"],
  "ext": {
    "generative_qa_parameters": {
      "llm_model": "bedrock/claude",
      "llm_question": "OpenSearch Serverless 是什么,和 OpenSearch 集群模式有什么区别,使用 OpenSearch Serverless,还需要管理服务器资源么?",
      "context_size": 5,
      "timeout": 100
    }
  }
}
The response contains two parts: (1) the retrieved document snippets and (2) the LLM‑generated answer that references those snippets.
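To make the two-part response concrete, here is a small sketch that separates the retrieved snippets from the generated answer. It assumes the answer is returned under ext.retrieval_augmented_generation.answer, as in the OpenSearch RAG processor documentation; the helper name and the sample response are hand-written for illustration and are not real output from the demo.

```python
from typing import List, Optional, Tuple


def split_rag_response(response: dict) -> Tuple[List[str], Optional[str]]:
    """Return (snippets, answer) from a search response produced by a RAG pipeline."""
    hits = response.get("hits", {}).get("hits", [])
    # The query limits _source to the "text" field, so that is what we collect.
    snippets = [hit.get("_source", {}).get("text", "") for hit in hits]
    answer = (
        response.get("ext", {})
        .get("retrieval_augmented_generation", {})
        .get("answer")
    )
    return snippets, answer


if __name__ == "__main__":
    # Illustrative response shape only; values are made up.
    sample = {
        "hits": {
            "hits": [
                {"_source": {"text": "first retrieved snippet"}},
                {"_source": {"text": "second retrieved snippet"}},
            ]
        },
        "ext": {"retrieval_augmented_generation": {"answer": "generated answer"}},
    }
    snippets, answer = split_rag_response(sample)
    for i, snippet in enumerate(snippets, start=1):
        print(f"[{i}] {snippet}")
    print("Answer:", answer)
```

Returning None for a missing answer lets a caller distinguish "the pipeline was not applied" from an empty generation.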
Deployment Options
Notebook deployment : run the Jupyter notebook RAG-DeepSeek-AOS-deploy.ipynb in SageMaker Studio. Notebook URL: https://gitee.com/turk/opensearch-ds-rag-demo/blob/069a3a18c9b49f5cbf0e7f79b21b40394aea51b6/opensearch-rag/notebook/RAG-DeepSeek-AOS-deploy.ipynb
CloudFormation deployment : use the template opensearch-ml-connector-cf-fixed.yaml and the Python‑dependency zip to provision IAM roles, ML connectors, model registrations, a pre‑populated index, and the RAG search pipeline. Template URL: https://gitee.com/turk/opensearch-ds-rag-demo/blob/master/opensearch-rag/cfn/opensearch-ml-connector-cf-fixed.yaml Dependencies zip URL: https://gitee.com/turk/opensearch-ds-rag-demo/raw/master/opensearch-rag/cfn/python-dependencies.zip
CloudFormation Output (Key Resources)
Model IDs for the DeepSeek LLM and the embedding model.
Knowledge‑base index opensearch_kl_index with sample data.
Search Pipeline my-conversation-search-pipeline-deepseek-zh.
Verification Commands
POST _plugins/_ml/models/<DeepSeekModelId>/_predict
{
  "parameters": {"inputs": "..."}
}
GET /_search/pipeline/my-conversation-search-pipeline-deepseek-zh
GET opensearch_kl_index/_search?search_pipeline=my-conversation-search-pipeline-deepseek-zh
{...}
Demo Application
Clone the repository and set up a Python virtual environment:
git clone https://gitee.com/turk/opensearch-ds-rag-demo.git
cd opensearch-ds-rag-demo/opensearch-rag/qa_app
python3 -m venv .
source bin/activate
pip install -r requirements.txt
cp .env.example .env
Configure .env with the OpenSearch endpoint, credentials, index name, and model IDs:
OPENSEARCH_HOST=<opensearch-endpoint>
OPENSEARCH_PORT=443
OPENSEARCH_USER=<username>
OPENSEARCH_PASSWORD=<password>
OPENSEARCH_INDEX=opensearch_kl_index
OPENSEARCH_EMBEDDING_MODEL_ID=<embedding-model-id>
Run the single‑turn Q&A UI:
python3 app.py
For multi‑turn conversational search, create a memory store:
POST /_plugins/_ml/memory/
{ "name": "Conversation about DeepSeek Demo" }
Use the returned memory_id in .env (e.g., OPENSEARCH_MEMORY_ID=<memory-id>) and start the conversational demo:
python3 app_conversational.py
Example conversation:
Question: "OpenSearch 有支持收集日志的工具么?" (Does OpenSearch have a tool that supports log collection?)
Follow‑up: "这个工具能收集哪些日志?" (Which logs can this tool collect?)
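The follow-up only makes sense if the earlier turn is available to the LLM. A minimal sketch of how a conversational request might carry the memory: the same search body as the single-turn query, with memory_id added to generative_qa_parameters. The helper name is hypothetical and this is not the code of app_conversational.py; the llm_model value is copied from the single-turn example in this article.

```python
import json


def conversational_query(question: str, embedding_model_id: str, memory_id: str, k: int = 5) -> dict:
    """Build a search body that ties a RAG request to an existing memory (hypothetical helper)."""
    return {
        "query": {
            "neural": {
                "text_embedding": {
                    "query_text": question,
                    "model_id": embedding_model_id,
                    "k": k,
                }
            }
        },
        "size": 2,
        "_source": ["text"],
        "ext": {
            "generative_qa_parameters": {
                "llm_model": "bedrock/claude",  # value taken from the single-turn example
                "llm_question": question,
                # links this turn to the memory created via POST /_plugins/_ml/memory/
                "memory_id": memory_id,
                "context_size": 5,
                "timeout": 100,
            }
        },
    }


if __name__ == "__main__":
    body = conversational_query(
        "这个工具能收集哪些日志?", "<embedding-model-id>", "<memory-id>"
    )
    print(json.dumps(body, ensure_ascii=False, indent=2))
```

Sending each turn with the same memory_id is what lets a pronoun like "this tool" in the follow-up resolve against the previous question.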
Retrieve the stored messages:
GET /_plugins/_ml/memory/<memory-id>/messages
Summary of Technical Benefits
Vector indexing enables semantic similarity matching, overcoming keyword‑only limitations.
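The RAG processor integrates retrieval and generation in a single search pipeline, so a client gets snippets and answer from one search request instead of orchestrating separate retrieval and LLM calls.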
ML Connector abstracts model invocation, eliminating custom code for embedding and LLM calls.
Memory API provides context retention for multi‑turn dialogues.
Amazon Cloud Developers
Official technical community of Amazon Cloud. Shares practical AI/ML, big data, database, modern app development, IoT content, offers comprehensive learning resources, hosts regular developer events, and continuously empowers developers.
