Introduction to Retrieval‑Augmented Generation (RAG) and Vector Indexing with StarRocks and DeepSeek
This article explains the fundamentals of Retrieval‑Augmented Generation, demonstrates how to create and query vector indexes using StarRocks, shows how DeepSeek provides embeddings and answer generation, and walks through a complete end‑to‑end RAG pipeline with code examples and a web UI.
RAG (Retrieval‑Augmented Generation) combines external knowledge retrieval with AI generation to overcome the static knowledge limitation of large language models, delivering more accurate and up‑to‑date answers.
Core RAG Process
Retrieval: After a user asks a question, relevant content is fetched from external sources (e.g., Wikipedia, enterprise documents) using vector databases, search engines, or traditional databases.
Generation: The retrieved information is fed together with the user query into a generation model (e.g., GPT, LLaMA, DeepSeek) to produce a response that is grounded in real data.
The standard workflow includes preprocessing data (images, documents, audio, video) into embeddings, storing them in a vector database, and using ANN algorithms (HNSW, IVF) for efficient similarity search.
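The similarity search at the heart of this workflow can be sketched with a brute-force nearest-neighbor scan; ANN indexes such as HNSW and IVF exist precisely to approximate this lookup at scale. A minimal sketch with toy 2-D vectors (not real embeddings):

```python
import math

def l2_distance(a, b):
    # Euclidean distance between two equal-length vectors
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def top_k(query, vectors, k=3):
    # vectors: list of (id, embedding) pairs; exact scan, O(n) per query.
    # HNSW/IVF indexes approximate this ranking without touching every vector.
    scored = sorted(vectors, key=lambda item: l2_distance(query, item[1]))
    return [vid for vid, _ in scored[:k]]

docs = [("doc1", [0.0, 1.0]), ("doc2", [1.0, 1.0]), ("doc3", [5.0, 5.0])]
print(top_k([0.0, 0.9], docs, k=2))  # ['doc1', 'doc2']
```

The exact scan is fine for thousands of vectors; the HNSW index created later in this article takes over once the table holds millions.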
Typical RAG Applications with StarRocks + DeepSeek
Enterprise knowledge bases (document search, FAQ)
Domain‑specific Q&A for legal, finance, medical fields
Code search and software documentation queries
Intelligent customer service for banking, e‑commerce, etc.
For each scenario, the solution consists of three steps: (1) document embedding with DeepSeek, (2) storage and indexing in StarRocks (using HNSW or IVFPQ), and (3) RAG‑enhanced generation where DeepSeek consumes the retrieved content to answer the user.
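The three steps compose into a single call chain. The sketch below uses stand-in callables for the DeepSeek embedder, StarRocks retrieval, and DeepSeek generation; all names here are illustrative, not library APIs:

```python
def answer(query, embed, search, generate):
    vec = embed(query)               # (1) embed the query with DeepSeek
    context = search(vec)            # (2) retrieve nearest chunks from StarRocks
    return generate(query, context)  # (3) generate a grounded answer

# Stubbed run showing the data flow only:
result = answer(
    "How long is the warranty?",
    lambda q: [0.1, 0.2],                          # fake embedding
    lambda v: "The warranty period is one year.",  # fake retrieval
    lambda q, c: f"Based on the knowledge base: {c}",
)
print(result)  # Based on the knowledge base: The warranty period is one year.
```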
Hands-On Demonstration
1. Environment Preparation
# Install and run DeepSeek model with Ollama
ollama run deepseek-r1:7b

Optional: set GPU layers, CPU threads, batch size, and context size for performance.
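These knobs can be passed per request through Ollama's `options` field; `num_gpu`, `num_thread`, `num_batch`, and `num_ctx` are documented Ollama option names, but the values below are illustrative, not tuned recommendations:

```python
def build_generate_payload(prompt, model="deepseek-r1:7b"):
    # Per-request options override the model defaults for this call only.
    return {
        "model": model,
        "prompt": prompt,
        "stream": False,
        "options": {
            "num_gpu": 35,     # layers offloaded to the GPU
            "num_thread": 8,   # CPU threads
            "num_batch": 256,  # prompt-processing batch size
            "num_ctx": 4096,   # context window size
        },
    }

# Live call (requires Ollama running):
# import requests
# requests.post("http://localhost:11434/api/generate",
#               json=build_generate_payload("hello")).json()
```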
2. StarRocks Setup
ADMIN SET FRONTEND CONFIG ("enable_experimental_vector" = "true");
CREATE DATABASE knowledge_base;
CREATE TABLE enterprise_knowledge (
    id BIGINT AUTO_INCREMENT,
    content TEXT NOT NULL,
    embedding ARRAY<FLOAT> NOT NULL,
    INDEX vec_idx (embedding) USING VECTOR (
        "index_type" = "hnsw",
        "dim" = "3584",
        "metric_type" = "l2_distance",
        "M" = "16",
        "efconstruction" = "40"
    )
) ENGINE=OLAP PRIMARY KEY(id) DISTRIBUTED BY HASH(id) BUCKETS 1
PROPERTIES ("replication_num" = "1");

3. Text-to-Vector Conversion
curl -X POST http://localhost:11434/api/embeddings -d '{"model": "deepseek-r1:7b", "prompt": "The product warranty period is one year."}'

Store the resulting 3584-dimensional embedding in StarRocks.
4. Python Example: Store Embedding
import pymysql, requests, json, re
def get_embedding(text):
    url = "http://localhost:11434/api/embeddings"
    payload = {"model": "deepseek-r1:7b", "prompt": text}
    response = requests.post(url, json=payload)
    response.raise_for_status()
    return response.json()["embedding"]

content = "StarRocks' vision is to make data analytics simpler and more agile for users."
embedding = get_embedding(content)
embedding_str = "[" + ",".join(map(str, embedding)) + "]"
conn = pymysql.connect(host='X.X.X.X', port=9030, user='root', password='sr123456', database='knowledge_base')
cursor = conn.cursor()
sql = "INSERT INTO enterprise_knowledge (content, embedding) VALUES (%s, %s)"
cursor.execute(sql, (content, embedding_str))
conn.commit()
print(f"Inserted: {content} with embedding {embedding[:5]}...")

5. Retrieval and RAG Pipeline
def search_knowledge_base(query_embedding):
    conn = pymysql.connect(host='X.X.X.X', port=9030, user='root', password='sr123456', database='knowledge_base')
    cursor = conn.cursor()
    embedding_str = "[" + ",".join(map(str, query_embedding)) + "]"
    sql = """
        SELECT content, l2_distance(embedding, %s) AS distance
        FROM enterprise_knowledge
        ORDER BY distance ASC
        LIMIT 3
    """
    cursor.execute(sql, (embedding_str,))
    results = cursor.fetchall()
    conn.close()
    # Join the top-3 chunks with newlines so they stay distinguishable in the prompt
    return "\n".join([r[0] for r in results])
def build_rag_prompt(query, retrieved_content):
    return f"""
[System instruction] You are an enterprise customer-service assistant. Answer the user's question based on the knowledge below:
[Knowledge context] {retrieved_content}
[User question] {query}
"""
def generate_answer(prompt):
    url = "http://localhost:11434/api/generate"
    payload = {"model": "deepseek-r1:7b", "prompt": prompt}
    response = requests.post(url, json=payload)
    response.raise_for_status()
    full = ""
    # Ollama streams one JSON object per line; accumulate the "response" fields
    for line in response.text.splitlines():
        if line.strip():
            try:
                obj = json.loads(line)
                if "response" in obj:
                    full += obj["response"]
                if obj.get("done"):
                    break
            except json.JSONDecodeError:
                continue
    # Strip DeepSeek-R1's chain-of-thought block before returning the answer
    return re.sub(r"<think>.*?</think>", "", full.strip(), flags=re.DOTALL)
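A simplification worth noting: if the request sets `"stream": False`, Ollama returns a single JSON object and the line-by-line parsing collapses to one `response.json()` call. A sketch, assuming the non-streaming response carries the same `"response"` field:

```python
import re

def build_nonstreaming_payload(prompt, model="deepseek-r1:7b"):
    # "stream": False asks Ollama for one JSON object instead of NDJSON lines.
    return {"model": model, "prompt": prompt, "stream": False}

def strip_think(text):
    # Same cleanup as above: drop DeepSeek-R1's <think>...</think> block.
    return re.sub(r"<think>.*?</think>", "", text.strip(), flags=re.DOTALL)

# Live call (requires Ollama running):
# import requests
# obj = requests.post("http://localhost:11434/api/generate",
#                     json=build_nonstreaming_payload("hello")).json()
# answer = strip_think(obj["response"])
```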
def rag_pipeline(user_id, query):
    query_emb = get_embedding(query)
    retrieved = search_knowledge_base(query_emb)
    prompt = build_rag_prompt(query, retrieved)
    answer = generate_answer(prompt)
    # log to customer_service_log (omitted for brevity)
    return answer

6. Flask API and Simple Web UI
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>Intelligent Q&A Customer Service</title>
<script>
async function askQuestion() {
    let question = document.getElementById("question").value;
    let response = await fetch("/ask", {method: "POST", headers: {"Content-Type": "application/json"}, body: JSON.stringify({question})});
    let data = await response.json();
    document.getElementById("answer").innerText = data.answer;
}
</script>
</head>
<body>
<h1>Intelligent Q&A Customer Service</h1>
<input type="text" id="question" placeholder="Enter your question">
<button onclick="askQuestion()">Ask</button>
<p id="answer"></p>
</body>
</html>

The Flask app that serves this page and the /ask endpoint:

from flask import Flask, request, jsonify, render_template
import logging, json, re
app = Flask(__name__)
logging.basicConfig(level=logging.INFO)
# (functions get_embedding, search_knowledge_base, build_rag_prompt, generate_answer, rag_pipeline defined above)
@app.route('/')
def index():
    return render_template('index.html')

@app.route('/ask', methods=['POST'])
def ask():
    user_id = "sr_01"
    question = request.json.get('question', '')
    answer = rag_pipeline(user_id, question)
    return jsonify({'answer': f"Question: {question}\nAnswer: {answer}"})
if __name__ == '__main__':
    app.run(host='0.0.0.0', port=9033, debug=True)

The article concludes with a summary of the RAG-enhanced execution flow and invites community contributions to the StarRocks AI co-creation plan.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.
Big Data Technology & Architecture
Wang Zhiwu, a big data expert dedicated to sharing big data technology.