Introduction to Retrieval‑Augmented Generation (RAG) and Vector Indexing with StarRocks and DeepSeek

This article explains the fundamentals of Retrieval‑Augmented Generation, demonstrates how to create and query vector indexes using StarRocks, shows how DeepSeek provides embeddings and answer generation, and walks through a complete end‑to‑end RAG pipeline with code examples and a web UI.

Big Data Technology & Architecture

RAG (Retrieval‑Augmented Generation) combines external knowledge retrieval with AI generation to overcome the static knowledge limitation of large language models, delivering more accurate and up‑to‑date answers.

Core RAG Process

Retrieval: After a user asks a question, relevant content is fetched from external sources (e.g., Wikipedia, enterprise documents) using vector databases, search engines, or traditional databases.

Generation: The retrieved information is fed together with the user query into a generation model (e.g., GPT, LLaMA, DeepSeek) to produce a response that is grounded in real data.

The standard workflow includes preprocessing data (images, documents, audio, video) into embeddings, storing them in a vector database, and using ANN algorithms (HNSW, IVF) for efficient similarity search.
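Before reaching for an index, it helps to see the operation being accelerated. The sketch below is brute-force exact nearest-neighbor search by L2 distance — the baseline that ANN indexes such as HNSW and IVF approximate at scale; the two-dimensional vectors and document names are toy stand-ins for real embeddings.

```python
import math

# Exact nearest-neighbor search by brute-force L2 distance -- the baseline
# that ANN indexes (HNSW, IVF) approximate for much larger collections.
def l2_distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

corpus = {
    "doc_warranty": [0.10, 0.90],
    "doc_pricing":  [0.80, 0.20],
    "doc_returns":  [0.15, 0.85],
}
query = [0.20, 0.80]

# Rank documents by distance to the query vector (ascending = most similar).
ranked = sorted(corpus, key=lambda name: l2_distance(corpus[name], query))
print(ranked[0])  # doc_returns
```

Brute force is O(n) per query; HNSW and IVF trade a small amount of recall for sub-linear search time.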

Typical RAG Applications with StarRocks + DeepSeek

Enterprise knowledge bases (document search, FAQ)

Domain‑specific Q&A for legal, finance, medical fields

Code search and software documentation queries

Intelligent customer service for banking, e‑commerce, etc.

For each scenario, the solution consists of three steps: (1) document embedding with DeepSeek, (2) storage and indexing in StarRocks (using HNSW or IVFPQ), and (3) RAG‑enhanced generation where DeepSeek consumes the retrieved content to answer the user.
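The three steps above can be sketched as stubbed functions to make the control flow concrete. The function names and return values here are illustrative only; the real DeepSeek and StarRocks calls appear in the demonstration below.

```python
# Hypothetical stubs for the three RAG steps; real code would call DeepSeek
# (steps 1 and 3) and StarRocks (step 2), as shown later in this article.
def embed(text):
    # Step 1: turn text into an embedding vector (stubbed with a toy value).
    return [float(len(text))]

def retrieve(query_vector):
    # Step 2: vector-similarity search over the knowledge base (stubbed).
    return "The product warranty period is one year."

def generate(query, context):
    # Step 3: answer generation grounded in the retrieved context (stubbed).
    return f"Based on the context '{context}', answering: {query}"

question = "How long is the warranty?"
answer = generate(question, retrieve(embed(question)))
```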

Operation Demonstration

1. Environment Preparation

# Install and run DeepSeek model with Ollama
ollama run deepseek-r1:7b

Optional: set GPU layers, CPU threads, batch size, and context size for performance.
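As a sketch of how such tuning can be expressed, the payload below passes per-request settings through Ollama's `options` object; the field names follow Ollama's API documentation, and the values are illustrative, not recommendations.

```python
# Illustrative per-request tuning payload for Ollama's /api/generate endpoint.
# Field names follow Ollama's "options" object; the values are examples only.
payload = {
    "model": "deepseek-r1:7b",
    "prompt": "hello",
    "options": {
        "num_gpu": 35,     # number of layers offloaded to the GPU
        "num_thread": 8,   # CPU threads used for inference
        "num_batch": 512,  # prompt-processing batch size
        "num_ctx": 4096,   # context window size in tokens
    },
}
```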

2. StarRocks Setup

ADMIN SET FRONTEND CONFIG ("enable_experimental_vector" = "true");
CREATE DATABASE knowledge_base;
USE knowledge_base;
CREATE TABLE enterprise_knowledge (
    id BIGINT AUTO_INCREMENT,
    content TEXT NOT NULL,
    embedding ARRAY<FLOAT> NOT NULL,
    INDEX vec_idx (embedding) USING VECTOR (
        "index_type" = "hnsw",
        "dim" = "3584",
        "metric_type" = "l2_distance",
        "M" = "16",
        "efconstruction" = "40"
    )
) ENGINE=OLAP PRIMARY KEY(id) DISTRIBUTED BY HASH(id) BUCKETS 1
PROPERTIES ("replication_num" = "1");

3. Text‑to‑Vector Conversion

curl -X POST http://localhost:11434/api/embeddings -d '{"model": "deepseek-r1:7b", "prompt": "The product warranty period is one year."}'

Store the resulting 3584‑dimensional embedding in StarRocks.
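One detail worth isolating is how a Python float list becomes the bracketed literal used when inserting into an `ARRAY<FLOAT>` column. A minimal helper (the function name is ours, not a StarRocks API):

```python
def to_vector_literal(embedding):
    """Serialize a float list into the bracketed string form used when
    inserting into an ARRAY<FLOAT> column, e.g. "[0.1,0.2,0.3]"."""
    return "[" + ",".join(str(float(x)) for x in embedding) + "]"

print(to_vector_literal([0.1, 0.2, 0.3]))  # [0.1,0.2,0.3]
```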

4. Python Example: Store Embedding

import pymysql, requests

def get_embedding(text):
    url = "http://localhost:11434/api/embeddings"
    payload = {"model": "deepseek-r1:7b", "prompt": text}
    response = requests.post(url, json=payload)
    response.raise_for_status()
    return response.json()["embedding"]

content = "StarRocks' vision is to make users' data analytics simpler and more agile."
embedding = get_embedding(content)
embedding_str = "[" + ",".join(map(str, embedding)) + "]"
conn = pymysql.connect(host='X.X.X.X', port=9030, user='root', password='sr123456', database='knowledge_base')
cursor = conn.cursor()
sql = "INSERT INTO enterprise_knowledge (content, embedding) VALUES (%s, %s)"
cursor.execute(sql, (content, embedding_str))
conn.commit()
conn.close()
print(f"Inserted: {content} with embedding {embedding[:5]}...")

5. Retrieval and RAG Pipeline

def search_knowledge_base(query_embedding):
    conn = pymysql.connect(host='X.X.X.X', port=9030, user='root', password='sr123456', database='knowledge_base')
    cursor = conn.cursor()
    embedding_str = "[" + ",".join(map(str, query_embedding)) + "]"
    sql = """
        SELECT content, l2_distance(embedding, %s) AS distance
        FROM enterprise_knowledge
        ORDER BY distance ASC
        LIMIT 3
    """
    cursor.execute(sql, (embedding_str,))
    results = cursor.fetchall()
    conn.close()
    # Join the top-3 passages with newlines so they read cleanly in the prompt.
    return "\n".join(r[0] for r in results)

def build_rag_prompt(query, retrieved_content):
    return f"""
    [System instruction] You are an enterprise customer-service assistant. Answer the user's question based on the following knowledge:
    [Knowledge context] {retrieved_content}
    [User question] {query}
    """

import json, re

def generate_answer(prompt):
    url = "http://localhost:11434/api/generate"
    payload = {"model": "deepseek-r1:7b", "prompt": prompt}
    response = requests.post(url, json=payload)
    response.raise_for_status()
    full = ""
    for line in response.text.splitlines():
        if line.strip():
            try:
                obj = json.loads(line)
                if "response" in obj:
                    full += obj["response"]
                if obj.get("done"):
                    break
            except json.JSONDecodeError:
                continue
    return re.sub(r"<think>.*?</think>", "", full.strip(), flags=re.DOTALL)

def rag_pipeline(user_id, query):
    query_emb = get_embedding(query)
    retrieved = search_knowledge_base(query_emb)
    prompt = build_rag_prompt(query, retrieved)
    answer = generate_answer(prompt)
    # log to customer_service_log (omitted for brevity)
    return answer
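The `re.sub` in `generate_answer` strips DeepSeek-R1's `<think>…</think>` chain-of-thought block so only the final answer reaches the user. A self-contained check of that regex, with an invented sample response:

```python
import re

# DeepSeek-R1 emits its reasoning inside <think>...</think>; the pipeline
# strips it so only the final answer is shown. The raw text is a made-up
# example of what the model might return.
raw = "<think>The user asks about warranty terms...</think>The warranty period is one year."
clean = re.sub(r"<think>.*?</think>", "", raw, flags=re.DOTALL).strip()
print(clean)  # The warranty period is one year.
```

The `re.DOTALL` flag matters: without it, `.` does not match newlines, and a multi-line reasoning block would survive the substitution.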

6. Flask API and Simple Web UI

<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="UTF-8">
  <meta name="viewport" content="width=device-width, initial-scale=1.0">
  <title>Intelligent Q&amp;A Customer Service</title>
  <script>
    async function askQuestion() {
      let question = document.getElementById("question").value;
      let response = await fetch("/ask", {method: "POST", headers: {"Content-Type": "application/json"}, body: JSON.stringify({question})});
      let data = await response.json();
      document.getElementById("answer").innerText = data.answer;
    }
  </script>
</head>
<body>
  <h1>Intelligent Q&amp;A Customer Service</h1>
  <input type="text" id="question" placeholder="Enter your question">
  <button onclick="askQuestion()">Ask</button>
  <p id="answer"></p>
</body>
</html>

The Flask backend renders this page and exposes the /ask endpoint:

from flask import Flask, request, jsonify, render_template
import logging, json, re
app = Flask(__name__)
logging.basicConfig(level=logging.INFO)
# (functions get_embedding, search_knowledge_base, build_rag_prompt, generate_answer, rag_pipeline defined above)
@app.route('/')
def index():
    return render_template('index.html')
@app.route('/ask', methods=['POST'])
def ask():
    user_id = "sr_01"
    question = request.json.get('question', '')
    answer = rag_pipeline(user_id, question)
    return jsonify({'answer': f"Question: {question}\nAnswer: {answer}"})
if __name__ == '__main__':
    app.run(host='0.0.0.0', port=9033, debug=True)

The article concludes with a summary of the RAG‑enhanced execution flow and invites community contributions to the StarRocks AI co‑creation plan.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.

Python · AI · RAG · StarRocks · embedding · DeepSeek · vector indexing
Written by

Big Data Technology & Architecture

Wang Zhiwu, a big data expert, dedicated to sharing big data technology.
