Build an Enterprise RAG Vector Search System from Scratch with LangChain, Easysearch, and MiMo
This article walks through the complete end‑to‑end pipeline for building a production‑grade RAG system—including document chunking, embedding generation via MiMo, vector storage and kNN retrieval in Easysearch, hybrid search configuration, prompt engineering, answer generation, interactive chat, and a detailed list of common pitfalls and fixes.
Background
Enterprise knowledge‑base Q&A requires three decisions: where to store vectors, how to retrieve them, and which LLM generates answers. The chosen stack combines Easysearch (vector store with Elasticsearch‑compatible API and kNN plugin), MiMo (OpenAI‑compatible chat model that can be prompted for embeddings), and a lightweight Python orchestration layer that calls the REST APIs directly.
Architecture
Stage 1 – Offline indexing
Original document (easysearch_docs.txt)
→ RecursiveCharacterTextSplitter (500 chars per chunk, 50‑char overlap)
→ MiMo LLM semantic encoding (256‑dim vector per chunk)
→ Easysearch kNN index (knn_dense_float_vector)Stage 2 – Online retrieval & generation
User query
→ MiMo LLM vectorisation (same model, same embedding space)
→ Easysearch knn_nearest_neighbors (Top‑K retrieval)
→ Concatenate retrieved passages + user question
→ MiMo LLM generates answerAll vectors share the same semantic space; mismatches break retrieval quality.
Step 1 – Environment preparation
Prerequisites
Running Easysearch cluster with HTTPS enabled and kNN plugin loaded.
MiMo API key (Base URL https://token-plan-cn.xiaomimimo.com/v1).
Python ≥ 3.10.
Project layout
vectorPrj/
├── .env # API keys, connection params
├── config.py # config loader
├── es_client.py # Easysearch REST wrapper (requests)
├── mimo_embeddings.py # MiMo embedding wrapper
├── indexing.py # document chunking & vector write‑back
├── retriever.py # kNN vector retrieval
├── rag_qa.py # RAG chain
├── search_test.py # retrieval verification
├── main.py # CLI entry point
├── easysearch_docs.txt # knowledge‑base documents
└── requirements.txt # dependenciesInstall dependencies
pip install -r requirements.txtCore packages: requests, python-dotenv, langchain, langchain‑community.
Step 2 – Generate embeddings with MiMo
Key insight
MiMo v2.5‑pro is an inference model without a dedicated embedding endpoint. Embeddings are obtained by prompting the Chat API to return a JSON array of 256‑dimensional floats.
Implementation
Text → Prompt: "You are a text semantic encoder. Map the following text to a 256‑dimensional vector. Output JSON only."
→ MiMo Chat API
→ Parse JSON → vectorCore code (mimo_embeddings.py)
class MiMoLLMEmbeddings(Embeddings):
"""Generate semantic vectors using MiMo Chat API"""
def _batch_to_vectors(self, texts: List[str]) -> List[List[float]]:
prompt = (
f"You are a text semantic encoder. Please map the following {len(texts)} pieces of text to "
f"{self.dims}-dimensional vectors. Values must be between -1.0 and 1.0.
"
"Only output JSON, no explanations.
"
"Format: {\"vectors\": [[...], [...], ...]}
"
)
msg = self._call_chat(prompt)
content = msg.get("content", "") or msg.get("reasoning_content", "")
return self._extract_vectors(content, len(texts), self.dims)Gotchas & tuning
max_tokens too low : 256‑dim vectors need ~3000 tokens. max_tokens=4096 caused truncation to zeros; max_tokens=8192 stabilises output.
reasoning_content vs. content : MiMo returns both fields; either may contain the vector JSON.
Batch size : BATCH_SIZE=3 sometimes produced mis‑aligned vectors. Setting BATCH_SIZE=1 (process one chunk at a time) is slower but reliable.
Vector parsing strategies
# Strategy 1: parse JSON {"vectors": [[...], ...]}
# Strategy 2: regex‑extract numeric arrays [0.1, -0.2, ...]
# Pad or truncate to required dimensionAlternative backend
If MiMo is unavailable, set EMBEDDING_BACKEND=bge in .env to switch to the local BAAI/bge‑base‑zh‑v1.5 model without code changes.
Step 3 – Write documents to Easysearch vector index
Why not use LangChain ElasticsearchStore?
The official Elasticsearch Python client validates that the server is a genuine Elasticsearch node and rejects Easysearch, which is ES‑compatible but not Elasticsearch. Direct requests calls avoid this validation.
Custom HTTP wrapper (es_client.py)
def es_request(method, path, body=None):
url = f"{Config.ES_HOST}/{path}"
resp = session.request(method, url, data=json.dumps(body))
return resp.json()Create kNN index
mapping = {
"mappings": {
"properties": {
"content": {"type": "text"},
"source": {"type": "keyword"},
"content_vector": {
"type": "knn_dense_float_vector",
"knn": {"dims": 256}
}
}
}
}
es_request("PUT", "rag-easysearch-docs", body=mapping)Key parameters: knn_dense_float_vector – required vector field type for the Easysearch kNN plugin. dims=256 – must match MiMo output; mismatched dimensions raise indexing errors. content (text) and source (keyword) enable hybrid BM25 retrieval.
Document chunking & bulk write
def index_documents():
docs = TextLoader("easysearch_docs.txt").load()
splitter = RecursiveCharacterTextSplitter(
chunk_size=500,
chunk_overlap=50,
separators=["
", "
", "。", "!", "?", ",", " ", ""]
)
chunks = splitter.split_documents(docs)
embeddings = MiMoLLMEmbeddings()
vectors = embeddings.embed_documents([c.page_content for c in chunks])
actions = []
for chunk, vec in zip(chunks, vectors):
actions.append({
"_index": "rag-easysearch-docs",
"_source": {
"content": chunk.page_content,
"source": chunk.metadata["source"],
"content_vector": vec
}
})
es_bulk(actions)Run indexing:
python main.py indexStep 4 – Verify vector retrieval
kNN query syntax (Easysearch native)
body = {
"query": {
"knn_nearest_neighbors": {
"field": "content_vector",
"vec": {"values": query_vector},
"model": "exact", # exact scan, best for <100k vectors
"similarity": "cosine",
"candidates": 100
}
}
}Model parameter options
exact– precise computation, scans all vectors; suitable when data < 100 k and accuracy is priority. lsh – locality‑sensitive hashing, approximate search; suitable for large data when speed is priority (requires a compatible mapping).
Hybrid search (vector + BM25)
{
"bool": {
"should": [
{"knn_nearest_neighbors": {...}}, // vector similarity (score boost)
{"match": {"content": query}} // BM25 keyword match (score boost)
]
}
}Toggle hybrid mode with HYBRID_SEARCH=true/false in .env.
Verification command
python main.py search "security features"Expected output shows the top‑3 documents with relevance scores. If the top results are unrelated, the embedding model or language mismatch is likely the cause.
Step 5 – Build the RAG QA chain
Full pipeline
User question → MiMo vectorisation → Easysearch kNN Top‑K → Prompt concatenation → MiMo LLM generates answerCustom prompt (to avoid hallucination)
PROMPT = """You are an Easysearch technical expert. Answer the question strictly based on the following document content. If the document does not contain the answer, say \"No relevant information found in the documents\". Do not fabricate.
Document content:
{context}
User question: {question}
Answer:"""QA execution
python main.py ask "What security features does Easysearch provide?"Sample answer includes a bullet list of features and the source documents with relevance scores.
Step 6 – Interactive chat
python main.py chatAfter launch, each user turn performs automatic retrieval and generation, demonstrating the end‑to‑end RAG dialogue.
Common pitfalls (four most frequent)
Pitfall 01 – kNN plugin installed but not loaded
Symptom : Index creation fails with "No handler for type knn_dense_float_vector".
Cause : Plugin appears in easysearch‑plugin list but not in _nodes/plugins.
Fix : Restart Easysearch; the plugin becomes active only after a restart.
Pitfall 02 – Vector dimension mismatch
Symptom : Mapping error during indexing.
Cause : MiMo outputs 256‑dim vectors while the index mapping defines a different dims value.
Fix : Ensure VECTOR_DIMS in config.py matches the actual embedding dimension.
Pitfall 03 – lsh model incompatible with mapping
Symptom : kNN query returns 400 error "query is not compatible with mapping".
Cause : model=lsh requires a special mapping that the default does not provide.
Fix : Switch to model=exact or create an index with the required lsh parameters.
Pitfall 04 – All vectors become zero
Symptom : Retrieval results are unrelated to the query.
Cause : MiMo output truncated because max_tokens was too low, leading to zero‑filled vectors.
Fix : Increase max_tokens=8192, set BATCH_SIZE=1, or parse reasoning_content instead of content.
Production checklist
Easysearch kNN plugin installed and loaded ( _nodes/plugins verified).
MiMo LLM wrapped as a LangChain Embeddings interface.
Documents chunked (500 chars, 50‑char overlap) and indexed in Easysearch.
Retrieval quality verified before connecting the LLM.
Custom prompt enforces document‑based answers.
Hybrid search (vector + BM25) enabled for better proper‑noun recall.
Source documents displayed with each answer for traceability.
Interactive chat mode available for end‑user experience.
Conclusion
The article demonstrates a complete RAG system built from scratch using Easysearch for storage/retrieval, MiMo for embedding and generation, and a minimal Python orchestration layer. Core decisions include bypassing the Elasticsearch client, leveraging the LLM chat API for embeddings, validating retrieval before generation, and adopting hybrid search as the default strategy. Following the provided steps, code, and pitfall mitigations enables rapid deployment of a reliable, enterprise‑grade RAG solution.
Signed-in readers can open the original source through BestHub's protected redirect.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.
Mingyi World Elasticsearch
The leading WeChat public account for Elasticsearch fundamentals, advanced topics, and hands‑on practice. Join us to dive deep into the ELK Stack (Elasticsearch, Logstash, Kibana, Beats).
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
