Boost Your RAG Bot’s Accuracy: Hybrid Search, Query Rewriting, and Re‑ranking Explained

This article walks developers through three essential upgrades for Retrieval‑Augmented Generation systems—hybrid search combining vector and keyword retrieval, query rewriting to clarify conversational inputs, and re‑ranking with a cross‑encoder—providing step‑by‑step code examples using LangChain to dramatically improve answer quality.

Huawei Cloud Developer Alliance

Why RAG Often Misses Answers

Many developers wonder why a RAG system fails to return correct answers even when the source documents contain the needed information. The problem usually lies not in the LLM but in the retrieval stage; basic vector search alone is insufficient for complex real‑world scenarios.

Step 1: Hybrid Search (Hybrid Retrieval)

Problem: Pure vector search captures semantics but overlooks exact terms, so queries for specific product names or code identifiers can fail to match.

Solution: Combine a traditional keyword retriever (e.g., BM25) with a vector retriever to achieve a "1+1>2" effect.

# Import required libraries
from langchain.retrievers import EnsembleRetriever
from langchain_community.retrievers import BM25Retriever
from langchain_community.vectorstores import FAISS
from langchain_openai import OpenAIEmbeddings  # can be replaced with other providers

# Sample documents
doc_texts = [
    "Huawei Cloud ModelArts is a platform for AI developers.",
    "MindSpore is a full-scenario AI framework.",
    "ModelArts Pro is an enterprise-grade AI application development suite.",
]

# 1️⃣ Vector retriever (semantic matching)
embeddings = OpenAIEmbeddings()
vector_store = FAISS.from_texts(doc_texts, embeddings)
vector_retriever = vector_store.as_retriever(search_kwargs={"k": 2})

# 2️⃣ Keyword retriever (exact-term matching)
bm25_retriever = BM25Retriever.from_texts(doc_texts)
bm25_retriever.k = 2

# 3️⃣ Ensemble retriever combining both
ensemble_retriever = EnsembleRetriever(
    retrievers=[bm25_retriever, vector_retriever],
    weights=[0.5, 0.5],
)

# Test the hybrid search
query = "What does the ModelArts platform do?"
retrieved_docs = ensemble_retriever.invoke(query)
print(retrieved_docs)

Result: BM25 ensures documents containing the exact keyword "ModelArts" are prioritized, while the vector retriever adds semantically related content, delivering a more comprehensive answer.
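Under the hood, EnsembleRetriever merges the two ranked lists with Reciprocal Rank Fusion (RRF). The sketch below shows that fusion step in plain Python; the document IDs are hypothetical and `k=60` is the conventional smoothing constant:

```python
# Minimal sketch of Reciprocal Rank Fusion (RRF), the strategy
# EnsembleRetriever uses to merge the BM25 and vector rankings.

def reciprocal_rank_fusion(ranked_lists, weights=None, k=60):
    """Merge several ranked lists of doc IDs into one fused ranking."""
    if weights is None:
        weights = [1.0] * len(ranked_lists)
    scores = {}
    for ranking, weight in zip(ranked_lists, weights):
        for rank, doc_id in enumerate(ranking):
            # Each list contributes weight / (k + rank); top ranks score more.
            scores[doc_id] = scores.get(doc_id, 0.0) + weight / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)

# A document ranked by both retrievers beats one ranked by only one.
bm25_ranking = ["doc_modelarts", "doc_mindspore"]
vector_ranking = ["doc_modelarts_pro", "doc_modelarts"]
fused = reciprocal_rank_fusion([bm25_ranking, vector_ranking], weights=[0.5, 0.5])
print(fused[0])  # doc_modelarts: it appears in both lists, so it wins
```

This is why hybrid search is robust: a document only needs to rank well in one list to survive, but ranking well in both pushes it to the top.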

Step 2: Query Rewriting

Problem: In multi-turn conversations, users often ask vague follow-up questions like "How does it work?" or refer to earlier entities with pronouns, which confuses the retriever.

Solution: Use an LLM as a "translator" to rewrite the user's latest query into a self-contained, retrieval-friendly question before searching.

from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI

# 1️⃣ Define the rewriting prompt
rewrite_prompt = ChatPromptTemplate.from_messages([
    ("system", "You are an assistant skilled in information retrieval. "
               "Given the conversation history, rewrite the user's latest "
               "question into a standalone question that is friendlier to "
               "a retrieval system."),
    ("user", "Conversation history:\n{chat_history}\n\nLatest question: {question}"),
])

# 2️⃣ Invoke the LLM
model = ChatOpenAI()
rewriter = rewrite_prompt | model

# 3️⃣ Simulate a conversation
chat_history = (
    "User: Tell me about Text2SQL.\n"
    "AI: Text2SQL converts natural language into SQL queries."
)
question = "What scenarios is it mainly used in?"

# Rewrite the query
rewritten_question = rewriter.invoke({
    "chat_history": chat_history,
    "question": question,
})
print(f"Original question: {question}")
print(f"Rewritten question: {rewritten_question.content}")
# Next step: use rewritten_question.content for retrieval

Result: The ambiguous question is transformed into a clear, standalone query (e.g., "What business scenarios is Text2SQL mainly used in?"), enabling the retriever to fetch relevant documents accurately.
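To see why the rewrite matters to the retriever, consider a toy term-overlap scorer standing in for BM25. The scorer and strings below are illustrative, not part of the original code: the vague follow-up shares no terms with the target document, while the rewritten question does.

```python
# Toy illustration: a bare keyword retriever scores documents by how
# many query terms they contain. Pronouns carry no retrievable terms.

def keyword_score(query, doc):
    q_terms = set(query.lower().split())
    d_terms = set(doc.lower().split())
    return len(q_terms & d_terms)

doc = "Text2SQL converts natural language questions into SQL queries"
vague = "What scenarios is it used in?"
rewritten = "What business scenarios is Text2SQL mainly used in?"

print(keyword_score(vague, doc))      # 0: "it" matches nothing in the doc
print(keyword_score(rewritten, doc))  # 1: "text2sql" now matches
```

A real BM25 index weights terms by rarity rather than counting overlaps, but the failure mode is the same: a query that never names its subject cannot retrieve it.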

Step 3: Re‑ranking

Problem: Hybrid retrieval provides breadth but may return many loosely related documents; the most relevant ones can end up ranked low, hurting the final LLM answer.

Solution: Apply a cross-encoder reranker to rescore and reorder the initial results, pushing the most pertinent documents to the top.

# A cross-encoder requires a dedicated model; here we use LangChain's wrapper
from langchain.retrievers import ContextualCompressionRetriever
from langchain.retrievers.document_compressors import CrossEncoderReranker
from langchain_community.cross_encoders import HuggingFaceCrossEncoder  # pip install sentence-transformers

# 1️⃣ Initialize the reranker model (CrossEncoderReranker expects LangChain's
# cross-encoder wrapper, not a raw sentence-transformers CrossEncoder)
cross_encoder_model = HuggingFaceCrossEncoder(model_name="BAAI/bge-reranker-large")

# 2️⃣ Wrap it for LangChain; keep the top 3 documents after reranking
compressor = CrossEncoderReranker(model=cross_encoder_model, top_n=3)

# 3️⃣ Build the compression retriever (reuses the hybrid retriever from Step 1)
compression_retriever = ContextualCompressionRetriever(
    base_compressor=compressor,
    base_retriever=ensemble_retriever,
)

# Test re-ranking
query = "Introduce the ModelArts Pro suite"
reranked_docs = compression_retriever.invoke(query)
print(reranked_docs)

Result: The reranker pushes the most relevant documents to the front, effectively giving the LLM a "standard answer" and improving the quality of generated responses.
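Conceptually, the reranker reads each (query, document) pair jointly, assigns a relevance score, and keeps only the top_n results. The sketch below isolates that step; the term-overlap scorer is a stand-in for the neural cross-encoder's forward pass, and all strings are illustrative:

```python
# Sketch of the rerank step: score every (query, document) pair,
# sort by score, keep the best top_n.

def rerank(query, documents, score_fn, top_n=3):
    scored = [(score_fn(query, doc), doc) for doc in documents]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored[:top_n]]

# Stand-in scorer: term overlap in place of a neural cross-encoder.
def toy_score(query, doc):
    return len(set(query.lower().split()) & set(doc.lower().split()))

candidates = [
    "MindSpore is a full-scenario AI framework",
    "ModelArts Pro is an enterprise AI application suite",
    "ModelArts is a platform for AI developers",
]
top = rerank("introduce the ModelArts Pro suite", candidates, toy_score, top_n=2)
print(top[0])  # the ModelArts Pro document rises to the front
```

The key difference from the first-stage retrievers is cost: a cross-encoder runs a full model pass per pair, which is why it is applied only to the small candidate set, not the whole corpus.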

Conclusion

By applying these three upgrades—hybrid search, query rewriting, and re‑ranking—developers can significantly boost the accuracy and reliability of RAG‑based applications. The techniques are modular and can be extended with chunking, advanced query transformation, or other retrieval enhancements.
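As a taste of the chunking extension mentioned above, here is a minimal fixed-size chunker with character overlap. The parameters are illustrative; production splitters (such as LangChain's text splitters) prefer sentence or section boundaries:

```python
# Minimal fixed-size chunking with overlap, so that context spanning
# a chunk boundary is not lost entirely.

def chunk_text(text, chunk_size=100, overlap=20):
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += chunk_size - overlap  # step forward, keeping an overlap
    return chunks

doc = "ModelArts is a one-stop AI development platform. " * 10
pieces = chunk_text(doc, chunk_size=100, overlap=20)
print(len(pieces), len(pieces[0]))
```

Each chunk then becomes one retrievable unit in the vector store, so chunk size directly trades retrieval precision against context completeness.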

Tags: AI, LangChain, RAG, Re-ranking, Query Rewriting, Hybrid Search
Written by

Huawei Cloud Developer Alliance

The Huawei Cloud Developer Alliance creates a tech sharing platform for developers and partners, gathering Huawei Cloud product knowledge, event updates, expert talks, and more. Together we continuously innovate to build the cloud foundation of an intelligent world.
