Mastering Advanced Retrieval: Fusion and Recursive Strategies for RAG
This article explores two advanced retrieval paradigms—Fusion Retrieval, which merges results from multiple retrievers using re‑ranking, and Recursive Retrieval, which builds hierarchical chunk‑to‑chunk or chunk‑to‑retriever links—to boost the quality and flexibility of Retrieval‑Augmented Generation pipelines.
Fusion Retrieval
Fusion Retrieval combines multiple retrieval methods to compensate for the limitations of a single vector index. It generates several query variants (e.g., via query rewriting) and runs each against one or more retrievers, which may use different index types or scoring algorithms. The collected chunks are then re-ranked, commonly with Reciprocal Rank Fusion (RRF), to produce a final ordered list for downstream generation.
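To make the re-ranking step concrete, here is a minimal, framework-free sketch of Reciprocal Rank Fusion. RRF gives each document a score equal to the sum of 1 / (k + rank) over every result list in which it appears. The function name, the use of plain string ids, and the `top_n` parameter are assumptions made for illustration; k = 60 is a commonly used default, not something the article prescribes.

```python
from collections import defaultdict
from typing import Dict, List


def reciprocal_rank_fusion(
    ranked_lists: List[List[str]], k: int = 60, top_n: int = 3
) -> List[str]:
    """Fuse several ranked lists of document ids into a single ordering."""
    scores: Dict[str, float] = defaultdict(float)
    for ranking in ranked_lists:
        # Ranks are 1-based: the top hit of each list contributes 1 / (k + 1).
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by id so the output is deterministic.
    ordered = sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))
    return ordered[:top_n]


# Three result lists, e.g. from three query variants (toy ids).
fused = reciprocal_rank_fusion([["a", "b", "c"], ["b", "a", "d"], ["b", "c", "a"]])
```

Because only ranks are used, the lists can come from retrievers whose raw scores are not comparable, such as a vector index and a keyword index.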
Typical ways to realize fusion include:
Rewriting the original question into multiple formulations and retrieving each separately.
Using heterogeneous index types (vector + keyword, knowledge‑graph, summary indexes) in parallel.
Combining different scoring algorithms on the same index.
Mixing the above approaches into a composite pipeline.
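The first two approaches in this list amount to running every query variant against every retriever and keeping the result lists apart until the re-ranking step. The sketch below shows one way to do that concurrently with `asyncio.gather`. The `Retriever` type alias, `make_keyword_retriever`, and the toy corpora are hypothetical stand-ins for real indexes, and this `run_queries` is an illustrative helper rather than a framework API.

```python
import asyncio
from typing import Awaitable, Callable, Dict, List, Tuple

# Hypothetical retriever type: an async callable from a query string to ranked doc ids.
Retriever = Callable[[str], Awaitable[List[str]]]


def make_keyword_retriever(corpus: Dict[str, str]) -> Retriever:
    """Toy stand-in for a real index: ranks docs by how many query terms they contain."""

    async def retrieve(query: str) -> List[str]:
        terms = query.lower().split()
        scored = [
            (sum(term in text.lower() for term in terms), doc_id)
            for doc_id, text in corpus.items()
        ]
        hits = sorted((s, d) for s, d in scored if s > 0)
        hits.sort(key=lambda pair: (-pair[0], pair[1]))
        return [doc_id for _, doc_id in hits]

    return retrieve


async def run_queries(
    queries: List[str], retrievers: List[Retriever]
) -> Dict[Tuple[str, int], List[str]]:
    """Run every query against every retriever concurrently.

    Results are keyed by (query, retriever index) so a reranker can fuse them later.
    """
    keys = [(q, i) for q in queries for i in range(len(retrievers))]
    results = await asyncio.gather(*(retrievers[i](q) for q, i in keys))
    return dict(zip(keys, results))


notes = make_keyword_retriever(
    {"n1": "RRF merges ranked lists", "n2": "BM25 is a keyword scorer"}
)
faq = make_keyword_retriever({"f1": "How does rank fusion work?"})
results = asyncio.run(run_queries(["rank fusion", "keyword scorer"], [notes, faq]))
```

Each value in `results` is an independent ranked list, which is the shape a rank-based reranker expects as input.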
Key components in frameworks such as LangChain or LlamaIndex are QueryTransform (or another rewriter), Retriever, and Reranker. A custom fusion retriever can be built by wiring these pieces together.
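```python
import asyncio
from typing import List

# Import paths assume llama-index >= 0.10.
from llama_index.core.retrievers import BaseRetriever
from llama_index.core.schema import NodeWithScore, QueryBundle

# rewrite_query, run_queries and rerank_results are helpers assumed to be defined elsewhere.


class FusionRetriever(BaseRetriever):
    """Build a fusion retriever from multiple base retrievers."""

    def __init__(self, retrievers: List[BaseRetriever], similarity_top_k: int = 3):
        self._retrievers = retrievers
        self._similarity_top_k = similarity_top_k
        super().__init__()

    def _retrieve(self, query_bundle: QueryBundle) -> List[NodeWithScore]:
        # 1. Rewrite the query into several variants
        queries = rewrite_query(query_bundle.query_str, num=3)
        # 2. Run each query against all retrievers
        #    (asyncio.run raises if an event loop is already running, e.g. in a notebook)
        results_dict = asyncio.run(run_queries(queries, self._retrievers))
        # 3. Re-rank using RRF (or another reranker)
        final_results = rerank_results(results_dict, similarity_top_k=self._similarity_top_k)
        return final_results
```
Recursive Retrieval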
Recursive Retrieval constructs a hierarchy of chunks and retrievers (or RAG engines/Agents). A top‑level chunk links to a lower‑level chunk or a dedicated retriever, which is then invoked to fetch deeper knowledge. This process repeats until a termination condition (e.g., depth limit or sufficient relevance) is met.
Typical link types:
Chunk → another chunk (parent‑child relationship).
Chunk → a second‑level retriever.
Chunk → a RAG engine that returns an answer.
Chunk → an Agent capable of tool use and complex reasoning.
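The first two link types, together with the termination condition, can be sketched in a few lines of plain Python. The `Chunk` dataclass, the `Retriever` alias, and the `resolve` function are hypothetical names chosen for this illustration and do not correspond to any framework's API. Links to a RAG engine or an Agent could be modelled the same way, as a callable that returns an answer, but are left out to keep the sketch short.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Union


@dataclass
class Chunk:
    text: str
    # Optional link to either another chunk or a lower-level retriever.
    link: Optional[Union["Chunk", "Retriever"]] = None


# Hypothetical retriever type: a callable from a query string to a list of chunks.
Retriever = Callable[[str], List[Chunk]]


def resolve(chunk: Chunk, query: str, max_depth: int = 3) -> List[str]:
    """Follow links recursively and return the text of the chunks finally reached."""
    # Termination: no link to follow, or the depth budget is used up.
    if chunk.link is None or max_depth == 0:
        return [chunk.text]
    if isinstance(chunk.link, Chunk):
        # Chunk -> chunk: continue from the linked chunk.
        return resolve(chunk.link, query, max_depth - 1)
    # Chunk -> retriever: run a second-level search and resolve each hit.
    texts: List[str] = []
    for child in chunk.link(query):
        texts.extend(resolve(child, query, max_depth - 1))
    # If the linked retriever finds nothing, fall back to the chunk itself.
    return texts or [chunk.text]


details = [Chunk("detail about installation"), Chunk("detail about configuration")]


def detail_retriever(query: str) -> List[Chunk]:
    """Toy second-level retriever: keeps chunks containing any query word."""
    words = query.lower().split()
    return [c for c in details if any(w in c.text for w in words)]


summary = Chunk("summary of the setup guide", link=detail_retriever)
pointer = Chunk("see setup guide", link=summary)
context = resolve(pointer, "configuration steps")
```

Here `pointer` links to `summary`, which links to a retriever, so the default depth of 3 reaches the detailed chunk; lowering `max_depth` makes the traversal stop earlier and return the text of whatever chunk it has reached.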
Use cases include:
Providing richer context by linking a short chunk to a larger parent chunk.
Answering summary‑style queries via a summary chunk that points to detailed chunks.
Handling hypothetical questions by linking to explanatory chunks.
Multi‑document QA: first retrieve relevant summary chunks, then recursively retrieve the underlying source chunks.
Complex table queries: create a dedicated RAG engine for extracted tables (e.g., using Pandas or SQL) and link table summary chunks to it.
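The first use case, linking a short chunk to a larger parent chunk, is often called small-to-big retrieval: search over small chunks for precision, then hand the generator the larger parents for context. A minimal sketch follows, assuming the child-to-parent mapping is stored as plain dictionaries; `expand_to_parents` and its parameters are hypothetical names used only for this example.

```python
from typing import Dict, List


def expand_to_parents(
    hit_ids: List[str],
    parent_of: Dict[str, str],
    parent_text: Dict[str, str],
) -> List[str]:
    """Replace retrieved child chunks with their parent chunks, without duplicates."""
    seen = set()
    contexts: List[str] = []
    for child_id in hit_ids:
        parent_id = parent_of.get(child_id)
        # Skip children with no recorded parent and parents already added.
        if parent_id is None or parent_id in seen:
            continue
        seen.add(parent_id)
        contexts.append(parent_text[parent_id])
    return contexts


# Toy data: c1 and c2 are small chunks cut from section p1, c3 from section p2.
contexts = expand_to_parents(
    hit_ids=["c1", "c2", "c3"],
    parent_of={"c1": "p1", "c2": "p1", "c3": "p2"},
    parent_text={"p1": "Section 1 full text", "p2": "Section 2 full text"},
)
```

Deduplicating parents matters because several small hits frequently come from the same parent, and repeating it would waste context window without adding information.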
When a chunk links to a retriever, the system performs a second‑level search on the linked index (e.g., a summary index) and feeds those results into the final generation step. When linking to a RAG engine or Agent, the downstream answer from that component becomes part of the generation context, offering richer tool‑driven capabilities.
Conclusion
Relying solely on a single vector‑based semantic search is insufficient for many production RAG scenarios. Understanding and applying advanced retrieval patterns such as Fusion Retrieval and Recursive Retrieval—supported by frameworks like LangChain and LlamaIndex—enables more accurate, flexible, and production‑ready AI applications.
Knowledge‑graph based retrieval
Keyword‑table retrieval
Vector‑BM25 hybrid
Multi‑retriever semantic routing
Automatic metadata filtering
Dynamic chunk‑size merging
AI Large Model Application Practice
Focused on deep research and development of large-model applications. Authors of "RAG Application Development and Optimization Based on Large Models" and "MCP Principles Unveiled and Development Guide". Primarily B2B, with B2C as a supplement.
