
How to Supercharge Retrieval‑Augmented Generation: Papers, Techniques, and Real‑World Tips

This article surveys the main challenges of deploying large language models, introduces key RAG optimization papers such as RAPTOR, Self‑RAG, and CRAG, and compiles practical engineering tricks—including chunking, query rewriting, hybrid and progressive retrieval—to help practitioners build more accurate and efficient RAG systems.

Alibaba Cloud Developer

Jun 27, 2024

Introduction

When deploying large language models, practitioners run into three recurring problems: the models lack vertical (domain‑specific) knowledge; they hallucinate and carry a high barrier to entry; and teams duplicate each other's effort. Retrieval‑augmented generation (RAG) addresses these by injecting external knowledge at query time, which improves answer accuracy and lets that effort scale across teams.

Key Optimization Papers

RAPTOR

Paper: https://arxiv.org/pdf/2401.18059

RAPTOR (Recursive Abstractive Processing for Tree‑Organized Retrieval) builds a tree bottom‑up from short text chunks and recursive summaries of those chunks, so that retrieval can draw on several levels of abstraction instead of raw chunks only.

Tree construction: split the corpus into short continuous chunks, embed with SBERT, cluster with GMM, summarize each cluster, re‑embed summaries, and iterate to form a multi‑level tree.

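Query traversal: start from the root layer, select the top‑k nodes with the highest cosine similarity to the query, descend into their children, and repeat until the leaf layer is reached; the texts of the selected nodes are concatenated as context. The paper also describes a collapsed‑tree variant that flattens the tree and searches all nodes at once.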

Experimental results: RAPTOR outperforms traditional RAG on multi‑step reasoning tasks; the paper reports that, coupled with GPT‑4, it improves the best QuALITY result by 20% in absolute accuracy.

Self‑RAG

Paper: https://arxiv.org/pdf/2310.11511

Self‑RAG (Self‑Reflective Retrieval‑Augmented Generation) lets the generator itself decide, during generation, whether to retrieve external documents. It first emits a Retrieve reflection token. If retrieval is triggered, it critiques the retrieved passages and its own output with three further tokens: relevance (ISREL), support (ISSUP), and usefulness (ISUSE). Otherwise it generates the next segment directly and predicts only its usefulness.

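If Retrieve = Yes: the retriever fetches passages D; for each passage d the model predicts its relevance (ISREL), generates a candidate segment, and predicts whether that segment is supported by d (ISSUP) and how useful it is (ISUSE). The candidates are then ranked by these scores and the best one is kept.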

If Retrieve = No: the model generates the next segment y_t and predicts its usefulness.
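The branching above can be sketched as plain control flow. In Self‑RAG all of these decisions come from one language model emitting reflection tokens; here they are replaced by hypothetical stub callables (`needs_retrieval`, `retrieve`, `generate_with`, `generate_alone`), and the weights in `Candidate.score` are made‑up values, so treat this as an outline of the loop rather than the paper's implementation.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """One generated segment plus its (stubbed) critique scores."""
    text: str
    is_rel: float  # is the passage relevant to the query?
    is_sup: float  # does the passage support the generated segment?
    is_use: float  # is the segment a useful answer?

    def score(self, w_rel=1.0, w_sup=1.0, w_use=0.5):
        # Hypothetical linear weighting of the three critique scores.
        return w_rel * self.is_rel + w_sup * self.is_sup + w_use * self.is_use


def self_rag_step(query, needs_retrieval, retrieve, generate_with, generate_alone):
    """Generate one segment, retrieving only when the model asks for it."""
    if needs_retrieval(query):
        candidates = [generate_with(query, passage) for passage in retrieve(query)]
        if candidates:
            return max(candidates, key=lambda c: c.score()).text
    return generate_alone(query)


# Stub components standing in for the model and the retriever.
PASSAGES = ["Paris is the capital of France.", "France borders Spain."]


def needs_retrieval(query):
    return "capital" in query


def retrieve(query):
    return PASSAGES


def generate_with(query, passage):
    relevant = 1.0 if "capital" in passage else 0.0
    return Candidate(f"Answer based on: {passage}", relevant, relevant, 0.8)


def generate_alone(query):
    return "Answer from parametric knowledge."


stubs = (needs_retrieval, retrieve, generate_with, generate_alone)
print(self_rag_step("What is the capital of France?", *stubs))
print(self_rag_step("Say hello.", *stubs))
```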

CRAG

Paper: https://arxiv.org/pdf/2401.15884

Corrective Retrieval‑Augmented Generation (CRAG) introduces a lightweight retrieval evaluator that assigns a confidence score to each retrieved document. Depending on which confidence level the scores fall into (Correct, Incorrect, or Ambiguous), the system triggers a different action: refining the retrieved (internal) knowledge, searching the web for external knowledge, or combining both.

Correct: refine internal knowledge.

Incorrect: perform a web search to obtain external knowledge.

Ambiguous: combine internal and external knowledge.
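A minimal sketch of this action trigger, assuming the evaluator returns one score per document in the range 0 to 1 and that two thresholds separate the three cases. The threshold values, the function names, and the `refine` and `web_search` callables are all hypothetical placeholders.

```python
def crag_action(scores, upper=0.7, lower=0.3):
    """Map evaluator scores for the retrieved documents to a CRAG action."""
    if not scores:
        return "INCORRECT"  # nothing retrieved: fall back to web search
    best = max(scores)
    if best > upper:
        return "CORRECT"    # at least one document is clearly relevant
    if best < lower:
        return "INCORRECT"  # every document is clearly irrelevant
    return "AMBIGUOUS"


def build_knowledge(action, refine, web_search):
    """Assemble the knowledge passed to the generator for a given action."""
    if action == "CORRECT":
        return refine()
    if action == "INCORRECT":
        return web_search()
    return refine() + web_search()  # AMBIGUOUS: use both sources


action = crag_action([0.5, 0.1])
knowledge = build_knowledge(
    action,
    refine=lambda: ["refined strip from retrieved docs"],
    web_search=lambda: ["snippet from web search"],
)
print(action, knowledge)
```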

Dense X Retrieval

Paper: https://arxiv.org/pdf/2312.06648

The authors propose using a "proposition" (an atomic fact expressed as a self‑contained sentence) as the retrieval unit. Their experiments show that proposition‑level retrieval outperforms passage‑ and sentence‑level retrieval both in dense retrieval and in downstream open‑domain QA.

Practical Optimization Techniques

Chunking: character‑level, recursive character, document‑specific (Markdown, Python, JavaScript, PDF), semantic, and agentic (LLM‑driven) splitting.
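To make the recursive character strategy concrete, here is a small pure‑Python sketch. It tries coarse separators first (paragraphs, then lines, sentences, and words) and only falls back to a hard cut when no separator is left. The function name, the separator list, and `max_len` are assumptions for illustration; libraries that offer this strategy differ in the details, and this version drops the separator at chunk boundaries.

```python
def recursive_split(text, max_len=40, separators=("\n\n", "\n", ". ", " ")):
    """Split text into chunks of at most max_len characters."""
    text = text.strip()
    if len(text) <= max_len:
        return [text] if text else []
    for i, sep in enumerate(separators):
        if sep not in text:
            continue
        chunks, current = [], ""
        for piece in text.split(sep):
            candidate = current + sep + piece if current else piece
            if len(candidate) <= max_len:
                current = candidate  # keep packing pieces into this chunk
                continue
            if current:
                chunks.append(current)
            if len(piece) <= max_len:
                current = piece
            else:
                # Piece is still too long: retry with finer separators.
                current = ""
                chunks.extend(recursive_split(piece, max_len, separators[i + 1:]))
        if current:
            chunks.append(current)
        return [c.strip() for c in chunks if c.strip()]
    # No separator applies: cut at fixed character offsets.
    return [text[i:i + max_len] for i in range(0, len(text), max_len)]


sample = (
    "RAG adds retrieval to generation.\n\n"
    "Chunk size matters. Small chunks are precise. Large chunks keep context."
)
chunks = recursive_split(sample, max_len=40)
for chunk in chunks:
    print(repr(chunk))
```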

Query rewriting: generate multiple sub‑questions, expand queries, correct spelling/grammar, perform semantic reformulation (HyDE, RAG‑Fusion), and add missing context.

Hybrid retrieval: combine sparse (BM25) and dense (vector) retrieval to get the speed and exact‑term matching of sparse search together with the semantic matching of dense search.
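The article does not say how the two result lists should be merged. One common choice is reciprocal rank fusion (RRF), which is also the fusion step used in RAG‑Fusion; the sketch below applies it to two hand‑written rankings. The document IDs are invented, and `k=60` is the constant conventionally used with RRF, not a tuned value.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one ranking."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            # Each list contributes 1 / (k + rank) for every document it contains.
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID for a stable order.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


bm25_ranking = ["doc_a", "doc_b", "doc_c"]   # sparse results (made up)
dense_ranking = ["doc_c", "doc_a", "doc_d"]  # dense results (made up)

fused = reciprocal_rank_fusion([bm25_ranking, dense_ranking])
print(fused)
```

Because RRF only uses ranks, it avoids having to normalize BM25 scores and cosine similarities onto a common scale.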

Small‑to‑Big: progressive multi‑granularity retrieval that starts from fine‑grained units (sentences) and expands to larger units (documents) until sufficient context is gathered.
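A toy sketch of the small‑to‑big idea: match the query against individual sentences, then widen the window around the best hit until enough context has been collected or the whole document is included. Word overlap stands in for a real embedding search, and `small_to_big`, `min_chars`, and the sample documents are hypothetical.

```python
import string


def tokenize(text):
    """Lowercase words with surrounding punctuation removed."""
    return {word.strip(string.punctuation) for word in text.lower().split()}


def small_to_big(query_terms, documents, min_chars=50):
    """Match on sentences, then expand: sentence -> neighbours -> whole document."""
    best = None  # (overlap, document index, sentence index)
    for d, sentences in enumerate(documents):
        for s, sentence in enumerate(sentences):
            overlap = len(query_terms & tokenize(sentence))
            if best is None or overlap > best[0]:
                best = (overlap, d, s)
    if best is None or best[0] == 0:
        return ""  # no sentence shares a term with the query
    _, d, s = best
    sentences = documents[d]
    radius = 0
    while True:
        window = sentences[max(0, s - radius): s + radius + 1]
        context = " ".join(window)
        if len(context) >= min_chars or len(window) == len(sentences):
            return context
        radius += 1  # not enough context yet: include the neighbours


documents = [
    ["RAPTOR builds a tree of summaries.", "Leaves hold the raw chunks."],
    ["BM25 is a sparse ranking function.", "It scores exact term matches."],
]
query = {"sparse", "ranking"}
print(small_to_big(query, documents, min_chars=30))
print(small_to_big(query, documents, min_chars=50))
```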

Embedding & rerank models: use specialized encoders and rerankers to improve relevance scoring before generation.

Evaluation

Standardized RAG evaluation pipelines enable fair comparison across models and optimizations on datasets such as PopQA, Biography, PubHealth, and ARC‑Challenge. Metrics such as Recall@5 and Exact Match@100 quantify retrieval quality and answer correctness, respectively.
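For orientation, the sketch below shows one common way to compute a recall‑at‑k score and an exact‑match score. Definitions vary between papers (some count a query as a hit if any retrieved passage contains the answer), and the "@100" cutoff is benchmark‑specific and not modelled here, so the function names and normalization rules are assumptions rather than the evaluation protocol of any paper cited above.

```python
import string


def recall_at_k(retrieved_ids, relevant_ids, k=5):
    """Fraction of relevant documents that appear in the top-k results."""
    relevant = set(relevant_ids)
    if not relevant:
        return 0.0
    return len(set(retrieved_ids[:k]) & relevant) / len(relevant)


def normalize(text):
    """Lowercase, drop punctuation, and collapse whitespace."""
    text = "".join(ch for ch in text.lower() if ch not in string.punctuation)
    return " ".join(text.split())


def exact_match(prediction, gold_answers):
    """1.0 if the normalized prediction equals any normalized gold answer."""
    return float(normalize(prediction) in {normalize(g) for g in gold_answers})


retrieved = ["d1", "d2", "d3", "d4", "d5", "d6"]
print(recall_at_k(retrieved, ["d1", "d6"], k=5))
print(exact_match("Paris.", ["paris", "City of Paris"]))
```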

Conclusion

Deploying effective RAG systems requires coordinated engineering and research effort, extensive experimentation, handling heterogeneous data sources (text, images, tables), multimodal knowledge integration, automated evaluation, and continuous model iteration.

References

RAPTOR: https://arxiv.org/pdf/2401.18059

Self‑RAG: https://arxiv.org/pdf/2310.11511

CRAG: https://arxiv.org/pdf/2401.15884

Dense X Retrieval: https://arxiv.org/pdf/2312.06648

Prompt for decomposing a passage into propositions (Dense X Retrieval):

Decompose the "Content" into clear and simple propositions, ensuring they are interpretable out of context.
1. Split compound sentences into simple sentences. Maintain the original phrasing from the input whenever possible.
2. For any named entity that is accompanied by additional descriptive information, separate this information into its own distinct proposition.
3. Decontextualize the proposition by adding necessary modifiers to nouns or entire sentences and replacing pronouns (e.g., "it", "he", "she", "they", "this", "that") with the full name of the entities they refer to.
4. Present the results as a list of strings, formatted in JSON.
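A hedged sketch of how such a prompt might be wired into an indexing pipeline: build the prompt, send it to a model, and parse the JSON list that comes back. The `fake_llm` function is a stand‑in that returns a canned reply, and the "Content:" and "Propositions:" layout is an assumption; substitute whatever client and prompt template you actually use.

```python
import json

INSTRUCTIONS = (
    'Decompose the "Content" into clear and simple propositions, '
    "ensuring they are interpretable out of context.\n"
    "1. Split compound sentences into simple sentences. Maintain the original "
    "phrasing from the input whenever possible.\n"
    "2. For any named entity that is accompanied by additional descriptive "
    "information, separate this information into its own distinct proposition.\n"
    "3. Decontextualize the proposition by adding necessary modifiers to nouns "
    "or entire sentences and replacing pronouns with the full name of the "
    "entities they refer to.\n"
    "4. Present the results as a list of strings, formatted in JSON."
)


def build_prompt(content):
    """Attach the passage to the instructions (layout is an assumption)."""
    return f"{INSTRUCTIONS}\n\nContent: {content}\nPropositions:"


def parse_propositions(raw):
    """Return the propositions as a list of strings, or [] on malformed output."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return []
    if not isinstance(data, list):
        return []
    return [item.strip() for item in data if isinstance(item, str) and item.strip()]


def fake_llm(prompt):
    """Hypothetical model call that returns a fixed, well-formed reply."""
    return '["BM25 is a ranking function.", "BM25 is used for sparse retrieval."]'


passage = "BM25 is a ranking function used for sparse retrieval."
propositions = parse_propositions(fake_llm(build_prompt(passage)))
print(propositions)
```

Each parsed proposition would then be embedded and indexed as its own retrieval unit.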
