Solving Knowledge Challenges in Retrieval‑Augmented Generation: Practical Optimizations

This article shares a half‑year of hands‑on experience with Retrieval‑Augmented Generation, analyzing why simple RAG setups often feel unintelligent, identifying three core knowledge issues, and presenting concrete optimization strategies—including chunking, knowledge expansion, and tag‑based conflict resolution—to improve retrieval and generation performance in low‑resource environments.

Alibaba Cloud Developer

Based on six months of practical work with Retrieval‑Augmented Generation (RAG), the author reflects on the technology's potential, the challenges faced in real‑world deployments, and the step‑by‑step optimizations that helped achieve stable results.

RAG Background

Retrieval‑augmented generation (RAG) pairs an external knowledge base with a large language model (LLM) to improve answer quality: relevant documents are retrieved first and injected into the prompt, and the LLM generates a response grounded in them. Simple implementations stop at that retrieve‑then‑generate loop, which often yields low‑quality or hallucinated answers when the knowledge base is noisy or poorly chunked.
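The retrieve‑then‑generate loop can be sketched in a few lines. This is a toy illustration, not the article's implementation: the bag‑of‑words `embed` stands in for a real dense encoder, and `build_prompt` stands in for the final LLM call.

```python
import math
from collections import Counter

def embed(text):
    """Toy bag-of-words vector; a real system uses a dense embedding model."""
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(query, corpus, k=2):
    """Rank knowledge-base chunks by similarity to the query."""
    q = embed(query)
    return sorted(corpus, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]

def build_prompt(query, chunks):
    """Inject retrieved chunks as grounding context for the LLM."""
    context = "\n".join(f"- {c}" for c in chunks)
    return f"Answer using only the context below.\nContext:\n{context}\nQuestion: {query}"

corpus = [
    "RAG retrieves documents before generation.",
    "Chunking splits documents into retrievable units.",
    "LLMs hallucinate without grounded context.",
]
prompt = build_prompt("What does RAG do?", retrieve("What does RAG do?", corpus))
```

Everything the rest of the article discusses is about making each of these stages less naive.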

Analyzing Simple RAG Knowledge Challenges

A simple RAG pipeline can be assembled quickly from query rewriting, vector embedding, multi-path retrieval, and LLM summarization, but it often feels like an "intelligent shell": the LLM lacks a deep understanding of the injected knowledge, which leads to hallucinations.

Complex RAG Knowledge Optimization Practices

The article addresses three knowledge problems: the chunking contradiction (retrieval favors small, precise chunks while generation needs rich context), missing knowledge, and conflicting knowledge. It proposes optimization ideas for each.

Knowledge Retrieval Optimization from Chunking Contradictions

Chunking Optimization

Documents (mainly internal Markdown) are first split by structural headings, then long chunks are further split by length, and finally small chunks are merged hierarchically to create balanced Blocks that retain heading information while providing appropriate context size.
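The three stages above (heading split, length split, merge) might look roughly like the following sketch. The length budgets `MAX_LEN` and `MIN_LEN` are assumed values, not figures from the article, and would be tuned per embedding model.

```python
import re

MAX_LEN = 200  # assumed character budgets; tune for your embedding model
MIN_LEN = 50

def split_by_headings(markdown):
    """Stage 1: split a Markdown document into (heading, body) sections."""
    sections, heading, buf = [], "", []
    for line in markdown.splitlines():
        if line.startswith("#"):
            if buf:
                sections.append((heading, "\n".join(buf).strip()))
                buf = []
            heading = line.lstrip("# ").strip()
        else:
            buf.append(line)
    if buf:
        sections.append((heading, "\n".join(buf).strip()))
    return sections

def split_long(text, limit=MAX_LEN):
    """Stage 2: split an oversized section on sentence boundaries."""
    parts, cur = [], ""
    for sent in re.split(r"(?<=[.!?])\s+", text):
        if cur and len(cur) + len(sent) > limit:
            parts.append(cur.strip())
            cur = ""
        cur += sent + " "
    if cur.strip():
        parts.append(cur.strip())
    return parts

def build_blocks(markdown):
    """Stage 3: merge undersized chunks into Blocks that keep their heading."""
    blocks = []
    for heading, body in split_by_headings(markdown):
        for chunk in split_long(body):
            text = f"[{heading}] {chunk}"  # retain heading context in the Block
            if blocks and len(blocks[-1]) < MIN_LEN:
                blocks[-1] += "\n" + text  # merge a too-small neighbour
            else:
                blocks.append(text)
    return blocks
```

Prefixing each chunk with its heading is one simple way to keep structural context attached after splitting.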

Decoupling Chunk Granularity for Retrieval and Generation

Three granularity levels are used:

- Sentence: fine‑grained chunks for precise retrieval.
- Segment: medium‑sized chunks preserving structural context for re‑ranking.
- Block: larger, context‑rich chunks for LLM generation.
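The key decoupling idea is that the unit you match on need not be the unit you generate from. A minimal sketch, assuming a Block is represented as its list of Segments and using word overlap in place of a real similarity score:

```python
def sentences(segment):
    """Fine-grained retrieval units extracted from a Segment."""
    return [s.strip() for s in segment.split(".") if s.strip()]

def build_sentence_index(blocks):
    """Index each sentence, pointing back to its parent Block."""
    index = []
    for block_id, segments in enumerate(blocks):
        for segment in segments:
            for sent in sentences(segment):
                index.append((sent, block_id))
    return index

def overlap(a, b):
    """Stand-in for vector similarity: shared lowercase tokens."""
    return len(set(a.lower().split()) & set(b.lower().split()))

def retrieve_block(query, blocks, index):
    """Match on sentences for precision, return the whole Block for context."""
    _, block_id = max(index, key=lambda pair: overlap(query, pair[0]))
    return blocks[block_id]
```

Matching small and returning big is what lets retrieval stay precise without starving the LLM of surrounding context.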

Knowledge Enhancement to Address Missing Knowledge

Granularity Expansion – General Knowledge Enhancement

Two approaches are explored: expanding the query to document granularity (HyDE‑style) and condensing documents to query granularity via summarization. The latter is implemented offline using LLM‑assisted extraction and summarization.
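The offline condensing direction can be sketched as an index of query-granularity summaries that point back to their full documents. `llm_condense` here is a placeholder heuristic standing in for the LLM-assisted extraction and summarization call.

```python
def llm_condense(document):
    """Stand-in for the offline LLM summarization call; a real pipeline
    prompts the model for short, query-like statements of each key fact."""
    return [s.strip() for s in document.split(".") if s.strip()][:2]

def build_condensed_index(documents):
    """Offline pass: index query-granularity summaries mapped to full docs."""
    index = []
    for doc in documents:
        for pseudo_query in llm_condense(doc):
            index.append((pseudo_query, doc))
    return index
```

At query time, matching against the short pseudo-queries closes the granularity gap, while generation still receives the full document.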

Prior Knowledge Expansion – Domain Knowledge Extraction

Domain‑specific knowledge is injected explicitly by building a Memo Model that extracts entities and relationships from the knowledge base, clusters them, and generates community reports. During inference, the query is matched against this Memo Model and the retrieved domain facts are fed to the LLM.
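A stripped-down version of that entity-to-facts lookup might look like this. The capitalized-token heuristic in `extract_entities` is a placeholder for LLM-based entity and relationship extraction, and clustering/community reports are omitted.

```python
from collections import defaultdict

def extract_entities(text):
    """Stand-in for LLM entity extraction: capitalized tokens only."""
    return {t.strip(".,?") for t in text.split() if t[:1].isupper()}

def build_memo_model(documents):
    """Offline: map each entity to the facts (sentences) that mention it."""
    memo = defaultdict(list)
    for doc in documents:
        for sentence in doc.split("."):
            for entity in extract_entities(sentence):
                memo[entity].append(sentence.strip())
    return memo

def match_memo(query, memo):
    """Inference: inject stored facts for entities appearing in the query."""
    facts = []
    for entity in extract_entities(query):
        facts.extend(memo.get(entity, []))
    return facts
```

The retrieved domain facts are then appended to the prompt alongside the regular retrieved chunks.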

Knowledge Source Expansion – Experiential Knowledge Accumulation

Historical support tickets are mined for QA pairs. A two‑stage prompting scheme (a global pass for the main issue, then a local pass for sub‑issues) extracts candidates, the LLM filters out low‑quality pairs, and the resulting ~1,300 high‑quality QA pairs are used to fine‑tune a Qwen‑14B model.
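The extraction-plus-filtering step can be sketched as below. The `llm_quality_check` heuristic and the `qa_candidates` ticket field are illustrative assumptions; in the article both extraction and filtering are done with LLM prompts.

```python
def llm_quality_check(question, answer):
    """Stand-in for the LLM filter; real prompts judge clarity,
    completeness, and correctness of each candidate pair."""
    return question.endswith("?") and len(answer) >= 20

def mine_qa_pairs(tickets):
    """Local-stage sketch: collect QA candidates per ticket and keep
    only pairs that pass the quality filter."""
    dataset = []
    for ticket in tickets:
        for question, answer in ticket["qa_candidates"]:
            if llm_quality_check(question, answer):
                dataset.append({"question": question, "answer": answer})
    return dataset
```

The surviving pairs form the fine-tuning dataset; discarded pairs can be logged for manual review.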

Knowledge Tagging Strategies for Conflicting Knowledge

From Knowledge Tags to Document Priority

Document metadata such as timestamps, view counts, and manual tags are used to assign priority scores, helping filter irrelevant documents and guide the LLM toward higher‑quality sources.
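One way to fold those signals into a single score is a weighted sum with per-signal normalization. The weights, decay window, and saturation point below are assumptions for illustration; in practice they would be tuned against labelled relevance data.

```python
from datetime import datetime, timezone

# Assumed weights; tune these on labelled data.
WEIGHTS = {"recency": 0.5, "views": 0.3, "manual_tag": 0.2}

def priority(doc, now=None):
    """Combine timestamp, view count, and manual tags into a score in [0, 1]."""
    now = now or datetime.now(timezone.utc)
    age_days = (now - doc["updated_at"]).days
    recency = max(0.0, 1.0 - age_days / 365)   # decays to 0 over a year
    views = min(1.0, doc["views"] / 1000)      # saturates at 1k views
    tag = 1.0 if doc.get("manual_tag") == "authoritative" else 0.0
    return (WEIGHTS["recency"] * recency
            + WEIGHTS["views"] * views
            + WEIGHTS["manual_tag"] * tag)
```

Documents scoring below a threshold can be dropped before re-ranking, and the score itself carries forward into the generation stage.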

Applying Document Priority

Priority is used primarily for filtering irrelevant documents and, during the LLM's reference selection phase, is communicated via prompts so the model can reason about which documents to trust.
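Communicating priority via the prompt can be as simple as labelling each reference, as in this hypothetical sketch (the exact prompt wording is an assumption, not the article's template):

```python
def reference_selection_prompt(query, docs):
    """Surface each document's priority score in the prompt so the LLM
    can reason about which sources to trust when they conflict."""
    ranked = sorted(docs, key=lambda d: d["priority"], reverse=True)
    refs = "\n".join(
        f"[Doc {i} | priority {d['priority']:.2f}] {d['text']}"
        for i, d in enumerate(ranked, 1)
    )
    return ("When documents conflict, prefer the higher-priority one "
            "and say which document you relied on.\n"
            f"{refs}\nQuestion: {query}")
```

Listing high-priority documents first also exploits the tendency of many LLMs to weight earlier context more heavily.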

Future Plans

Further work includes deeper mining of historical conversations, multimodal knowledge expansion (incorporating images), and continued refinement of the optimization pipeline.

References

https://arxiv.org/pdf/2312.10997

https://zilliz.com/learn/improve-rag-and-information-retrieval-with-hyde-hypothetical-document-embeddings

https://arxiv.org/pdf/2404.16130

https://arxiv.org/pdf/2409.05591
