Solving Knowledge Challenges in Retrieval‑Augmented Generation: Practical Optimizations
Drawing on six months of hands-on work with Retrieval-Augmented Generation (RAG), this article analyzes why simple RAG setups often feel unintelligent, identifies three core knowledge problems, and walks through the step-by-step optimizations (chunking, knowledge expansion, and tag-based conflict resolution) that produced stable retrieval and generation quality in low-resource environments.
RAG Background
Retrieval‑augmented generation (RAG) combines a pre‑retrieved knowledge base with a large language model (LLM) to enhance answer quality. Simple implementations retrieve documents, then let the LLM generate responses, but this often yields low‑quality or hallucinated answers when the knowledge base is noisy or poorly chunked.
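To ground that failure mode, here is a minimal sketch of such a retrieve-then-generate loop. The `embed`, `vector_index`, and `llm_complete` hooks are hypothetical stand-ins for whatever embedding model, vector store, and LLM endpoint a given stack uses; nothing here is the article's actual code.

```python
def naive_rag(query: str, vector_index, embed, llm_complete, top_k: int = 4) -> str:
    """Minimal retrieve-then-generate loop (illustrative sketch only)."""
    # 1. Embed the query and pull the nearest chunks from the index.
    query_vec = embed(query)
    chunks = vector_index.search(query_vec, top_k=top_k)

    # 2. Stuff the retrieved chunks into the prompt verbatim.
    context = "\n\n".join(chunk.text for chunk in chunks)
    prompt = (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"
    )

    # 3. Generate. If the chunks are noisy, truncated, or off-topic,
    #    the answer degrades or hallucinates -- the failure mode above.
    return llm_complete(prompt)
```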
Analyzing Simple RAG Knowledge Challenges
A simple RAG pipeline can be assembled quickly from query rewriting, vector embedding, multi-path retrieval, and LLM summarization, yet the result often feels like an "intelligent shell": the LLM has no deep understanding of the injected knowledge, so its answers readily drift into hallucination.
Complex RAG Knowledge Optimization Practices
The article tackles three knowledge problems and proposes optimizations for each: chunking contradictions (retrieval favors fine granularity while generation needs rich context), missing knowledge, and conflicting knowledge.
Knowledge Retrieval Optimization from Chunking Contradictions
Chunking Optimization
Documents (mainly internal Markdown) are first split by structural headings, then long chunks are further split by length, and finally small chunks are merged hierarchically to create balanced Blocks that retain heading information while providing appropriate context size.
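A compressed sketch of that three-pass flow is below, assuming plain Markdown input. The length budgets and the flat merge are illustrative simplifications; the article's pipeline merges hierarchically by heading level.

```python
import re

MAX_LEN, MIN_LEN = 800, 200  # assumed character budgets, not the article's numbers

def chunk_markdown(doc: str) -> list[str]:
    # 1. Split on Markdown headings so each piece keeps its own section title.
    sections = [s.strip() for s in re.split(r"(?m)^(?=#{1,6}\s)", doc) if s.strip()]

    # 2. Split overlong sections by length, preferring paragraph boundaries.
    pieces = []
    for sec in sections:
        while len(sec) > MAX_LEN:
            cut = sec.rfind("\n\n", 0, MAX_LEN)
            cut = cut if cut > 0 else MAX_LEN
            pieces.append(sec[:cut].strip())
            sec = sec[cut:].strip()
        if sec:
            pieces.append(sec)

    # 3. Merge undersized neighbors so the final Blocks stay balanced.
    #    (The real pipeline merges hierarchically, keeping chunks under the
    #    same parent heading together; this flat merge is a simplification.)
    blocks, buf = [], ""
    for piece in pieces:
        buf = f"{buf}\n\n{piece}" if buf else piece
        if len(buf) >= MIN_LEN:
            blocks.append(buf)
            buf = ""
    if buf:
        blocks.append(buf)
    return blocks
```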
Decoupling Chunk Granularity for Retrieval and Generation
Three granularity levels are used, as sketched after this list:
Sentence: fine-grained chunks for precise retrieval.
Segment: medium-sized chunks that preserve structural context for re-ranking.
Block: larger, context-rich chunks for LLM generation.
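A minimal sketch of the decoupling, assuming each Sentence records its parent Segment and each Segment its parent Block; `sentence_index.search`, `rerank`, and `llm_complete` are hypothetical hooks, not the article's implementation:

```python
from dataclasses import dataclass

@dataclass
class Sentence:
    text: str
    segment_id: int          # parent Segment, used for re-ranking

@dataclass
class Segment:
    text: str
    block_id: int            # parent Block, handed to the LLM

def answer(query, sentence_index, segments, blocks, rerank, llm_complete):
    # 1. Retrieve at Sentence granularity: small chunks match precisely.
    hits = sentence_index.search(query, top_k=20)          # -> list[Sentence]

    # 2. Re-rank at Segment granularity, where structural context lives.
    candidate_ids = {s.segment_id for s in hits}
    top_segments = rerank(query, [segments[i] for i in candidate_ids])[:5]

    # 3. Generate from Block granularity: enough surrounding context
    #    for the LLM without dragging retrieval noise into the prompt.
    context = "\n\n".join(blocks[seg.block_id] for seg in top_segments)
    return llm_complete(f"Context:\n{context}\n\nQuestion: {query}")
```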
Knowledge Enhancement to Address Missing Knowledge
Granularity Expansion – General Knowledge Enhancement
Two approaches are explored: expanding the query to document granularity (HyDE‑style) and condensing documents to query granularity via summarization. The latter is implemented offline using LLM‑assisted extraction and summarization.
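Both directions can be sketched in a few lines. The prompts and hooks below are illustrative assumptions, not the article's actual implementation:

```python
# Direction 1 (online, HyDE-style): expand the query up to document granularity.
def hyde_retrieve(query, llm_complete, embed, vector_index, top_k=5):
    # A hypothetical answer passage embeds closer to real documents
    # than the short query does, so we search with its vector instead.
    hypothetical = llm_complete(f"Write a short passage answering: {query}")
    return vector_index.search(embed(hypothetical), top_k=top_k)

# Direction 2 (offline): condense each document down to query granularity.
def condense_for_index(doc_text, llm_complete):
    # Store LLM-extracted questions/summaries alongside the document so
    # that short user queries match equally short index entries.
    return llm_complete(
        "Extract 3-5 short questions this document answers, one per line:\n"
        f"{doc_text}"
    )
```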
Prior Knowledge Expansion – Domain Knowledge Extraction
Domain‑specific knowledge is injected explicitly by building a Memo Model that extracts entities and relationships from the knowledge base, clusters them, and generates community reports. During inference, the query is matched against this Memo Model and the retrieved domain facts are fed to the LLM.
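The sketch below shows the shape of that flow under stated assumptions: triples are extracted per document, embedded as flat facts, and matched at query time. Clustering and community-report generation are elided, and `llm_complete` and `embed` are hypothetical hooks.

```python
import json

def cosine(a, b):
    num = sum(x * y for x, y in zip(a, b))
    den = (sum(x * x for x in a) ** 0.5) * (sum(y * y for y in b) ** 0.5)
    return num / den if den else 0.0

def build_memo_model(documents, llm_complete, embed):
    # Offline pass: extract (entity, relation, target) triples per document.
    # Assumes the LLM returns valid JSON; production code needs validation.
    facts = []
    for doc in documents:
        raw = llm_complete(
            "List the entities and relationships in this text as a JSON array "
            'of {"entity": ..., "relation": ..., "target": ...} objects:\n' + doc
        )
        facts.extend(json.loads(raw))
    # Attach an embedding to each fact so queries can be matched against it.
    return [
        (f, embed(f"{f['entity']} {f['relation']} {f['target']}")) for f in facts
    ]

def match_memo(query, memo_model, embed, top_k=10):
    # Inference time: nearest-neighbor match of the query against stored
    # facts; the hits are injected into the prompt as explicit domain facts.
    qv = embed(query)
    ranked = sorted(memo_model, key=lambda fv: cosine(qv, fv[1]), reverse=True)
    return [fact for fact, _ in ranked[:top_k]]
```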
Knowledge Source Expansion – Experiential Knowledge Accumulation
Historical support tickets are mined to extract QA pairs. Two-stage prompting (a global pass for the main issue, then a local pass for sub-issues) produces a dataset of roughly 1,300 high-quality QA pairs used to fine-tune a Qwen-14B model, with low-quality pairs filtered out by the LLM itself.
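One way to realize that two-stage extraction plus LLM filtering; the prompts and the `llm_complete` hook are assumptions for illustration:

```python
def mine_qa_pairs(ticket, llm_complete):
    # Stage 1 (global): identify the ticket's main issue.
    main_issue = llm_complete(
        f"Summarize the single main issue in this support ticket:\n{ticket}"
    )
    # Stage 2 (local): extract sub-issues and resolutions as QA pairs.
    raw = llm_complete(
        "For each sub-issue in this ticket, output one line 'Q: ... | A: ...'. "
        f"Main issue: {main_issue}\nTicket:\n{ticket}"
    )
    pairs = [line.split(" | ") for line in raw.splitlines() if " | " in line]

    # LLM-as-judge filter: keep only pairs scored as high quality.
    kept = []
    for q, a in pairs:
        verdict = llm_complete(
            f"Is this QA pair clear, correct, and self-contained? {q} {a} "
            "Answer yes or no."
        )
        if verdict.strip().lower().startswith("yes"):
            kept.append((q, a))
    return kept  # pairs like these feed the fine-tuning set
```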
Knowledge Tagging Strategies for Conflicting Knowledge
From Knowledge Tags to Document Priority
Document metadata such as timestamps, view counts, and manual tags are used to assign priority scores, helping filter irrelevant documents and guide the LLM toward higher‑quality sources.
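A possible scoring function is shown below; the fields and weights are assumptions for illustration, since the article does not publish its exact formula:

```python
from datetime import datetime, timezone

def priority_score(doc_meta: dict) -> float:
    # Assumed fields and weights -- illustrative only.
    # `updated_at` is expected to be a timezone-aware datetime.
    age_days = (datetime.now(timezone.utc) - doc_meta["updated_at"]).days
    freshness = max(0.0, 1.0 - age_days / 365)               # newer is better
    popularity = min(1.0, doc_meta.get("views", 0) / 1000)   # capped view signal
    curated = 1.0 if "verified" in doc_meta.get("tags", []) else 0.0
    return 0.5 * freshness + 0.3 * popularity + 0.2 * curated
```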
Applying Document Priority
Priority is used primarily for filtering irrelevant documents and, during the LLM's reference selection phase, is communicated via prompts so the model can reason about which documents to trust.
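On the prompt side, one way to surface priority is to label each reference before the question; the wording below is an assumed template, not the article's actual prompt:

```python
def build_prompt(query, docs_with_scores):
    # Surface each document's priority in the prompt so the LLM can prefer
    # higher-priority sources when their contents conflict.
    refs = "\n\n".join(
        f"[Doc {i} | priority {score:.2f} | updated {meta['updated_at']:%Y-%m-%d}]\n{text}"
        for i, (text, score, meta) in enumerate(docs_with_scores, 1)
    )
    return (
        "When documents disagree, trust the one with the higher priority.\n"
        f"{refs}\n\nQuestion: {query}"
    )
```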
Future Plans
Further work includes deeper mining of historical conversations, multimodal knowledge expansion (incorporating images), and continued refinement of the optimization pipeline.
References
https://arxiv.org/pdf/2312.10997
https://zilliz.com/learn/improve-rag-and-information-retrieval-with-hyde-hypothetical-document-embeddings
https://arxiv.org/pdf/2404.16130
https://arxiv.org/pdf/2409.05591
