
Retrieval‑Augmented Generation (RAG) for Office Applications: Architecture, Challenges, and Practical Lessons

This article introduces Retrieval‑Augmented Generation (RAG) as a way to address the hallucination, knowledge‑freshness, and data‑privacy issues of large language models. It details RAG's modular architecture, explains a layered system design built around a hybrid retrieval pipeline, and shares the practical challenges and engineering tricks encountered when deploying RAG in enterprise office scenarios.


With the rapid development of large language models, the gap between Chinese models and OpenAI's offerings is narrowing. Yet practical deployments still contend with hallucinations, outdated knowledge, and data‑privacy risks. Retrieval‑Augmented Generation (RAG) combines a retrieval engine with a generative model, mitigating these problems by injecting external, up‑to‑date knowledge into the generation process.

The core RAG system consists of several components: data sources, preprocessing modules, a retriever, a ranker, and a generator. A modular architecture separates traditional RAG (indexing → retrieval → generation) from advanced RAG, which adds query rewriting, HyDE‑based document generation, post‑retrieval re‑ranking, and filtering before feeding the final context to the LLM.
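The advanced pipeline above can be sketched end to end. All names here (`rewrite_query`, `retrieve`, `build_context`) are illustrative stubs rather than any specific framework's API, and the toy word‑overlap retriever merely stands in for real dense + BM25 retrieval:

```python
# Minimal sketch of the advanced RAG flow: query rewriting, retrieval,
# then context assembly for the LLM. All components are illustrative
# stubs, not a real framework's API.

def rewrite_query(query: str, history: list[str]) -> str:
    # Pre-retrieval: fold the last dialogue turn into the query so that
    # follow-up questions carry their referents. Real systems would call
    # an LLM (or a dedicated rewriting model) here.
    return f"{history[-1]} {query}" if history else query

def retrieve(query: str, corpus: dict[str, str], top_k: int = 3) -> list[str]:
    # Toy word-overlap retriever standing in for hybrid dense + BM25 recall.
    terms = set(query.lower().split())
    return sorted(corpus,
                  key=lambda d: len(terms & set(corpus[d].lower().split())),
                  reverse=True)[:top_k]

def build_context(query: str, history: list[str], corpus: dict[str, str]) -> str:
    # Post-retrieval, the re-ranked and filtered passages would be formatted
    # into a prompt template; here we simply concatenate them.
    rewritten = rewrite_query(query, history)
    return "\n".join(corpus[d] for d in retrieve(rewritten, corpus))
```

In a production system each stub is replaced by the corresponding module (HyDE generation, re‑ranking, filtering) without changing the overall flow, which is the point of the modular design.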

In the production‑grade design, three hierarchical layers are employed. The bottom layer hosts algorithmic modules (OCR, multi‑turn query rewriting, tokenization, table recognition). The middle layer implements offline indexing (document parsing, tokenization, vector creation) and online QA (query rewriting, hybrid retrieval, ranking, LLM generation) using vector databases, Elasticsearch, and MySQL. The top layer provides user‑level configuration for knowledge bases, model selection, and dialogue rules.
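The offline indexing path starts by splitting parsed documents into chunks before tokenization and embedding. A minimal sketch of a fixed‑window chunker with overlap (the window and overlap sizes are illustrative; production systems typically also split on semantic or structural boundaries):

```python
def chunk_document(text: str, size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into fixed-size windows that overlap, so a sentence cut
    at one chunk boundary is still intact at the start of the next chunk."""
    if overlap >= size:
        raise ValueError("overlap must be smaller than chunk size")
    step = size - overlap
    return [text[start:start + size] for start in range(0, len(text), step)]
```

Each chunk is then tokenized, embedded, and written to the vector database, while the raw text and metadata go to Elasticsearch and MySQL for keyword retrieval and bookkeeping.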

Key engineering challenges are addressed in three stages: “search more comprehensively,” “rank better,” and “answer more accurately.” Comprehensive search is achieved through OCR‑based document parsing, multi‑turn query rewriting, and hybrid retrieval that fuses dense vector similarity with BM25 keyword matching. Better ranking combines a coarse RRF‑based fusion of multiple recall lists, a ColBERT token‑level ranker, and a cross‑encoder re‑ranker, followed by knowledge‑filtering via an NLI classifier. Accurate answering relies on knowledge formatting, prompt templating, and a two‑stage generation (outline → final answer) using FoRAG to improve factuality and logical consistency.
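The coarse fusion step can be illustrated with Reciprocal Rank Fusion, which scores each document by its summed reciprocal ranks across the recall lists; the constant `k = 60` is the value commonly used in the RRF literature:

```python
def rrf_fuse(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked recall lists (e.g. dense retrieval and BM25)
    into one list ordered by summed reciprocal-rank score 1 / (k + rank)."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)
```

The fused list is what the finer‑grained stages (ColBERT token‑level ranking, cross‑encoder re‑ranking, NLI filtering) then operate on; RRF needs only rank positions, not comparable scores, which is why it works across heterogeneous retrievers.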

The final summary emphasizes that building a robust RAG system requires careful attention to each pipeline stage—data ingestion, retrieval, ranking, and generation—rather than relying solely on the LLM, and that a modular, horizontally scalable design enables continuous optimization for enterprise use cases.

Tags: AI, prompt engineering, RAG, Ranking, large language model, Retrieval-Augmented Generation, Hybrid Retrieval
Written by

DataFunSummit

Official account of the DataFun community, dedicated to sharing big data and AI industry summit news and speaker talks, with regular downloadable resource packs.
