Inside RAGFlow: How Its Microservice Architecture Powers an Enterprise‑Grade Retrieval‑Augmented Generation Platform
This article provides a detailed technical walkthrough of RAGFlow's architecture: its microservice design, directory layout, layered structure, and cloud-native deployment; core modules such as DeepDoc, the RAG engine, the Agent system, and the web UI; and cross-cutting concerns including multi-tenant isolation, streaming responses, asynchronous task handling, concurrency control, and scalability, closing with a complete request-lifecycle example for document upload.
RAGFlow Architecture Overview
RAGFlow is an intelligent document library that stores, understands, and answers queries over millions of documents. The system follows a microservice architecture that isolates responsibilities while enabling coordinated operation.
Microservice Design Rationale
deepdoc/ – deep document understanding, analogous to an imaging department.
rag/ – retrieval-augmented generation, analogous to an internal-medicine diagnosis unit.
agent/ – intelligent agent workflow, analogous to a general practitioner coordinating specialists.
Project Directory Structure
ragflow/
├── api/ # API gateway and front‑end services (reception hall)
│ ├── apps/ # Business‑level API endpoints
│ │ ├── conversation_app.py # Dialogue management
│ │ ├── dataset_app.py # Knowledge‑base management
│ │ ├── document_app.py # Document management
│ │ ├── file_app.py # File upload
│ │ ├── llm_app.py # Large‑model configuration
│ │ └── user_app.py # User management
│ ├── db/ # Database layer
│ ├── ragflow_server.py # Main server entry point
│ └── settings.py # Service configuration
├── rag/ # Core RAG engine
│ ├── app/ # Business logic (retrieval, generation, QA)
│ ├── flow/ # Workflow management
│ ├── llm/ # Model adapters (chat, embedding, rerank)
│ ├── nlp/ # NLP utilities (tokenizer, search)
│ └── utils/ # Helper functions
├── deepdoc/ # Deep document understanding engine
│ ├── parser/ # PDF, Word, Excel, PPT parsers
│ └── vision/ # Layout recognition, OCR, table extraction
├── agent/ # Agent system (workflow nodes, tools, templates)
├── web/ # Front‑end UI (React 18 + UmiJS)
├── docker/ # Containerization (Dockerfiles, compose files)
├── conf/ # Configuration files
├── sandbox/ # Code execution sandbox
├── test/ # Test suite
├── docs/ # Documentation
├── pyproject.toml # Python project configuration
├── Dockerfile # Image build definition
└── README.md # Project overview
Layered Architecture
The system is organized like a well-designed office building, with each layer serving a distinct purpose: the API layer (api/) is the reception hall that fronts every request, the engine layer (rag/, deepdoc/, agent/) houses the specialists doing the heavy work, and the infrastructure layer (MySQL, Elasticsearch, MinIO, Redis) is the archive that stores and serves the data.
Cloud‑Native Design
All services are packaged as Docker images, like standardized shipping containers that run the same way everywhere.
Configuration is externalized via docker/.env and conf/service_conf.yaml, allowing the same image to run in development, testing, or production.
Each service exposes health‑check endpoints for automatic monitoring and restart.
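A minimal sketch of how externalized configuration and a health-check endpoint might be wired together. The YAML keys, environment variable, and /healthz path below are illustrative assumptions, not RAGFlow's exact schema:

# Sketch: externalized config plus a health-check endpoint (names assumed)
import os
import yaml
from flask import Flask, jsonify

def load_config(path: str = "conf/service_conf.yaml") -> dict:
    """Read base settings from YAML, then let environment variables override."""
    with open(path) as f:
        conf = yaml.safe_load(f)
    # Hypothetical key: the same image picks up a different host per environment
    conf["mysql_host"] = os.environ.get("MYSQL_HOST", conf.get("mysql_host"))
    return conf

app = Flask(__name__)

@app.route("/healthz")
def healthz():
    # Orchestrators (Docker / Kubernetes) probe this to decide on restarts
    return jsonify(status="ok")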
Core Modules
DeepDoc Engine
The DeepDoc engine extracts structure and content from various file formats.
PDF parser – OCR, layout analysis, table extraction.
Word parser – Style preservation, embedded object handling.
Excel parser – Table structure parsing, data type inference.
PPT parser – Slide content extraction, image‑text combination.
Chunking strategies (a toy structural-chunking sketch follows this list):
Semantic chunking – splits documents by logical meaning to keep content coherent.
Structural chunking – uses headings, paragraphs, tables as boundaries.
Dynamic chunking – adjusts chunk size based on content complexity.
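As a toy illustration of the structural strategy, with a character budget standing in for dynamic adjustment, a chunker might split on headings and size. This is a sketch, not DeepDoc's actual implementation:

# Illustrative structural chunker: headings and a size cap mark boundaries
def chunk_by_structure(text: str, max_chars: int = 800) -> list[str]:
    chunks, current, size = [], [], 0
    for line in text.splitlines():
        is_heading = line.strip().startswith("#") or line.strip().isupper()
        # Start a new chunk at a heading or when the size cap is reached
        if current and (is_heading or size + len(line) > max_chars):
            chunks.append("\n".join(current))
            current, size = [], 0
        current.append(line)
        size += len(line)
    if current:
        chunks.append("\n".join(current))
    return chunks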
RAG Retrieval Engine
The retrieval engine combines traditional keyword search with semantic vector matching; a score-fusion sketch follows the steps below.
(1) Keyword full‑text search – BM25 algorithm for fast term matching.
(2) Semantic vector retrieval – embedding‑based similarity scoring.
(3) Hybrid retrieval – merges both methods to leverage strengths of each.
(4) Re‑ranking – a dedicated model refines the final result order.
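One way steps (1)-(3) can be fused: both score sets are normalized and merged with a tunable weight before re-ranking. The function and weighting scheme are illustrative assumptions, not RAGFlow's exact formula:

# Sketch of hybrid score fusion over BM25 and vector-similarity scores
def hybrid_merge(bm25_scores: dict, vector_scores: dict, alpha: float = 0.5) -> list:
    """Return doc ids sorted by alpha * keyword + (1 - alpha) * semantic score."""
    def normalize(scores: dict) -> dict:
        if not scores:
            return {}
        top = max(scores.values()) or 1.0
        return {doc: s / top for doc, s in scores.items()}
    kw, sem = normalize(bm25_scores), normalize(vector_scores)
    fused = {doc: alpha * kw.get(doc, 0.0) + (1 - alpha) * sem.get(doc, 0.0)
             for doc in set(kw) | set(sem)}
    # A dedicated re-ranking model would refine this order in step (4)
    return sorted(fused, key=fused.get, reverse=True)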
Agent System
The agent system provides a programmable workflow that can call external tools and orchestrate complex logic; a toy execution sketch follows the node and tool lists below.
Start node – entry point, receives user input (used by all workflows).
Retrieval node – fetches relevant knowledge‑base information (used for Q&A).
Generate node – calls LLM to produce content (used for text generation).
Tool node – invokes external APIs (Google, Bing, Wikipedia, arXiv, etc.) for information lookup.
Condition node – branching based on logical conditions for complex decision making.
Built-in tool categories:
Search engines: Google, Bing, Baidu.
Academic resources: arXiv, scholarly search.
Knowledge bases: Wikipedia, Baidu Baike.
Computation tools: code execution, mathematical calculations.
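The node idea can be rendered as a toy in a few lines: each node transforms a shared context and names its successor. The class and method names here are hypothetical and far simpler than the real workflow engine:

# Toy workflow: nodes mutate a shared context and hand off to the next node
class Node:
    def run(self, ctx: dict) -> str:
        raise NotImplementedError

class RetrievalNode(Node):
    def run(self, ctx):
        ctx["chunks"] = ["...top-k chunks for: " + ctx["query"]]
        return "generate"

class GenerateNode(Node):
    def run(self, ctx):
        ctx["answer"] = f"LLM answer grounded in {len(ctx['chunks'])} chunks"
        return "end"

def execute(nodes: dict, ctx: dict, start: str = "retrieval") -> str:
    current = start
    while current != "end":
        current = nodes[current].run(ctx)
    return ctx["answer"]

# Usage: execute({"retrieval": RetrievalNode(), "generate": GenerateNode()},
#                {"query": "What is RAGFlow?"})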
Web Front‑End
Framework: React 18 + UmiJS 4
State management: Redux Toolkit + React Query
UI components: Ant Design 5.x
Styling: Tailwind CSS
Type safety: TypeScript
Technical Highlights
Multi‑Tenant Architecture
Database level: every record carries a tenant_id column, so queries are automatically scoped to a single tenant.
Search index level: separate Elasticsearch index per tenant.
Object storage level: dedicated MinIO bucket per tenant.
Cache level: tenant ID used as a Redis key prefix (naming sketch after this list).
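The isolation rules above reduce to naming conventions at each storage layer. A sketch, with the exact formats assumed rather than taken from RAGFlow's code:

# Assumed naming conventions that keep tenants apart across stores
def es_index_for(tenant_id: str) -> str:
    return f"ragflow_{tenant_id}"      # one Elasticsearch index per tenant

def minio_bucket_for(tenant_id: str) -> str:
    return f"tenant-{tenant_id}"       # dedicated object-storage bucket

def redis_key_for(tenant_id: str, key: str) -> str:
    return f"{tenant_id}:{key}"        # tenant ID as cache-key prefix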
Streaming Response Experience
Server-Sent Events (SSE) keep a long-lived connection for real-time token streaming (a minimal endpoint sketch follows this list).
Chunked transfer reduces perceived latency by sending partial LLM outputs.
Front‑end buffering merges chunks for smooth display.
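A minimal SSE endpoint, assuming a placeholder llm_stream() iterator in place of the real model adapter:

# Sketch: tokens are flushed to the client as SSE frames while the LLM generates
from flask import Flask, Response

app = Flask(__name__)

def llm_stream(prompt: str):
    # Placeholder for the model adapter's token iterator
    yield from ["Partial ", "answers ", "arrive ", "token by token."]

@app.route("/chat")
def chat():
    def events():
        for token in llm_stream("user question"):
            yield f"data: {token}\n\n"   # SSE frame: "data: ..." plus blank line
        yield "data: [DONE]\n\n"
    return Response(events(), mimetype="text/event-stream")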
Asynchronous Task Processing
Heavy operations are decoupled via a message queue, allowing the API layer to respond instantly while workers handle the work.
# Example: asynchronous document processing task
# (create_task is stubbed here for readability; in production it would
# push the task onto the message queue rather than just returning it)
import uuid
from dataclasses import dataclass, field

@dataclass
class Task:
    name: str
    params: dict
    id: str = field(default_factory=lambda: uuid.uuid4().hex)

class DocumentProcessor:
    def create_task(self, name: str, params: dict) -> Task:
        """Create a named task; queue publication is omitted in this sketch."""
        return Task(name, params)

    def process_document_async(self, document_id: str) -> list:
        """Process the full lifecycle of a document asynchronously."""
        # 1. Parse document
        parse_task = self.create_task('parse_document', {
            'document_id': document_id,
            'priority': 'high'
        })
        # 2. Chunking (depends on parsing)
        chunk_task = self.create_task('chunk_document', {
            'document_id': document_id,
            'depends_on': parse_task.id
        })
        # 3. Embedding (depends on chunking)
        embed_task = self.create_task('embed_chunks', {
            'document_id': document_id,
            'depends_on': chunk_task.id
        })
        # 4. Index building (depends on embedding)
        index_task = self.create_task('build_index', {
            'document_id': document_id,
            'depends_on': embed_task.id
        })
        return [parse_task, chunk_task, embed_task, index_task]
Smart Concurrency Control
API rate limiting: max 100 calls per user per minute (a Redis-based sketch follows this list).
Parsing concurrency: at most 10 document‑parsing tasks run simultaneously.
Resource circuit‑breaker: automatic fallback and retry when large‑model calls fail.
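A fixed-window limiter on Redis is one way to enforce the 100-calls-per-minute rule; the key layout below is an assumption, not RAGFlow's actual scheme:

# Sketch: fixed-window rate limiter (100 calls per user per minute) on Redis
import time
import redis

r = redis.Redis()

def allow_request(user_id: str, limit: int = 100, window: int = 60) -> bool:
    key = f"ratelimit:{user_id}:{int(time.time()) // window}"
    count = r.incr(key)            # atomic increment per user per time window
    if count == 1:
        r.expire(key, window)      # first hit in the window sets its TTL
    return count <= limit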
Scalability Design
Stateless services enable horizontal scaling by adding more instances.
Load balancing via Nginx or Kubernetes distributes traffic evenly.
Data sharding support for both Elasticsearch and MySQL to handle growth.
Plugin‑Based Extensibility
Document parsers can be extended to support new formats (registry sketch after this list).
LLM adapters allow integration of various model providers.
Agent tools can be plugged in to add new capabilities.
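A registry pattern is the usual shape for this kind of plugin system; the decorator and names below are illustrative, not RAGFlow's actual extension API:

# Sketch: pluggable parsers register a class keyed by file extension
import os

PARSERS = {}

def register_parser(ext: str):
    def wrap(cls):
        PARSERS[ext] = cls
        return cls
    return wrap

@register_parser(".pdf")
class PdfParser:
    def parse(self, path: str) -> str:
        return f"structured content of {path}"

def parse_file(path: str) -> str:
    ext = os.path.splitext(path)[1].lower()
    return PARSERS[ext]().parse(path)   # dispatch to the registered parser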
Request Lifecycle – Document Upload Example
The upload flow demonstrates the decoupled synchronous and asynchronous stages.
Synchronous phase (user-visible): ragflow-server stores the file in MinIO, writes a metadata record to MySQL, and pushes a task to RabbitMQ. The user receives an immediate "upload successful" response (sketched after this list).
Asynchronous phase (background): ragflow-worker consumes the RabbitMQ message and performs parsing, chunking, embedding, and indexing.
Decoupling advantage: the API remains responsive under massive concurrent uploads, and scaling the worker pool horizontally increases processing throughput.
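Putting the synchronous phase together, a handler might look like the following sketch; the client setup, credentials, queue name, and db helper are all assumptions for illustration:

# Sketch of the synchronous phase: store bytes, record metadata, enqueue work
import io
import json
import pika                 # RabbitMQ client
from minio import Minio

def upload_document(tenant_id: str, filename: str, data: bytes, db) -> dict:
    minio_client = Minio("minio:9000", access_key="...", secret_key="...",
                         secure=False)
    minio_client.put_object(f"tenant-{tenant_id}", filename,
                            io.BytesIO(data), len(data))
    doc_id = db.insert_document(tenant_id, filename)  # MySQL metadata row (assumed helper)
    conn = pika.BlockingConnection(pika.ConnectionParameters("rabbitmq"))
    conn.channel().basic_publish(exchange="", routing_key="doc_tasks",
                                 body=json.dumps({"document_id": doc_id}))
    # The worker pool picks up the message; the user already has a response
    return {"status": "upload successful", "document_id": doc_id}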
Modular Code Structure
api/ – synchronous entry points and RESTful APIs.
rag/ – core RAG logic, asynchronous pipelines, NLP utilities.
agent/ – agent workflow engine.
utils/ – wrappers for external services (Elasticsearch, Redis, MinIO, RabbitMQ).
Key Takeaways
RAGFlow demonstrates how a well‑engineered microservice architecture, combined with AI‑driven components, delivers a high‑performance, highly available enterprise RAG platform. Professional specialization, clear interfaces, cloud‑native deployment, streaming interaction, and robust scalability together form the recipe for a modern intelligent knowledge system.