Inside RAGFlow: How Its Microservice Architecture Powers an Enterprise‑Grade Retrieval‑Augmented Generation Platform

This article provides a detailed technical walkthrough of RAGFlow's architecture, covering its microservice design, directory layout, layered structure, cloud‑native deployment, core modules such as DeepDoc, RAG engine, Agent system, and web UI, as well as multi‑tenant isolation, streaming responses, asynchronous task handling, concurrency controls, scalability strategies, and a complete request‑lifecycle example for document upload.

Tech Freedom Circle

RAGFlow Architecture Overview

RAGFlow is an intelligent document library that stores, understands, and answers queries over millions of documents. The system follows a microservice architecture that isolates responsibilities while enabling coordinated operation.

Microservice Design Rationale

deepdoc/ – deep document understanding, analogous to an imaging department.

rag/ – retrieval‑augmented generation, analogous to an internal‑medicine diagnosis unit.

agent/ – intelligent agent workflow, analogous to a general practitioner coordinating specialists.

Project Directory Structure

ragflow/
├── api/                 # API gateway and front‑end services (reception hall)
│   ├── apps/            # Business‑level API endpoints
│   │   ├── conversation_app.py   # Dialogue management
│   │   ├── dataset_app.py        # Knowledge‑base management
│   │   ├── document_app.py       # Document management
│   │   ├── file_app.py           # File upload
│   │   ├── llm_app.py            # Large‑model configuration
│   │   └── user_app.py           # User management
│   ├── db/               # Database layer
│   ├── ragflow_server.py # Main server entry point
│   └── settings.py      # Service configuration
├── rag/                 # Core RAG engine
│   ├── app/             # Business logic (retrieval, generation, QA)
│   ├── flow/            # Workflow management
│   ├── llm/             # Model adapters (chat, embedding, rerank)
│   ├── nlp/             # NLP utilities (tokenizer, search)
│   └── utils/           # Helper functions
├── deepdoc/             # Deep document understanding engine
│   ├── parser/          # PDF, Word, Excel, PPT parsers
│   └── vision/          # Layout recognition, OCR, table extraction
├── agent/               # Agent system (workflow nodes, tools, templates)
├── web/                 # Front‑end UI (React 18 + UmiJS)
├── docker/              # Containerization (Dockerfiles, compose files)
├── conf/                # Configuration files
├── sandbox/             # Code execution sandbox
├── test/                # Test suite
├── docs/                # Documentation
├── pyproject.toml       # Python project configuration
├── Dockerfile           # Image build definition
└── README.md            # Project overview

Layered Architecture

The system is organized like a well‑designed office building, with each layer serving a distinct purpose.

Layered architecture diagram

Cloud‑Native Design

All services are packaged as Docker images, like standardized containers.

Configuration is externalized via docker/.env and conf/service_conf.yaml, allowing the same image to run in development, testing, or production.

Each service exposes health‑check endpoints for automatic monitoring and restart.
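For illustration, here is a minimal health-check endpoint built on Python's standard library. The `/healthz` route name and JSON payload are assumptions for this sketch, not RAGFlow's actual endpoint:

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Liveness probe: an orchestrator restarts the container on repeated failure
        if self.path == "/healthz":
            body = json.dumps({"status": "ok"}).encode()
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_response(404)
            self.end_headers()

    def log_message(self, *args):
        pass  # keep probe traffic out of the request logs

def probe(port: int) -> dict:
    """Call the health endpoint the way a monitoring agent would."""
    with urllib.request.urlopen(f"http://127.0.0.1:{port}/healthz") as resp:
        return json.loads(resp.read())
```

In production the handler would also verify downstream dependencies (database, message queue) before reporting healthy.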

Core Modules

DeepDoc Engine

The DeepDoc engine extracts structure and content from various file formats.

PDF parser – OCR, layout analysis, table extraction.

Word parser – Style preservation, embedded object handling.

Excel parser – Table structure parsing, data type inference.

PPT parser – Slide content extraction, image‑text combination.

Chunking strategies:

Semantic chunking – splits documents by logical meaning to keep content coherent.

Structural chunking – uses headings, paragraphs, tables as boundaries.

Dynamic chunking – adjusts chunk size based on content complexity.
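The structural strategy can be sketched as splitting on blank-line boundaries and packing paragraphs up to a size limit. The `max_chars` parameter and packing rule here are illustrative, not RAGFlow's actual chunking parameters:

```python
import re

def structural_chunks(text: str, max_chars: int = 500) -> list[str]:
    """Split on blank-line boundaries, then pack paragraphs up to max_chars."""
    parts = [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]
    chunks: list[str] = []
    current = ""
    for p in parts:
        # Start a new chunk when adding this paragraph would exceed the limit
        if current and len(current) + len(p) + 1 > max_chars:
            chunks.append(current)
            current = p
        else:
            current = f"{current}\n{p}" if current else p
    if current:
        chunks.append(current)
    return chunks
```

A real implementation would also respect headings and tables as hard boundaries, as the structural strategy above describes.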

RAG Retrieval Engine

The retrieval engine combines traditional keyword search with semantic vector matching.

(1) Keyword full‑text search – BM25 algorithm for fast term matching.

(2) Semantic vector retrieval – embedding‑based similarity scoring.

(3) Hybrid retrieval – merges both methods to leverage strengths of each.

(4) Re‑ranking – a dedicated model refines the final result order.
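One common way to merge keyword and vector result lists is reciprocal rank fusion (RRF), shown below as an illustration of hybrid merging; RAGFlow's actual fusion method may differ:

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge ranked lists: each appearance contributes 1 / (k + rank)."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Documents found by both retrievers accumulate score and rise to the top
    return sorted(scores, key=scores.get, reverse=True)
```

The fused list would then be passed to the re-ranking model for final ordering.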

Retrieval strategy diagram

Agent System

The agent system provides a programmable workflow that can call external tools and orchestrate complex logic.

Start node – entry point, receives user input (used by all workflows).

Retrieval node – fetches relevant knowledge‑base information (used for Q&A).

Generate node – calls LLM to produce content (used for text generation).

Tool node – invokes external APIs (Google, Bing, Wikipedia, arXiv, etc.) for information lookup.

Condition node – branching based on logical conditions for complex decision making.

Search engines: Google, Bing, Baidu.

Academic resources: arXiv, scholarly search.

Knowledge bases: Wikipedia, Baidu Baike.

Computation tools: code execution, mathematical calculations.
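The node types above can be pictured as a small graph executor. This sketch is hypothetical and supports only linear chaining, far simpler than RAGFlow's actual agent engine:

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional

@dataclass
class Node:
    name: str
    run: Callable[[dict], dict]   # transforms the shared workflow state
    next: Optional[str] = None    # name of the following node, if any

def execute(nodes: Dict[str, Node], start: str, state: dict) -> dict:
    """Walk the chain from `start`, threading the state through each node."""
    current: Optional[str] = start
    while current is not None:
        node = nodes[current]
        state = node.run(state)
        current = node.next
    return state
```

A Start → Retrieval → Generate pipeline then becomes three `Node` entries executed in order; condition nodes would replace the single `next` pointer with a branching function.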

Web Front‑End

Framework: React 18 + UmiJS 4

State management: Redux Toolkit + React Query

UI components: Ant Design 5.x

Styling: Tailwind CSS

Type safety: TypeScript

Technical Highlights

Multi‑Tenant Architecture

Database level: each tenant has a distinct tenant_id column.

Search index level: separate Elasticsearch index per tenant.

Object storage level: dedicated MinIO bucket per tenant.

Cache level: tenant ID used as Redis key prefix.
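All four isolation layers can be derived from a single tenant ID. The naming patterns below are illustrative, not RAGFlow's exact conventions:

```python
def tenant_resources(tenant_id: str) -> dict[str, str]:
    """Map one tenant_id onto per-layer resource names."""
    return {
        # Row-level filter in MySQL (real code would use a parameterized query)
        "sql_filter": f"tenant_id = '{tenant_id}'",
        "es_index": f"ragflow_{tenant_id}",      # dedicated search index
        "minio_bucket": f"ragflow-{tenant_id}",  # dedicated object bucket
        "redis_prefix": f"{tenant_id}:",         # cache key namespace
    }
```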

Streaming Response Experience

Server‑Sent Events (SSE) keep a long‑lived connection for real‑time token streaming.

Chunked transfer reduces perceived latency by sending partial LLM outputs.

Front‑end buffering merges chunks for smooth display.
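The SSE wire format is simply `data:` lines separated by blank lines, so streaming tokens can be framed with a small generator. The `[DONE]` sentinel is a common convention assumed here, not necessarily RAGFlow's:

```python
from typing import Iterable, Iterator

def sse_events(tokens: Iterable[str]) -> Iterator[str]:
    """Wrap each LLM token in an SSE frame so the browser renders it immediately."""
    for token in tokens:
        yield f"data: {token}\n\n"
    yield "data: [DONE]\n\n"  # sentinel telling the client the stream has ended
```

On the front end, an `EventSource` (or fetch-based reader) consumes these frames and appends each token to the visible answer.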

Asynchronous Task Processing

Heavy operations are decoupled via a message queue, allowing the API layer to respond instantly while workers handle the work.

# Example: asynchronous document processing pipeline
import uuid
from dataclasses import dataclass, field

@dataclass
class Task:
    name: str
    params: dict
    id: str = field(default_factory=lambda: uuid.uuid4().hex)

class DocumentProcessor:
    def create_task(self, name: str, params: dict) -> Task:
        """Record the task and publish it to the message queue (stubbed here)."""
        return Task(name, params)

    def process_document_async(self, document_id: str):
        """Schedule the full document lifecycle as dependent asynchronous tasks."""
        # 1. Parse document
        parse_task = self.create_task('parse_document', {
            'document_id': document_id,
            'priority': 'high'
        })
        # 2. Chunking (depends on parsing)
        chunk_task = self.create_task('chunk_document', {
            'document_id': document_id,
            'depends_on': parse_task.id
        })
        # 3. Embedding (depends on chunking)
        embed_task = self.create_task('embed_chunks', {
            'document_id': document_id,
            'depends_on': chunk_task.id
        })
        # 4. Index building (depends on embedding)
        index_task = self.create_task('build_index', {
            'document_id': document_id,
            'depends_on': embed_task.id
        })
        return [parse_task, chunk_task, embed_task, index_task]

Smart Concurrency Control

API rate limiting: max 100 calls per user per minute.

Parsing concurrency: at most 10 document‑parsing tasks run simultaneously.

Resource circuit‑breaker: automatic fallback and retry when large‑model calls fail.
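The per-user API limit can be enforced with a sliding-window counter. The 100-calls-per-minute figures come from the text above; the implementation itself is an illustrative sketch:

```python
import time
from collections import defaultdict, deque
from typing import Deque, Dict, Optional

class RateLimiter:
    """Allow at most `limit` calls per `window` seconds for each user."""

    def __init__(self, limit: int = 100, window: float = 60.0):
        self.limit = limit
        self.window = window
        self.calls: Dict[str, Deque[float]] = defaultdict(deque)

    def allow(self, user_id: str, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        q = self.calls[user_id]
        while q and now - q[0] >= self.window:
            q.popleft()              # drop timestamps outside the window
        if len(q) < self.limit:
            q.append(now)
            return True
        return False                 # over quota: caller should return HTTP 429
```

Parsing concurrency works analogously with a semaphore of size 10 around the parsing workers.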

Scalability Design

Stateless services enable horizontal scaling by adding more instances.

Load balancing via Nginx or Kubernetes distributes traffic evenly.

Data sharding support for both Elasticsearch and MySQL to handle growth.

Plugin‑Based Extensibility

Document parsers can be extended to support new formats.

LLM adapters allow integration of various model providers.

Agent tools can be plugged in to add new capabilities.
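Plugin registration of this kind is often implemented with a decorator-based registry. The registry below is a hypothetical sketch, not RAGFlow's actual plugin API:

```python
from typing import Callable, Dict

# Maps a file suffix to the function that parses it
PARSERS: Dict[str, Callable[[bytes], str]] = {}

def register_parser(suffix: str):
    """Decorator that registers a parser for the given file suffix."""
    def decorator(fn: Callable[[bytes], str]) -> Callable[[bytes], str]:
        PARSERS[suffix] = fn
        return fn
    return decorator

@register_parser(".txt")
def parse_text(data: bytes) -> str:
    # Trivial example parser: decode plain text
    return data.decode("utf-8")

def parse(filename: str, data: bytes) -> str:
    suffix = "." + filename.rsplit(".", 1)[-1]
    return PARSERS[suffix](data)
```

Adding support for a new format then means writing one function and decorating it; no dispatch code changes.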

Request Lifecycle – Document Upload Example

The upload flow demonstrates the decoupled synchronous and asynchronous stages.

Upload request flow diagram

Synchronous phase (user‑visible): ragflow‑server stores the file in MinIO, writes a metadata record to MySQL, and pushes a task to RabbitMQ. The user receives an immediate "upload successful" response.

Asynchronous phase (background): ragflow‑worker consumes the RabbitMQ message and performs parsing, chunking, embedding, and indexing.

Decoupling advantage: The API remains responsive under massive concurrent uploads; scaling the worker pool horizontally increases processing throughput.
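The synchronous phase can be sketched with in-memory stand-ins for MinIO, MySQL, and RabbitMQ; all three stores and the handler below are hypothetical stubs:

```python
import queue
import uuid
from typing import Any, Dict, List

MINIO: Dict[str, bytes] = {}             # stand-in for object storage
MYSQL: List[Dict[str, Any]] = []         # stand-in for the metadata table
RABBITMQ: "queue.Queue" = queue.Queue()  # stand-in for the task queue

def handle_upload(file_bytes: bytes, filename: str, tenant_id: str) -> Dict[str, str]:
    """Synchronous phase only: store, record, enqueue, and return immediately."""
    doc_id = uuid.uuid4().hex
    MINIO[f"{tenant_id}/{doc_id}"] = file_bytes
    MYSQL.append({"id": doc_id, "name": filename,
                  "tenant_id": tenant_id, "status": "queued"})
    RABBITMQ.put({"task": "parse_document", "document_id": doc_id})
    return {"status": "upload successful", "document_id": doc_id}
```

Everything after the `put` (parsing, chunking, embedding, indexing) happens in a worker that consumes the queue, which is why the handler can return before any heavy work starts.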

Modular Code Structure

api/ – synchronous entry points and RESTful APIs.

rag/ – core RAG logic, asynchronous pipelines, NLP utilities.

agent/ – agent workflow engine.

utils/ – wrappers for external services (Elasticsearch, Redis, MinIO, RabbitMQ).

Key Takeaways

RAGFlow demonstrates how a well‑engineered microservice architecture, combined with AI‑driven components, delivers a high‑performance, highly available enterprise RAG platform. Professional specialization, clear interfaces, cloud‑native deployment, streaming interaction, and robust scalability together form the recipe for a modern intelligent knowledge system.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.

Tags: microservices, Elasticsearch, Redis, Retrieval-Augmented Generation, AI Architecture, Docker Compose, RAGFlow, DeepDoc
Written by Tech Freedom Circle

Crazy Maker Circle (Tech Freedom Architecture Circle): a community of tech enthusiasts, experts, and high‑performance fans. Many top‑level masters, architects, and hobbyists have achieved tech freedom; another wave of go‑getters are hustling hard toward tech freedom.
