How Bailei Knowledge Base Uses Flink and DLF (Paimon) to Build an Enterprise‑Scale Full‑Modal RAG System

Bailei Knowledge Base delivers an enterprise‑grade, full‑modal Retrieval‑Augmented Generation (RAG) solution covering documents, tables, images and audio‑video. Flink's high‑throughput streaming indexes billions of documents per day, and DLF/Paimon underpins a three‑layer reliability architecture. Together they achieve an average recall latency of about 150 ms (about 200 ms at P99) and 99.99% availability.

DataFunSummit

RAG Background and Challenges

Retrieval‑Augmented Generation (RAG) combines external knowledge retrieval with large language models (LLMs) to mitigate hallucination and outdated information. Traditional workflow‑style RAG performs a single search and feeds raw results to the LLM, which suffers from shallow query understanding, fragmented vector slices, and incomplete context.

Bailei Knowledge Base Core Capabilities

Bailei provides a full‑modal RAG platform supporting four knowledge‑base types: document, table, image, and audio‑video. It offers a unified data connector, multi‑modal parsing, and Model Context Protocol (MCP) tools that can be invoked directly by agents or workflows.

Data Connector

File System : supports 30+ file formats, including PDF, Word, and PPT.

Enterprise Document Systems : DingTalk, Yuque, Feishu and other collaboration platforms.

Database Systems : MySQL, PostgreSQL, PolarDB and other relational databases.

The connector provides unified ingestion, tagging, and permission management, plus near‑real‑time incremental sync that propagates source changes within seconds.
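
One common way to implement incremental sync is a watermark over a last‑modified column. The sketch below is illustrative only: the source does not describe the connector's actual mechanism, and `Row`, `updated_at` and `pull_changes` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Row:
    key: str
    body: str
    updated_at: int  # hypothetical last-modified timestamp (epoch seconds)


def pull_changes(rows, watermark):
    """Return rows modified after `watermark`, plus the advanced watermark."""
    changed = sorted(
        (r for r in rows if r.updated_at > watermark),
        key=lambda r: r.updated_at,
    )
    new_watermark = changed[-1].updated_at if changed else watermark
    return changed, new_watermark


source = [Row("a", "v1", 100), Row("b", "v1", 105), Row("d", "v1", 110)]
batch, wm = pull_changes(source, watermark=0)    # first poll: full load
source.append(Row("c", "v1", 120))               # a new row arrives upstream
delta, wm2 = pull_changes(source, watermark=wm)  # second poll: only the change
```

Each poll ships only the rows modified since the previous one, so sync latency is bounded by the polling interval rather than by table size.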

Full‑Modal Knowledge‑Base Types

Document Library : text splitting + vector embedding + hybrid recall; typical scenarios include enterprise Q&A and product documentation search.

Table Library : vector recall + Auto SQL (in development); typical scenarios include data query and report analysis.

Image Library : multimodal vectors using Tongyi Qianwen 3 VL Embedding; typical scenarios include image‑to‑text, text‑to‑image, and image‑to‑image search.

Audio‑Video Library : key‑frame extraction + speech recognition + multimodal summarisation; typical scenarios include video content retrieval and audio Q&A.

Index Construction Architecture

The system separates offline and online phases. Offline indexing parses raw data, performs intelligent chunking, extracts multimodal embeddings and writes to the search engine. Online retrieval configures multi‑round dialogue rewriting, hybrid recall parameters (vector TopK, keyword TopK, rerank thresholds) and provides a visual debugging console.

Why Flink?

Flink offers high‑throughput, low‑latency stream processing with end‑to‑end reliability, back‑pressure control and fine‑grained resource tuning (memory, CPU, parallelism per operator). Bailei builds a multi‑stage pipeline (Chunking → Embedding → Hybrid Recall) using Flink SQL + UDF, enabling easy extension of new operators. The platform supports daily indexing of billions of documents; the “ultra‑fast QA” scenario achieves average recall latency of ~150 ms and P99 latency of ~200 ms.

Why DLF (Apache Paimon)?

Paimon is a lake‑storage engine built on an LSM‑tree, which gives it high‑concurrency writes, native upsert support, and compaction that completes within seconds. For Bailei this means concurrent write capacity that matches the billions‑per‑day indexing load, time‑travel queries for restoring historical state during disaster recovery, and seamless integration with Flink for unified batch and stream management.
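
To make upsert and time travel concrete, here is a toy in‑memory model. It is not Paimon's API or storage layout (Paimon uses LSM‑tree files rather than copied dictionaries); `SnapshotTable` and its methods are hypothetical names used only to show the semantics.

```python
class SnapshotTable:
    """Toy keyed table: every commit produces a new immutable snapshot."""

    def __init__(self):
        self._snapshots = [{}]  # snapshot 0 is the empty table

    def upsert(self, rows):
        """Insert new keys or overwrite existing ones, then commit."""
        current = dict(self._snapshots[-1])  # copy so old snapshots stay intact
        current.update(rows)
        self._snapshots.append(current)
        return len(self._snapshots) - 1  # id of the snapshot just committed

    def read(self, snapshot_id=None):
        """Read the latest snapshot, or an older one for time travel."""
        if snapshot_id is None:
            snapshot_id = len(self._snapshots) - 1
        return dict(self._snapshots[snapshot_id])


table = SnapshotTable()
s1 = table.upsert({"doc-1": "v1", "doc-2": "v1"})
s2 = table.upsert({"doc-1": "v2"})  # same key: overwritten, not duplicated

latest = table.read()         # doc-1 is at v2
before_update = table.read(s1)  # doc-1 is still at v1
```

Reading an earlier snapshot is what allows an index to be restored to a known‑good state after a bad write.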

Bailei constructs a three‑layer reliability architecture: (1) Paimon as the physical backup store, (2) hot standby across heterogeneous engines with automatic failover, and (3) switchover that completes within seconds. Together these achieve 99.99% service availability.
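
The failover layer can be pictured as a priority list of engines. The sketch below is an assumption‑laden illustration, not Bailei's implementation: `Engine`, `EngineDown` and `search_with_failover` are hypothetical names. It also works out the downtime budget that 99.99% availability implies.

```python
class EngineDown(Exception):
    """Raised by an engine that cannot serve requests."""


class Engine:
    def __init__(self, name, healthy=True):
        self.name = name
        self.healthy = healthy

    def search(self, query):
        if not self.healthy:
            raise EngineDown(self.name)
        return f"{self.name}:{query}"


def search_with_failover(engines, query):
    """Try engines in priority order; return (engine_name, result)."""
    for engine in engines:
        try:
            return engine.name, engine.search(query)
        except EngineDown:
            continue  # fall through to the next hot standby
    raise RuntimeError("all engines down; rebuild from the backup store")


def downtime_budget_minutes(availability, days=365):
    """Minutes of allowed downtime over `days` at the given availability."""
    return (1 - availability) * days * 24 * 60


engines = [Engine("primary", healthy=False), Engine("standby")]
served_by, result = search_with_failover(engines, "q1")
budget = downtime_budget_minutes(0.9999)  # about 52.56 minutes per year
```

With under an hour of downtime allowed per year, a switchover measured in seconds leaves room for many incidents, whereas a single rebuild from backup could consume most of the budget.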

Pure‑Visual RAG vs. Traditional Text RAG

Traditional text‑only RAG parses documents into plain text, losing spatial layout information in complex visual files (e.g., PPT, financial reports). Pure‑visual RAG treats each page as an image, extracts multimodal vectors via Tongyi Qianwen 3 VL Embedding, and performs retrieval directly on these vectors. Results: comparable performance on plain documents, but several percentage points higher accuracy on complex visual layouts, with amplified advantages in the generation stage.

Indexing Configuration Details

Offline Index Configuration

Chunking : intelligent chunking (recommended) or custom chunking by page, title, etc.

Meta Extraction : entity extraction using large‑model prompts (e.g., location), Excel header concatenation, etc.

Embedding Model Selection : supports multiple vector models.
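
As a rough illustration of custom chunking by title, the sketch below starts a new chunk at every heading and also splits when a section grows past a size limit. `chunk_by_title`, `looks_like_title` and `max_chars` are hypothetical; the source does not specify Bailei's chunking rules.

```python
def chunk_by_title(lines, is_title, max_chars=500):
    """Group lines into chunks, one or more per titled section."""
    chunks = []
    title, buf = None, []

    def flush():
        # Emit the buffered lines under the title that was active for them.
        if buf:
            chunks.append({"title": title, "text": "\n".join(buf)})
            buf.clear()

    for line in lines:
        if is_title(line):
            flush()
            title = line
        else:
            # Split an oversized section before it exceeds max_chars.
            if buf and sum(len(x) for x in buf) + len(line) > max_chars:
                flush()
            buf.append(line)
    flush()
    return chunks


def looks_like_title(line):
    """Toy heuristic: numbered headings such as '1. Overview'."""
    return line[:1].isdigit()


lines = [
    "1. Overview",
    "Flink builds the index.",
    "2. Storage",
    "Paimon stores backups.",
    "It supports upserts.",
]
chunks = chunk_by_title(lines, looks_like_title)
small_chunks = chunk_by_title(lines, looks_like_title, max_chars=25)
```

Keeping the title alongside each chunk means a later metadata or embedding step can still see which section the text came from.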

Online Retrieval Configuration

Multi‑Round Dialogue Rewrite : optimises retrieval precision in multi‑turn conversations.

Hybrid Recall Parameters : vector TopK, keyword TopK, rerank model similarity threshold, final total TopK.

Retrieval Debugging Platform : visualises the impact of each configuration change on retrieval results.
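
The four hybrid recall parameters interact as sketched below: each channel is cut to its own TopK, the merged candidates are reranked, low scorers are dropped by the threshold, and the survivors are truncated to the final TopK. This is a minimal sketch under assumptions; `hybrid_recall` and the toy scores are hypothetical and stand in for real vector search, keyword search and a rerank model.

```python
def hybrid_recall(vector_hits, keyword_hits, rerank, *,
                  vector_top_k, keyword_top_k, rerank_threshold, final_top_k):
    """Merge two recall channels, rerank, apply the threshold, truncate.

    vector_hits / keyword_hits: lists of (doc_id, channel_score), best first.
    rerank: callable mapping doc_id to a relevance score.
    """
    candidates, seen = [], set()
    for doc_id, _ in vector_hits[:vector_top_k] + keyword_hits[:keyword_top_k]:
        if doc_id not in seen:  # a doc found by both channels counts once
            seen.add(doc_id)
            candidates.append(doc_id)
    scored = [(doc_id, rerank(doc_id)) for doc_id in candidates]
    kept = [pair for pair in scored if pair[1] >= rerank_threshold]
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return kept[:final_top_k]


vector_hits = [("d1", 0.91), ("d2", 0.88), ("d3", 0.70)]
keyword_hits = [("d2", 12.0), ("d4", 9.5), ("d5", 3.1)]
rerank_scores = {"d1": 0.80, "d2": 0.95, "d3": 0.30, "d4": 0.60, "d5": 0.10}

results = hybrid_recall(
    vector_hits, keyword_hits, rerank_scores.get,
    vector_top_k=2, keyword_top_k=2,
    rerank_threshold=0.5, final_top_k=2,
)
```

Channel scores are not compared with each other here because vector similarity and keyword scores live on different scales; only the rerank score decides the final order.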

Pure‑Visual RAG Pipeline

Offline Stage : each page is captured as an image; multimodal vectors (Tongyi Qianwen 3 VL Embedding) are stored without text splitting.

Online Stage : the textual query is embedded with the same VLM and used to retrieve page snapshots. Retrieval quality matches traditional text RAG on plain documents and exceeds it on visually rich documents.

Generation Stage : a multimodal LLM (e.g., Tongyi Qianwen 3.5 Plus) receives both the page snapshot and the query, enabling deeper understanding of layout and producing richer, image‑aware answers.

Internal evaluations show that pure‑visual RAG outperforms traditional text RAG by several percentage points on complex visual documents.
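
The online stage reduces to nearest‑neighbour search between one query vector and one vector per page. In the sketch below the three‑dimensional vectors are made‑up stand‑ins for real embedding output, and `retrieve_pages` and the page identifiers are hypothetical; a production system would use a vector index rather than a full scan.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve_pages(query_vec, page_index, top_k=2):
    """Return the ids of the top_k pages most similar to the query."""
    ranked = sorted(
        page_index.items(),
        key=lambda item: cosine(query_vec, item[1]),
        reverse=True,
    )
    return [page_id for page_id, _ in ranked[:top_k]]


# One vector per page snapshot; no text splitting is involved.
page_index = {
    "report.pdf#p1": [0.9, 0.1, 0.0],
    "report.pdf#p2": [0.1, 0.9, 0.1],
    "slides.pptx#p7": [0.0, 0.2, 0.9],
}
query_vec = [0.1, 0.8, 0.2]  # the text query, embedded in the same space
top_pages = retrieve_pages(query_vec, page_index)
```

The retrieved page ids would then be resolved to page snapshots and passed, together with the query, to the multimodal LLM in the generation stage.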

FAQ Highlights

Supported data sources : 30+ file formats, enterprise document platforms, MySQL/PostgreSQL/PolarDB, etc.

Difference between pure‑visual and text RAG : pure‑visual RAG retains layout information by searching on image embeddings, yielding higher accuracy on visually rich documents.

Indexing throughput : Flink‑based streaming supports billions of documents per day; ultra‑fast QA latency ~150 ms (average) / ~200 ms (P99).

Reliability : three‑layer backup (Paimon storage + hot‑standby engines + seconds‑level failover) guarantees 99.99% availability.

Table knowledge‑base freshness : incremental sync via the data connector with latency on the order of seconds, over both public and VPC networks.

Choice of Flink over Kafka Streams : Flink provides enterprise‑grade reliability, back‑pressure, and comprehensive operations tooling required for large‑scale RAG.

Cost benefits of async architecture : migrating to Flink's asynchronous processing reduces the required concurrency from thousands to a few hundred, which sharply cuts resource consumption.
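
The reason an asynchronous design needs less concurrency is that a worker waiting on a remote call (for example, an embedding request) is otherwise idle. The sketch below uses Python's asyncio as an analogy, not Flink's API: `fake_embed` is a hypothetical stand‑in for a remote model call, and `capacity` plays the role of an in‑flight request limit.

```python
import asyncio


async def fake_embed(text, stats, delay=0.01):
    """Stand-in for a remote embedding call (hypothetical)."""
    stats["in_flight"] += 1
    stats["peak"] = max(stats["peak"], stats["in_flight"])
    await asyncio.sleep(delay)  # the worker is free while waiting on I/O
    stats["in_flight"] -= 1
    return len(text)


async def embed_all(texts, capacity):
    """Run all calls on one event loop with at most `capacity` in flight."""
    stats = {"in_flight": 0, "peak": 0}
    gate = asyncio.Semaphore(capacity)

    async def bounded(text):
        async with gate:
            return await fake_embed(text, stats)

    results = await asyncio.gather(*(bounded(t) for t in texts))
    return results, stats["peak"]


# A single worker keeps 10 requests in flight instead of blocking on each one.
results, peak = asyncio.run(embed_all(["chunk"] * 50, capacity=10))
```

A synchronous design would need roughly one worker per in‑flight request to reach the same throughput, which is consistent with the drop from thousands of parallel workers to a few hundred.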

Conclusion

Bailei Knowledge Base combines full‑modal data coverage, a pure‑visual RAG breakthrough, Flink‑driven high‑throughput indexing and DLF/Paimon’s three‑layer reliable backup to deliver 99.99% availability and sub‑200 ms response times. Ongoing enhancements such as Agentic RAG, Flink async optimisation and Auto SQL will further unlock enterprise data value.

