How IMA Scaled Its AI Knowledge Base from Monolith to Micro‑services

This article walks through the end‑to‑end design of IMA's AI‑driven knowledge base, covering its definition, core business flow, architecture evolution, data ingestion pipelines, management challenges, asynchronous processing, permission modeling, and the business value demonstrated by the prototype.

Tencent Cloud Developer

0. Introduction

In the era of Retrieval‑Augmented Generation (RAG) and Large Language Models (LLMs), a knowledge base must evolve from a passive digital repository into an intelligent assistant that can understand and converse.

1. What Is a Knowledge Base?

A knowledge base is a digital warehouse for centralized information sharing, similar to wikis, shared documents, or project libraries. Traditional keyword‑search‑driven bases only retrieve static content, while AI‑enabled bases support semantic understanding and dialogue.

2. Core Business Process

The IMA knowledge base lifecycle consists of three stages: Knowledge Ingestion, Knowledge Management, and Knowledge Application.

[Figure: Knowledge base core process diagram]

3. Architecture Design

3.1 Knowledge Ingestion

The ingestion layer must be extensible and stable. Three major challenges were identified:

Support for diverse data formats (20+ types, e.g., PDF, Word, XMind, audio).

Avoid tight coupling between external formats and internal logic.

Handle bursty traffic ("ingestion spikes") without overloading parsers.

Solution 1 – Unified Internal Data Model

Define a standard internal representation that decouples external sources from the system:

Media   // user‑visible object stored in the Media Center
Chunk   // low‑level unit for RAG indexing and retrieval

All incoming files are first converted to Media, then parsed into one or more Chunk objects.
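
As a concrete illustration, the two types might look like the sketch below; the field names are assumptions made for this article, not IMA's actual schema:

package knowledge

import "time"

// Media is the user-visible object stored in the Media Center.
type Media struct {
	ID        string
	OwnerID   string
	Title     string
	MimeType  string // original format, e.g. "application/pdf"
	Status    string // lifecycle state, e.g. "parsing", "ready", "failed"
	CreatedAt time.Time
}

// Chunk is the low-level unit used for RAG indexing and retrieval.
type Chunk struct {
	ID        string
	MediaID   string    // back-reference to the owning Media
	Seq       int       // position within the source document
	Text      string    // normalized text used for embedding
	Embedding []float32 // vector used for semantic retrieval
}

Keeping Media user-facing and Chunk retrieval-facing means either side can change shape without breaking the other.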

[Figure: Supported file formats]

Solution 2 – Isolate Change with a Two‑Layer Ingestion Pipeline

Separate the stable "Unified Access Layer" (creates Media) from the flexible "Parsing Layer" (produces Chunk). This isolates format‑specific logic and enables independent evolution.

[Figure: Unified Access vs. Parsing layers]
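
One plausible realization of the two layers is a per-format parser registry: the access layer stays frozen while parsers come and go. The interface and method names below are illustrative, not IMA's API:

package knowledge

import (
	"context"
	"fmt"
)

// Parser turns one external format into Chunks. Each supported format
// (PDF, Word, XMind, audio, ...) ships its own implementation, so adding
// a format never touches the access layer.
type Parser interface {
	Supports(mimeType string) bool
	Parse(ctx context.Context, m Media) ([]Chunk, error)
}

// AccessLayer is the stable entry point: it persists the Media record and
// dispatches to whichever registered parser claims the format.
type AccessLayer struct {
	parsers []Parser
}

func (a *AccessLayer) Ingest(ctx context.Context, m Media) ([]Chunk, error) {
	for _, p := range a.parsers {
		if p.Supports(m.MimeType) {
			return p.Parse(ctx, m)
		}
	}
	return nil, fmt.Errorf("unsupported format: %s", m.MimeType)
}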

Solution 3 – Asynchronous Spike‑Shaving

Introduce a message‑queue‑based async architecture to decouple front‑end ingestion requests from back‑end parsing. This smooths traffic spikes and prevents service overload.

[Figure: Async ingestion architecture]
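
The pattern reduces to: the request path only enqueues, and a bounded worker pool drains the queue at the parsers' sustainable rate. A minimal in-process sketch, using a buffered channel as a stand-in for the real message queue:

package knowledge

import "context"

// taskQueue stands in for the real message queue; its buffer absorbs
// ingestion spikes instead of letting them hit the parsers.
var taskQueue = make(chan Media, 10000)

// Enqueue is all the request path does: accept the task and return.
func Enqueue(m Media) bool {
	select {
	case taskQueue <- m:
		return true
	default:
		return false // queue full: ask the client to retry rather than overload parsers
	}
}

// StartWorkers drains the queue at a fixed concurrency, so parsing
// throughput stays flat no matter how bursty ingestion traffic is.
func StartWorkers(ctx context.Context, n int, parse func(Media)) {
	for i := 0; i < n; i++ {
		go func() {
			for {
				select {
				case <-ctx.Done():
					return
				case m := <-taskQueue:
					parse(m)
				}
			}
		}()
	}
}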

3.2 Knowledge Management

Management operations (e.g., bulk edit, folder moves, deletions) involve multiple components and must remain consistent under high concurrency.

Solution – Service Decomposition

Split the system into atomic services (single‑purpose, stateless) and aggregated services (orchestrate complex workflows). This reduces coupling and improves scalability.

[Figure: Service decomposition diagram]
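
In code terms, atomic services expose narrow single-purpose methods and the aggregated service only composes them. The service names and the deletion workflow below are illustrative assumptions:

package knowledge

import "context"

// Atomic services: single-purpose and stateless.
type MediaService interface {
	Delete(ctx context.Context, mediaID string) error
}

type ChunkService interface {
	DeleteByMedia(ctx context.Context, mediaID string) error
}

type IndexService interface {
	RemoveByMedia(ctx context.Context, mediaID string) error
}

// KnowledgeService is an aggregated service: it orchestrates a
// multi-component workflow but owns no state of its own.
type KnowledgeService struct {
	media  MediaService
	chunks ChunkService
	index  IndexService
}

// DeleteMedia removes a document everywhere it lives; each step is one
// atomic-service call, so each component can scale and evolve independently.
func (s *KnowledgeService) DeleteMedia(ctx context.Context, mediaID string) error {
	if err := s.index.RemoveByMedia(ctx, mediaID); err != nil {
		return err
	}
	if err := s.chunks.DeleteByMedia(ctx, mediaID); err != nil {
		return err
	}
	return s.media.Delete(ctx, mediaID)
}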

Data Consistency

Because Media and Chunk are processed asynchronously, temporary inconsistencies can appear. A dual‑guard mechanism is used: the Media status provides immediate visibility, while an asynchronous reconciliation service guarantees eventual consistency.

[Figure: Consistency safeguard diagram]
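
Below is a sketch of the reconciliation half of the dual guard, assuming a periodic job that re-checks Media stuck in a non-terminal status; the function signatures and the 10-minute threshold are illustrative:

package knowledge

import (
	"context"
	"time"
)

// Reconcile is the second guard: the Media status field gives users
// immediate visibility, while this loop periodically repairs any Media
// stuck in a non-terminal state, guaranteeing eventual consistency even
// if an async parse message was lost mid-flight.
func Reconcile(ctx context.Context, every time.Duration,
	findStuck func(context.Context, time.Duration) []string,
	repair func(context.Context, string) error) {

	t := time.NewTicker(every)
	defer t.Stop()
	for {
		select {
		case <-ctx.Done():
			return
		case <-t.C:
			for _, id := range findStuck(ctx, 10*time.Minute) {
				// Repair is idempotent; a failed attempt is retried next pass.
				_ = repair(ctx, id)
			}
		}
	}
}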

Permission Modeling

A multi‑level permission system protects data across personal, team, and enterprise scopes. The design combines a deep domain model with a unified permission gateway, enabling fine‑grained access control and leaving room for future extension.

[Figure: Permission architecture]
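
A unified gateway reduces every operation to one question: may this subject perform this action on this resource? A minimal sketch, assuming scopes are evaluated from narrowest (personal) to broadest (enterprise); the types are illustrative, not IMA's actual model:

package knowledge

// Scope levels, checked from narrowest to broadest.
type Scope int

const (
	Personal Scope = iota
	Team
	Enterprise
)

type PermRequest struct {
	UserID   string
	Action   string // e.g. "read", "edit", "delete"
	Resource string // e.g. a Media or folder ID
}

// Rule resolves one scope level; nil means "no opinion, fall through".
type Rule func(PermRequest) *bool

// Gateway is the single choke point every service calls, so policies can
// change behind it without touching callers.
type Gateway struct {
	rules map[Scope]Rule
}

func (g *Gateway) Allowed(r PermRequest) bool {
	for _, s := range []Scope{Personal, Team, Enterprise} {
		if rule, ok := g.rules[s]; ok {
			if decision := rule(r); decision != nil {
				return *decision
			}
		}
	}
	return false // default deny
}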

3.3 Knowledge Application

After ingestion and management, the knowledge is consumed by AI‑driven services. The primary use case in IMA is RAG‑based Q&A, where user queries are answered by retrieving relevant Chunk data and feeding it to an LLM.

[Figure: Simple QA flow]
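
End to end, a RAG query is retrieve-then-generate. The sketch below wires the steps together; embed, search, and complete stand in for the real embedding, vector-search, and LLM services:

package knowledge

import (
	"context"
	"fmt"
	"strings"
)

// Answer implements the basic RAG loop: embed the query, retrieve the
// top-k most relevant Chunks, then ask the LLM to answer grounded in them.
func Answer(ctx context.Context, query string,
	embed func(string) []float32,
	search func([]float32, int) []Chunk,
	complete func(context.Context, string) (string, error)) (string, error) {

	var texts []string
	for _, c := range search(embed(query), 5) {
		texts = append(texts, c.Text)
	}
	prompt := fmt.Sprintf(
		"Answer using only the context below.\n\nContext:\n%s\n\nQuestion: %s",
		strings.Join(texts, "\n---\n"), query)
	return complete(ctx, prompt)
}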

4. Results & Business Value

The prototype demonstrates that a modular, async‑first backend can handle diverse data formats, bursty ingestion, and strict consistency requirements while supporting AI‑enhanced retrieval. Early metrics show stable throughput under peak loads and reduced latency for RAG queries.

[Figure: Knowledge base evolution]

5. Summary

Architecture must evolve continuously; a solid design isolates change, embraces async processing, and enforces clear service boundaries. Practical value is measured by reduced development friction, higher system reliability, and the ability to deliver AI‑powered knowledge experiences.

