How IMA Scaled Its AI Knowledge Base from Monolith to Micro‑services

This article walks through the end‑to‑end design of IMA's AI‑driven knowledge base, covering its definition, core business flow, architecture evolution, data ingestion pipelines, management challenges, asynchronous processing, permission modeling, and the business value demonstrated by the prototype.

Tencent Cloud Developer

0. Introduction

In the era of Retrieval‑Augmented Generation (RAG) and Large Language Models (LLMs), a knowledge base must evolve from a passive digital repository into an intelligent assistant that can understand and converse.

1. What Is a Knowledge Base?

A knowledge base is a digital warehouse for centralized information sharing, similar to wikis, shared documents, or project libraries. Traditional keyword‑search‑driven bases only retrieve static content, while AI‑enabled bases support semantic understanding and dialogue.

2. Core Business Process

The IMA knowledge base lifecycle consists of three stages: Knowledge Ingestion, Knowledge Management, and Knowledge Application.

[Figure: Knowledge base core process diagram]

3. Architecture Design

3.1 Knowledge Ingestion

The ingestion layer must be extensible and stable. Three major challenges were identified:

Support for diverse data formats (20+ types, e.g., PDF, Word, XMind, audio).

Avoid tight coupling between external formats and internal logic.

Handle bursty traffic ("ingestion spikes") without overloading parsers.

Solution 1 – Unified Internal Data Model

Define a standard internal representation that decouples external sources from the system:

Media   // user‑visible object stored in the Media Center
Chunk   // low‑level unit for RAG indexing and retrieval

All incoming files are first converted to Media, then parsed into one or more Chunk objects.
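
As a concrete illustration, the two types might look like the sketch below; the field names are assumptions made for this article, not IMA's actual schema:

package knowledge

import "time"

// Media is the user-visible object stored in the Media Center.
type Media struct {
	ID        string
	OwnerID   string
	Title     string
	MimeType  string // original format, e.g. "application/pdf"
	Status    string // lifecycle state, e.g. "parsing", "ready", "failed"
	CreatedAt time.Time
}

// Chunk is the low-level unit used for RAG indexing and retrieval.
type Chunk struct {
	ID        string
	MediaID   string    // back-reference to the owning Media
	Seq       int       // position within the source document
	Text      string    // normalized text used for embedding
	Embedding []float32 // vector used for semantic retrieval
}

Keeping Media user-facing and Chunk retrieval-facing means either side can change shape without breaking the other.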

[Figure: Supported file formats]

Solution 2 – Isolate Change with a Two‑Layer Ingestion Pipeline

Separate the stable "Unified Access Layer" (creates Media) from the flexible "Parsing Layer" (produces Chunk). This isolates format‑specific logic and enables independent evolution.

[Figure: Unified Access vs. Parsing layers]
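
One plausible realization of the two layers is a per-format parser registry: the access layer stays frozen while parsers come and go. The interface and method names below are illustrative, not IMA's API:

package knowledge

import (
	"context"
	"fmt"
)

// Parser turns one external format into Chunks. Each supported format
// (PDF, Word, XMind, audio, ...) ships its own implementation, so adding
// a format never touches the access layer.
type Parser interface {
	Supports(mimeType string) bool
	Parse(ctx context.Context, m Media) ([]Chunk, error)
}

// AccessLayer is the stable entry point: it persists the Media record and
// dispatches to whichever registered parser claims the format.
type AccessLayer struct {
	parsers []Parser
}

func (a *AccessLayer) Ingest(ctx context.Context, m Media) ([]Chunk, error) {
	for _, p := range a.parsers {
		if p.Supports(m.MimeType) {
			return p.Parse(ctx, m)
		}
	}
	return nil, fmt.Errorf("unsupported format: %s", m.MimeType)
}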

Solution 3 – Asynchronous Spike‑Shaving

Introduce a message‑queue‑based async architecture to decouple front‑end ingestion requests from back‑end parsing. This smooths traffic spikes and prevents service overload.

[Figure: Async ingestion architecture]
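
The pattern reduces to: the request path only enqueues, and a bounded worker pool drains the queue at the parsers' sustainable rate. A minimal in-process sketch, using a buffered channel as a stand-in for the real message queue:

package knowledge

import "context"

// taskQueue stands in for the real message queue; its buffer absorbs
// ingestion spikes instead of letting them hit the parsers.
var taskQueue = make(chan Media, 10000)

// Enqueue is all the request path does: accept the task and return.
func Enqueue(m Media) bool {
	select {
	case taskQueue <- m:
		return true
	default:
		return false // queue full: ask the client to retry rather than overload parsers
	}
}

// StartWorkers drains the queue at a fixed concurrency, so parsing
// throughput stays flat no matter how bursty ingestion traffic is.
func StartWorkers(ctx context.Context, n int, parse func(Media)) {
	for i := 0; i < n; i++ {
		go func() {
			for {
				select {
				case <-ctx.Done():
					return
				case m := <-taskQueue:
					parse(m)
				}
			}
		}()
	}
}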

3.2 Knowledge Management

Management operations (e.g., bulk edit, folder moves, deletions) involve multiple components and must remain consistent under high concurrency.

Solution – Service Decomposition

Split the system into atomic services (single‑purpose, stateless) and aggregated services (orchestrate complex workflows). This reduces coupling and improves scalability.

[Figure: Service decomposition diagram]
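
In code terms, atomic services expose narrow single-purpose methods and the aggregated service only composes them. The service names and the deletion workflow below are illustrative assumptions:

package knowledge

import "context"

// Atomic services: single-purpose and stateless.
type MediaService interface {
	Delete(ctx context.Context, mediaID string) error
}

type ChunkService interface {
	DeleteByMedia(ctx context.Context, mediaID string) error
}

type IndexService interface {
	RemoveByMedia(ctx context.Context, mediaID string) error
}

// KnowledgeService is an aggregated service: it orchestrates a
// multi-component workflow but owns no state of its own.
type KnowledgeService struct {
	media  MediaService
	chunks ChunkService
	index  IndexService
}

// DeleteMedia removes a document everywhere it lives; each step is one
// atomic-service call, so each component can scale and evolve independently.
func (s *KnowledgeService) DeleteMedia(ctx context.Context, mediaID string) error {
	if err := s.index.RemoveByMedia(ctx, mediaID); err != nil {
		return err
	}
	if err := s.chunks.DeleteByMedia(ctx, mediaID); err != nil {
		return err
	}
	return s.media.Delete(ctx, mediaID)
}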

Data Consistency

Because Media and Chunk are processed asynchronously, temporary inconsistencies can appear. A dual‑guard mechanism is used: the Media status provides immediate visibility, while an asynchronous reconciliation service guarantees eventual consistency.

[Figure: Consistency safeguard diagram]
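
Below is a sketch of the reconciliation half of the dual guard, assuming a periodic job that re-checks Media stuck in a non-terminal status; the function signatures and the 10-minute threshold are illustrative:

package knowledge

import (
	"context"
	"time"
)

// Reconcile is the second guard: the Media status field gives users
// immediate visibility, while this loop periodically repairs any Media
// stuck in a non-terminal state, guaranteeing eventual consistency even
// if an async parse message was lost mid-flight.
func Reconcile(ctx context.Context, every time.Duration,
	findStuck func(context.Context, time.Duration) []string,
	repair func(context.Context, string) error) {

	t := time.NewTicker(every)
	defer t.Stop()
	for {
		select {
		case <-ctx.Done():
			return
		case <-t.C:
			for _, id := range findStuck(ctx, 10*time.Minute) {
				// Repair is idempotent; a failed attempt is retried next pass.
				_ = repair(ctx, id)
			}
		}
	}
}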

Permission Modeling

A multi‑level permission system protects data across personal, team, and enterprise scopes. The design combines a deep domain model with a unified permission gateway, enabling fine‑grained access control and leaving room for future extension.

[Figure: Permission architecture]
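
A unified gateway reduces every operation to one question: may this subject perform this action on this resource? A minimal sketch, assuming scopes are evaluated from narrowest (personal) to broadest (enterprise); the types are illustrative, not IMA's actual model:

package knowledge

// Scope levels, checked from narrowest to broadest.
type Scope int

const (
	Personal Scope = iota
	Team
	Enterprise
)

type PermRequest struct {
	UserID   string
	Action   string // e.g. "read", "edit", "delete"
	Resource string // e.g. a Media or folder ID
}

// Rule resolves one scope level; nil means "no opinion, fall through".
type Rule func(PermRequest) *bool

// Gateway is the single choke point every service calls, so policies can
// change behind it without touching callers.
type Gateway struct {
	rules map[Scope]Rule
}

func (g *Gateway) Allowed(r PermRequest) bool {
	for _, s := range []Scope{Personal, Team, Enterprise} {
		if rule, ok := g.rules[s]; ok {
			if decision := rule(r); decision != nil {
				return *decision
			}
		}
	}
	return false // default deny
}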

3.3 Knowledge Application

After ingestion and management, the knowledge is consumed by AI‑driven services. The primary use case in IMA is RAG‑based Q&A, where user queries are answered by retrieving relevant Chunk data and feeding it to an LLM.

[Figure: Simple QA flow]
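
End to end, a RAG query is retrieve-then-generate. The sketch below wires the steps together; embed, search, and complete stand in for the real embedding, vector-search, and LLM services:

package knowledge

import (
	"context"
	"fmt"
	"strings"
)

// Answer implements the basic RAG loop: embed the query, retrieve the
// top-k most relevant Chunks, then ask the LLM to answer grounded in them.
func Answer(ctx context.Context, query string,
	embed func(string) []float32,
	search func([]float32, int) []Chunk,
	complete func(context.Context, string) (string, error)) (string, error) {

	var texts []string
	for _, c := range search(embed(query), 5) {
		texts = append(texts, c.Text)
	}
	prompt := fmt.Sprintf(
		"Answer using only the context below.\n\nContext:\n%s\n\nQuestion: %s",
		strings.Join(texts, "\n---\n"), query)
	return complete(ctx, prompt)
}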

4. Results & Business Value

The prototype demonstrates that a modular, async‑first backend can handle diverse data formats, bursty ingestion, and strict consistency requirements while supporting AI‑enhanced retrieval. Early metrics show stable throughput under peak loads and reduced latency for RAG queries.

[Figure: Knowledge base evolution]

5. Summary

Architecture must evolve continuously; a solid design isolates change, embraces async processing, and enforces clear service boundaries. Practical value is measured by reduced development friction, higher system reliability, and the ability to deliver AI‑powered knowledge experiences.

