Build a PPT‑Powered RAG Engine with Visual Models and MCP Server
This article explains how to construct a Retrieval‑Augmented Generation (RAG) pipeline for multi‑page PPT documents by converting slides to images, extracting content with a vision model, indexing with LlamaIndex and Chroma, and exposing the functionality through an MCP Server with tools for adding, querying, and managing PPTs.
Overview
This article describes a complete Retrieval‑Augmented Generation (RAG) pipeline for PowerPoint (PPT) files. It extracts slide images, parses visual content with a vision model, indexes the resulting Markdown and images, and provides interactive query capabilities through an MCP Server.
System Architecture
The system consists of three components:

(Figure: overall framework)
MCP Server – exposes tools add_ppt, chat_with_ppt, delete_ppt, and index_status for managing PPT documents.
RAG engine – handles indexing and generation phases.
Performance testing – demonstrates end‑to‑end queries.
MCP Server Implementation
Each tool is registered with the @app.tool() decorator. The RAG engine is constructed once at server startup and shared with every tool through the lifespan context, which is what the ctx.request_context.lifespan_context lookups below rely on.
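A minimal sketch of that wiring, assuming the official FastMCP Python SDK; the AppContext and PPTRagEngine names are illustrative stand-ins, not the project's actual classes:

from collections.abc import AsyncIterator
from contextlib import asynccontextmanager
from dataclasses import dataclass

from mcp.server.fastmcp import Context, FastMCP

@dataclass
class AppContext:
    rag_engine: "PPTRagEngine"  # illustrative name for the RAG engine class

@asynccontextmanager
async def app_lifespan(server: FastMCP) -> AsyncIterator[AppContext]:
    # Build the engine once; every tool call reuses it via
    # ctx.request_context.lifespan_context.
    engine = PPTRagEngine()
    yield AppContext(rag_engine=engine)

app = FastMCP("ppt-rag", lifespan=app_lifespan)

With the server in place, the tool implementations look like this: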
import json
from typing import Optional

@app.tool()
async def add_ppt(ctx: Context, file_path: str, force_reprocess: bool = False) -> str:
    """Add the specified PPT document to the RAG index.

    Args:
        ctx: Context object
        file_path: Absolute or relative path to the PPT file
        force_reprocess: Re-process even if the document already exists

    Returns:
        JSON string with the operation result
    """
    try:
        rag_engine = ctx.request_context.lifespan_context.rag_engine
        result = await rag_engine.add_ppt_document(file_path, force_reprocess=force_reprocess)
        return json.dumps(result, indent=2, ensure_ascii=False)
    except Exception as e:
        return json.dumps({"error": str(e)})

@app.tool()
async def chat_with_ppt(ctx: Context, query: str, file_path: Optional[str] = None, doc_id: Optional[str] = None) -> str:
    """Answer a query against indexed PPT content, optionally scoped to one document."""
    try:
        rag_engine = ctx.request_context.lifespan_context.rag_engine
        result = await rag_engine.query(query, file_path=file_path, doc_id=doc_id)
        return json.dumps(result, indent=2, ensure_ascii=False)
    except Exception as e:
        return json.dumps({"error": str(e)})

RAG Engine Design
Indexing Phase
The indexing pipeline follows four steps:
PPT → Images: LibreOffice converts the PPT to PDF; Pdfium splits the PDF into per-slide PNG images.
Images → Markdown: A Doubao vision model parses each image using a detailed prompt and outputs Markdown that includes OCR text, tables, and descriptions.
Prepare Nodes: The Markdown is wrapped into TextNode objects with metadata (source file, page number, image path, document ID). Nodes are cached with pickle to avoid re-parsing. Steps 1–3 are sketched in the snippet after this list.
Embedding & Indexing: Nodes are embedded with an OpenAI embedding model, stored in a Chroma vector store, and persisted for fast reload, as in the second snippet below.
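A minimal sketch of steps 1 through 3, assuming soffice is on the PATH and using pypdfium2 for rendering; the function names and the parse_image_with_vision_model call are illustrative stand-ins for the project's actual code:

import pickle
import subprocess
from pathlib import Path

import pypdfium2 as pdfium
from llama_index.core.schema import TextNode

def ppt_to_slide_images(ppt_path: str, out_dir: str, scale: float = 2.0) -> list[str]:
    """Step 1: PPT -> PDF via LibreOffice, then PDF -> per-slide PNGs via Pdfium."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    subprocess.run(
        ["soffice", "--headless", "--convert-to", "pdf", "--outdir", out_dir, ppt_path],
        check=True,
    )
    pdf_path = Path(out_dir) / (Path(ppt_path).stem + ".pdf")
    image_paths = []
    pdf = pdfium.PdfDocument(str(pdf_path))
    for i in range(len(pdf)):
        image_path = Path(out_dir) / f"slide_{i + 1}.png"
        pdf[i].render(scale=scale).to_pil().save(image_path)
        image_paths.append(str(image_path))
    return image_paths

def prepare_nodes(ppt_path: str, doc_id: str, image_paths: list[str], cache_file: str) -> list[TextNode]:
    """Steps 2-3: parse each slide image to Markdown and wrap it in a TextNode."""
    if Path(cache_file).exists():
        # pickle cache: skip the slow vision-model calls on re-runs
        return pickle.loads(Path(cache_file).read_bytes())
    nodes = []
    for page, image_path in enumerate(image_paths, start=1):
        markdown = parse_image_with_vision_model(image_path)  # step 2: Doubao vision call (stub)
        nodes.append(TextNode(
            text=markdown,
            metadata={"source_file": ppt_path, "page": page,
                      "image_path": image_path, "source_file_id": doc_id},
        ))
    Path(cache_file).write_bytes(pickle.dumps(nodes))
    return nodes

Step 4 then only has to embed the prepared nodes and insert them into the index: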
# Create an empty index if not present
if self._index is None:
    vector_store = self._initialize_vector_store()
    storage_context = StorageContext.from_defaults(vector_store=vector_store)
    self._index = VectorStoreIndex([], storage_context=storage_context, show_progress=False)

# Insert nodes and persist
self._index.insert_nodes(nodes)
self._persist_index()

Generation Phase
When a query arrives, the engine retrieves the top‑K most relevant nodes, fetches the associated slide images, and builds a prompt that combines the Markdown context and image references. The prompt is sent to the Doubao vision model, which generates a factual answer and cites the source slide and page number.
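A sketch of that multimodal call, assuming the Doubao model is reached through an OpenAI-compatible endpoint; the client configuration, helper name, and model name are assumptions:

import base64
from openai import AsyncOpenAI

client = AsyncOpenAI(api_key="...", base_url="...")  # endpoint and key are deployment-specific

async def generate_answer(prompt_text: str, image_paths: list[str]) -> str:
    # Combine the rendered prompt (Markdown context + query) with the retrieved slide images.
    content = [{"type": "text", "text": prompt_text}]
    for path in image_paths:
        with open(path, "rb") as f:
            b64 = base64.b64encode(f.read()).decode()
        content.append({"type": "image_url",
                        "image_url": {"url": f"data:image/png;base64,{b64}"}})
    resp = await client.chat.completions.create(
        model="doubao-vision",  # placeholder model name
        messages=[{"role": "user", "content": content}],
    )
    return resp.choices[0].message.content

The default prompt template that fills {context_str} and {query_str}: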
default_prompt = """
以下是PPT幻灯片中解析的Markdown文本和图片信息。Markdown文本已经尝试将相关图表转换为表格。优先使用图片信息来回答问题。在无法理解图像时使用Markdown文本信息。
---------------------
{context_str}
---------------------
-- 根据上下文信息并且不依赖先验知识, 回答查询。
-- 解释你是从解析的markdown、还是图片中得到答案的, 如果有差异, 请说明最终答案的理由。
-- 详细回答问题。
-- 给出重点参考的图片路径和页码。
查询: {query_str}
答案: """Metadata filters (e.g., source_file_id) enable selective retrieval for a specific PPT.
filters = MetadataFilters(
    filters=[MetadataFilter(key="source_file_id", value=filter_doc_id, operator=FilterOperator.EQ)]
)
retriever = VectorIndexRetriever(index=self._index, similarity_top_k=self.top_k, filters=filters)

Testing and Demo
The MCP Server runs in Server‑Sent Events (SSE) mode. An interactive client can add PPTs, query index_status, and ask fact‑based questions. Responses include the answer and a reference such as “PPT name: slide 3”.
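For reference, a minimal interactive client using the MCP Python SDK over SSE; the server URL and the sample tool arguments are assumptions:

import asyncio
from mcp import ClientSession
from mcp.client.sse import sse_client

async def main() -> None:
    # URL depends on where the server is bound
    async with sse_client("http://localhost:8000/sse") as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            print(await session.call_tool("index_status", {}))
            print(await session.call_tool("add_ppt", {"file_path": "slides/demo.pptx"}))
            print(await session.call_tool("chat_with_ppt", {"query": "What does slide 3 report?"}))

asyncio.run(main())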
Future Optimizations
Generate slide summaries or hypothetical questions to enrich the vector store.
Apply relevance re‑ranking and multi‑step retrieval for higher accuracy.
Introduce additional index types (e.g., SummaryIndex) for different query patterns.
Integrate Agentic RAG to allow multiple tool calls within a single query.
Improve performance with asynchronous batch calls and parallel processing.
Resources
Source code: https://github.com/pingcy/app_chatppt
AI Large Model Application Practice
Focused on deep research and development of large-model applications. Authors of "RAG Application Development and Optimization Based on Large Models" and "MCP Principles Unveiled and Development Guide". Primarily B2B, with B2C as a supplement.