Optimizing Graph RAG: Boosting Global QA with Better Chunking, Prompts, and Entity Extraction
This article presents a comprehensive analysis of Graph RAG: its implementation workflow, a step‑by‑step execution guide, four targeted optimization strategies, and experimental validation demonstrating significant improvements in global and local question answering for industry scenarios.
Abstract
Retrieval‑Augmented Generation (RAG) excels at answering questions that can be satisfied by a few retrieved text fragments, but it struggles with query‑focused summarization (QFS), which requires synthesizing information across many documents. Microsoft's Graph RAG extends RAG by constructing a knowledge graph from the source corpus and generating community‑level summaries, enabling accurate answers to holistic queries. This article presents the core workflow, practical execution steps, and four concrete optimization strategies that improve both global and local QA performance.
Background
Traditional RAG retrieves relevant passages and feeds them to a large language model (LLM). When a question demands a global view—e.g., “What are the main topics of these ten documents?”—the retrieved passages often omit essential context, leading to incomplete or hallucinated answers. Graph RAG mitigates this limitation by (1) extracting entities and relations from each document chunk, (2) building a knowledge graph, (3) partitioning the graph into communities with the Leiden algorithm, and (4) generating a report‑style summary for each community. Global QA then uses a map‑reduce pattern over community reports, while local QA retrieves nodes directly via embeddings.
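The map‑reduce pattern for global QA can be sketched in a few lines. This is an illustrative outline, not GraphRAG's actual code: `llm` stands in for any chat‑completion callable, and the prompt wording is invented for the example.

```python
def global_qa(question: str, community_reports: list[str], llm) -> str:
    """Map-reduce over community reports (sketch; prompts are illustrative)."""
    # Map: each community report answers the question independently.
    partial_answers = [
        llm(f"Answer using only this community report.\n"
            f"Question: {question}\nReport: {report}")
        for report in community_reports
    ]
    # Reduce: merge the partial answers into a single global response.
    merged = "\n".join(f"- {answer}" for answer in partial_answers)
    return llm(f"Combine these partial answers into one answer to: {question}\n{merged}")
```

Because each map call sees only one community report, the token budget per call stays bounded no matter how large the corpus is; the reduce step pays for that with one extra LLM call.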
Implementation Flow
The pipeline consists of two major phases, knowledge‑graph construction and global QA, carried out in five sequential steps:
Text Chunking: Documents are split into overlapping chunks defined by chunk_size and chunk_overlap.
Graph Information Extraction: Few‑shot prompts guide the LLM to extract nodes (entities) and edges (relations) from each chunk.
Graph Element Summarization: Duplicate entities are merged and a concise textual description is generated for each consolidated element.
Graph Construction & Community Detection: The aggregated entities and edges form a graph; the Leiden algorithm partitions it into communities.
Community Report Generation: For each community a report‑style summary is produced, serving as the basis for downstream QA.
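The "build a graph, then partition it" core of steps 4 can be illustrated with a toy sketch. Real GraphRAG uses the Leiden algorithm (which needs a dedicated library such as leidenalg); here plain connected components stand in for it, purely to show the data flow from triples to communities.

```python
from collections import defaultdict

def build_graph(triples):
    """Turn (head, relation, tail) triples into an undirected adjacency map."""
    adj = defaultdict(set)
    for head, _relation, tail in triples:
        adj[head].add(tail)
        adj[tail].add(head)
    return adj

def communities(adj):
    """Toy stand-in for Leiden: connected components via iterative DFS."""
    seen, result = set(), []
    for node in adj:
        if node in seen:
            continue
        stack, component = [node], set()
        while stack:
            current = stack.pop()
            if current in seen:
                continue
            seen.add(current)
            component.add(current)
            stack.extend(adj[current] - seen)
        result.append(component)
    return result
```

Each resulting community would then be summarized into a report (step 5); Leiden differs from this sketch in that it optimizes modularity and can split a connected graph into many fine‑grained communities.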
Project Execution Steps
Install the library: pip install graphrag
Create directories: mkdir -p ragtest/input
Place input documents: copy public text files into the ragtest/input folder.
Initialize configuration: Run the initialization command (e.g., graphrag init --root ./ragtest) to generate settings.yaml, which holds the model API keys, embedding‑model selection, concurrency limits, and entity‑type definitions.
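An illustrative settings.yaml fragment covering the fields mentioned above. Field names follow GraphRAG's conventions at the time of writing and can differ between versions; the model names are placeholders.

```yaml
llm:
  api_key: ${GRAPHRAG_API_KEY}    # read from the generated .env file
  model: gpt-4o-mini              # placeholder model name
  concurrent_requests: 25         # concurrency limit
embeddings:
  llm:
    model: text-embedding-3-small # placeholder embedding model
entity_extraction:
  entity_types: [organization, person, geo, event]
```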
Build the knowledge graph: Run the indexing command (e.g., graphrag index --root ./ragtest); it reads its configuration from settings.yaml.
Optimization Strategies
1. Text Chunking Optimization
Replace fixed‑size chunking with a recursive, punctuation‑aware splitter that respects Chinese punctuation marks. After each split, prepend hierarchical titles (section headings) to preserve semantic continuity and reduce token loss.
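A minimal sketch of such a splitter, assuming an illustrative separator list and chunk length (neither is a GraphRAG default, and `chunk_with_title` is a hypothetical helper name): it tries coarse separators first, recurses to finer Chinese punctuation only when a piece is still too long, and prepends the section heading to every chunk.

```python
import re

# Coarse-to-fine separators; Chinese sentence punctuation before the comma.
SEPARATORS = ["\n\n", "\n", "。", "！", "？", "；", "，"]

def split_recursive(text: str, max_len: int, seps=SEPARATORS) -> list[str]:
    if len(text) <= max_len:
        return [text]
    if not seps:
        # No separator left: fall back to a hard cut.
        return [text[i:i + max_len] for i in range(0, len(text), max_len)]
    # Split after the first separator, keeping the punctuation mark attached.
    pieces = [p for p in re.split(f"(?<={re.escape(seps[0])})", text) if p]
    if len(pieces) == 1:
        return split_recursive(text, max_len, seps[1:])
    chunks, buf = [], ""
    for piece in pieces:
        if len(buf) + len(piece) <= max_len:
            buf += piece          # greedily pack pieces up to max_len
        else:
            if buf:
                chunks.append(buf)
            if len(piece) > max_len:
                chunks.extend(split_recursive(piece, max_len, seps[1:]))
                buf = ""
            else:
                buf = piece
    if buf:
        chunks.append(buf)
    return chunks

def chunk_with_title(title: str, body: str, max_len: int = 200) -> list[str]:
    # Prepend the hierarchical title so every chunk keeps its section context.
    return [f"{title}\n{chunk}" for chunk in split_recursive(body, max_len)]
```

Because splits always land after a punctuation mark, no sentence is cut mid‑token, which is what keeps entity names such as "5G畅享套餐" intact.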
2. Prompt Optimization
Translate the default English prompts to Chinese and enrich them with industry‑specific few‑shot examples. Explicitly instruct the LLM to focus on relevant entities and relations, which eliminates mixed‑language outputs.
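A hypothetical Chinese extraction prompt in this spirit might look as follows. The entity types, the telecom few‑shot example, and the tuple output format are all invented for illustration; GraphRAG's real prompts live in the `prompts/` folder created by initialization and are considerably longer.

```python
# Illustrative entity types for the telecom scenario (plan, price, data, channel).
ENTITY_TYPES = ["套餐", "资费", "流量", "渠道"]

PROMPT_TEMPLATE = """你是电信行业的信息抽取助手。
从下面的文本中抽取实体和关系，实体类型限定为：{entity_types}。
输出格式：("entity"|名称|类型|描述) 和 ("relationship"|头实体|尾实体|描述)。

示例：
文本：5G畅享套餐每月128元，包含30GB流量。
输出：
("entity"|5G畅享套餐|套餐|每月128元的5G套餐)
("entity"|128元|资费|5G畅享套餐的月费)
("relationship"|5G畅享套餐|128元|套餐的月度资费)

文本：{input_text}
输出：
"""

def build_prompt(text: str) -> str:
    """Fill the few-shot template with the chunk to be processed."""
    return PROMPT_TEMPLATE.format(
        entity_types="、".join(ENTITY_TYPES),
        input_text=text,
    )
```

Keeping the instructions, the few‑shot example, and the expected output all in Chinese is what removes the incentive for the model to answer in mixed English and Chinese.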
3. Entity Extraction Accuracy
Introduce three validation layers:
Detect and discard triples where the entity name equals the entity type.
Swap misplaced entity‑type pairs to restore correct ordering.
Apply a dedicated classifier to filter hallucinated entities (e.g., “Star Card”).
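The three layers above compose into one validation function. This is a sketch: the type set is illustrative, and `is_hallucinated` stands in for the dedicated classifier, which in practice would be a trained model rather than a rule.

```python
# Illustrative closed set of valid entity types for the telecom scenario.
KNOWN_TYPES = {"套餐", "资费", "流量"}

def validate_entity(name: str, etype: str, is_hallucinated):
    """Return a cleaned (name, type) pair, or None if the entity is rejected."""
    # Layer 1: discard triples where the name merely repeats the type.
    if name == etype:
        return None
    # Layer 2: swap misplaced name/type pairs to restore correct ordering.
    if name in KNOWN_TYPES and etype not in KNOWN_TYPES:
        name, etype = etype, name
    # Layer 3: a dedicated classifier filters hallucinated entities.
    if is_hallucinated(name):
        return None
    return (name, etype)
```

Running cheap rule checks (layers 1 and 2) before the classifier keeps the expensive model call off triples that are already known to be malformed.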
4. Entity Information Completion
Adopt a two‑model approach: a large LLM performs the primary extraction, while a smaller auxiliary model assists in filling missing entities. Placeholder nodes with empty descriptions are created for absent entities and later replaced after a second pass of completion.
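The two‑model flow can be outlined as follows. All three callables are hypothetical stand‑ins: `big_llm` returns (name, description) pairs from the primary extraction, `small_llm` proposes entity names the large model may have missed, and `describe` is the second completion pass.

```python
def extract_with_completion(chunk: str, big_llm, small_llm, describe) -> dict:
    """Large model extracts; small model fills gaps via empty placeholders."""
    # Primary extraction by the large model: {entity name: description}.
    entities = dict(big_llm(chunk))
    # The auxiliary model proposes additional entities; each one missing from
    # the primary result enters the graph as a placeholder with an empty
    # description rather than being dropped.
    for name in small_llm(chunk):
        entities.setdefault(name, "")
    # Second pass: replace the placeholders with generated descriptions.
    for name, desc in entities.items():
        if desc == "":
            entities[name] = describe(name, chunk)
    return entities
```

The placeholder step is what raises graph completeness: an entity the large model missed still gets a node immediately, and only its description is deferred to the second pass.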
Effect Verification
Experiments were conducted on a telecom‑plan recommendation scenario. Human evaluators compared the baseline Graph RAG (default settings) with the optimized pipeline across relevance, factuality, and conciseness. The optimized system consistently outperformed the baseline, producing concise, accurate recommendations without mixed‑language artifacts or hallucinated plans.
Analysis of Improvements
Dynamic chunking preserved titles and reduced token loss, leading to higher fidelity in entity extraction (e.g., “5G畅享套餐” remained intact). Title augmentation helped the model associate later mentions with earlier context, improving recall. Prompt translation eliminated English‑Chinese mixing. Hallucination filtering removed spurious entities such as “Star Card.” The large‑small model combination increased graph completeness by reducing empty placeholder nodes.
References
Edge, D., et al. “From local to global: A Graph RAG approach to query‑focused summarization.” arXiv preprint arXiv:2404.16130, 2024.
Microsoft GraphRAG repository: https://github.com/microsoft/graphrag
Lu, Y., et al. “Unified structure generation for universal information extraction.” arXiv preprint arXiv:2203.12277, 2022.
AsiaInfo Technology: New Tech Exploration
AsiaInfo's cutting‑edge ICT viewpoints and industry insights, featuring its latest technology and product case studies.