Boost Answer Accuracy with GraphRAG: End‑to‑End Microsoft GraphRAG Code Walkthrough
This article walks through the complete GraphRAG workflow—from environment setup and indexing to command‑line and Python API queries—demonstrating how to build a knowledge graph, tune prompts, and retrieve high‑quality answers using Microsoft’s GraphRAG implementation.
1. GraphRAG Process Review
GraphRAG consists of three stages: indexing (knowledge‑graph construction), querying, and prompt tuning. Indexing splits the input text into chunks, extracts entities and relationships with a large language model, clusters entities using the Leiden algorithm, and generates community reports for global reasoning.
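The chunking step can be illustrated with a minimal sketch. This is a simplified character-based splitter written for this article (GraphRAG itself chunks by tokens), mirroring the size/overlap settings configured later in settings.yaml:

```python
def chunk_text(text: str, size: int = 50, overlap: int = 10) -> list[str]:
    """Split text into overlapping chunks; consecutive chunks share `overlap` characters."""
    step = size - overlap
    return [text[i:i + size] for i in range(0, len(text), step)]

chunks = chunk_text("0123456789" * 12)  # 120 characters of sample input
print(len(chunks), len(chunks[0]))
# adjacent chunks overlap, so no entity mention is lost at a chunk boundary
assert chunks[0][-10:] == chunks[1][:10]
```

The overlap matters for indexing quality: an entity mention that straddles a boundary still appears whole in at least one chunk.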
2. Environment Setup
Using Anaconda, create a virtual environment and install the required packages:

conda create -n metaGraphRAG python=3.12
conda activate metaGraphRAG
conda install jupyterlab
conda install ipykernel
python -m ipykernel install --user --name metaGraphRAG --display-name "Python (metaGraphRAG)"
Install GraphRAG:
pip install graphrag
3. Project Configuration
Create the project folder structure (openl/input) and place the source document (e.g., 大数据时代.txt) inside openl/input. Initialise the project:
graphrag init --root ./openl
Edit settings.yaml to select the models (e.g., Qwen/Qwen3-8B for chat and BAAI/bge-m3 for embeddings) and adjust the chunk size:
chunks:
size: 50
overlap: 10
group_by_columns: [id]
4. Knowledge‑Graph Construction (Command Line)
Run the indexing command to generate the parquet tables (entities.parquet, relationships.parquet, communities.parquet, etc.), which are stored in the output directory:
graphrag index --root ./openl
Typical output files:
communities.parquet – community table
community_reports.parquet – community reports (used for global search)
documents.parquet – original document
entities.parquet – entity table
relationships.parquet – relationship table
text_units.parquet – text‑chunk table
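A quick way to sanity-check these tables is to verify that every relationship endpoint refers to a known entity. A minimal sketch with pandas, shown here on tiny hand-made stand-in frames rather than the real parquet files (the column names title/source/target follow GraphRAG's output schema; adjust them if your version differs):

```python
import pandas as pd

# Stand-in frames mirroring the entities.parquet / relationships.parquet columns
entities_df = pd.DataFrame({"title": ["大数据", "舍恩伯格"], "type": ["CONCEPT", "PERSON"]})
relationships_df = pd.DataFrame({"source": ["舍恩伯格"], "target": ["大数据"]})

# Every relationship endpoint should be a known entity
known = set(entities_df["title"])
dangling = relationships_df[
    ~(relationships_df["source"].isin(known) & relationships_df["target"].isin(known))
]
print(len(dangling))  # 0 → the graph is consistent
```

Swapping the stand-in frames for pd.read_parquet('openl/output/...') runs the same check against a real index.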
5. Inspecting the Graph (Jupyter Notebook)
Read the parquet files with pandas and visualise the graph using yfiles_jupyter_graphs:
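The snippet below relies on two helper converters that turn the DataFrames into the node/edge dictionaries yfiles_jupyter_graphs expects. A minimal sketch of what they might look like (the exact property set is a choice; column names follow GraphRAG's output schema):

```python
import pandas as pd

def convert_entities_to_dicts(df: pd.DataFrame) -> list[dict]:
    """Map each entity row to a yfiles node dict: an id plus display properties."""
    return [
        {"id": row["title"], "properties": {"label": row["title"], "type": row.get("type", "")}}
        for _, row in df.iterrows()
    ]

def convert_relationships_to_dicts(df: pd.DataFrame) -> list[dict]:
    """Map each relationship row to a yfiles edge dict with start/end node ids."""
    return [
        {"start": row["source"], "end": row["target"], "properties": {"label": row.get("description", "")}}
        for _, row in df.iterrows()
    ]
```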
import pandas as pd
entities_df = pd.read_parquet('openl/output/entities.parquet')
relationships_df = pd.read_parquet('openl/output/relationships.parquet')
from yfiles_jupyter_graphs import GraphWidget
w = GraphWidget()
w.directed = True
w.nodes = convert_entities_to_dicts(entities_df)
w.edges = convert_relationships_to_dicts(relationships_df)
# converter and colour‑mapping helper functions omitted for brevity
w.circular_layout()
display(w)
6. Command‑Line Querying
After indexing, answer questions directly:
graphrag query --root D:\Learning\大模型\GraphRAG\openl --method local --query "请介绍《大数据时代》"
Switch to global search by changing --method local to --method global. The results are displayed in the console.
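The same CLI call can also be scripted from Python, which is handy for batch evaluation over many questions. A small sketch (build_query_cmd is a hypothetical helper written for this article, not part of GraphRAG):

```python
import subprocess

def build_query_cmd(root: str, method: str, query: str) -> list[str]:
    """Assemble a graphrag query invocation; method is 'local' or 'global'."""
    if method not in ("local", "global"):
        raise ValueError("method must be 'local' or 'global'")
    return ["graphrag", "query", "--root", root, "--method", method, "--query", query]

cmd = build_query_cmd("./openl", "global", "请介绍《大数据时代》")
# subprocess.run(cmd, capture_output=True, text=True)  # uncomment to run against a built index
print(cmd[:2])  # ['graphrag', 'query']
```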
7. Python API – Global Search
Import GraphRAG modules and build a global search engine:
from graphrag.config.models.graph_rag_config import GraphRagConfig
from graphrag.query.factory import get_global_search_engine
from graphrag.utils.api import load_search_prompt
# Load configuration
import yaml, pathlib
with open('settings.yaml', 'r', encoding='utf-8') as f:
    cfg = yaml.safe_load(f)
cfg['root_dir'] = str(pathlib.Path('openl').resolve())
config = GraphRagConfig(**cfg)
# Load parquet tables
import pandas as pd
entities = pd.read_parquet('openl/output/entities.parquet')
communities = pd.read_parquet('openl/output/communities.parquet')
community_reports = pd.read_parquet('openl/output/community_reports.parquet')
# Load prompts
map_prompt = load_search_prompt(config.root_dir, config.global_search.map_prompt)
reduce_prompt = load_search_prompt(config.root_dir, config.global_search.reduce_prompt)
knowledge_prompt = load_search_prompt(config.root_dir, config.global_search.knowledge_prompt)
# Build engine (the read_indexer_* adapters convert the parquet tables into search objects)
from graphrag.query.indexer_adapters import (
    read_indexer_communities,
    read_indexer_entities,
    read_indexer_reports,
)
engine = get_global_search_engine(
config,
reports=read_indexer_reports(community_reports, communities),
entities=read_indexer_entities(entities, communities),
communities=read_indexer_communities(communities, community_reports),
response_type='Single Paragraph',
map_system_prompt=map_prompt,
reduce_system_prompt=reduce_prompt,
general_knowledge_inclusion_prompt=knowledge_prompt,
)
# Note: in recent GraphRAG versions search() is a coroutine; if so, wrap it with asyncio.run(...)
response = engine.search(query='请介绍《大数据时代》')
print(response)
8. Python API – Local Search
Local search retrieves from the text‑unit, entity, and relationship tables together with an entity‑description embedding store:
from graphrag.query.factory import get_local_search_engine
from graphrag.query.indexer_adapters import (
    read_indexer_entities,
    read_indexer_relationships,
    read_indexer_reports,
    read_indexer_text_units,
)
text_units_df = pd.read_parquet('openl/output/text_units.parquet')
# Prepare the vector store (embedding_store) – omitted for brevity
engine = get_local_search_engine(
config,
reports=read_indexer_reports(community_reports, communities),
text_units=read_indexer_text_units(text_units_df),
entities=read_indexer_entities(entities_df, communities),
relationships=read_indexer_relationships(relationships_df),
description_embedding_store=embedding_store,
response_type='Single Paragraph',
system_prompt=load_search_prompt(config.root_dir, config.local_search.prompt),
)
response = engine.search(query='请介绍《大数据时代》')
print(response)
9. Summary
This guide walked through the full GraphRAG workflow: reviewing the pipeline, setting up the environment, configuring the project, constructing the knowledge graph, visualising it, and querying from both the command line and the Python API. With these steps, developers can integrate GraphRAG into their own applications and improve answer quality on corpus‑level questions where plain vector retrieval falls short.
Fun with Large Models
Master's graduate from Beijing Institute of Technology, published four top‑journal papers, previously worked as a developer at ByteDance and Alibaba. Currently researching large models at a major state‑owned enterprise. Committed to sharing concise, practical AI large‑model development experience, believing that AI large models will become as essential as PCs in the future. Let's start experimenting now!