Boost Answer Accuracy with GraphRAG: End‑to‑End Microsoft GraphRAG Code Walkthrough
This article walks through the complete GraphRAG workflow—from environment setup and indexing to command‑line and Python API queries—demonstrating how to build a knowledge graph, tune prompts, and retrieve high‑quality answers using Microsoft’s GraphRAG implementation.
1. GraphRAG Process Review
GraphRAG consists of three stages: indexing (knowledge‑graph construction), querying, and prompt tuning. Indexing splits the input text into chunks, extracts entities and relationships with a large language model, clusters entities using the Leiden algorithm, and generates community reports for global reasoning.
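The chunking step can be illustrated with a minimal sketch. This is a simplified character-based splitter written for this article (GraphRAG itself chunks by tokens), mirroring the size/overlap settings configured later in settings.yaml:

```python
def chunk_text(text: str, size: int = 50, overlap: int = 10) -> list[str]:
    """Split text into overlapping chunks; consecutive chunks share `overlap` characters."""
    step = size - overlap
    return [text[i:i + size] for i in range(0, len(text), step)]

chunks = chunk_text("0123456789" * 12)  # 120 characters of sample input
print(len(chunks), len(chunks[0]))
# adjacent chunks overlap, so no entity mention is lost at a chunk boundary
assert chunks[0][-10:] == chunks[1][:10]
```

The overlap matters for indexing quality: an entity mention that straddles a boundary still appears whole in at least one chunk.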
2. Environment Setup
Using Anaconda, create a virtual environment and install the required packages:

conda create -n metaGraphRAG python=3.12
conda activate metaGraphRAG
conda install jupyterlab
conda install ipykernel
python -m ipykernel install --user --name metaGraphRAG --display-name "Python (metaGraphRAG)"
Install GraphRAG:
pip install graphrag
3. Project Configuration
Create the project folder structure (openl/input) and place the source document (e.g., 大数据时代.txt) inside openl/input. Initialise the project:
graphrag init --root ./openl
Edit settings.yaml to select the models (e.g., Qwen/Qwen3-8B for chat and BAAI/bge-m3 for embeddings) and adjust the chunk size:
chunks:
size: 50
overlap: 10
group_by_columns: [id]
4. Knowledge‑Graph Construction (Command Line)
Run the indexing command to generate the parquet tables (entities.parquet, relationships.parquet, communities.parquet, etc.), which are stored in the output directory:
graphrag index --root ./openl
Typical output files:
communities.parquet – community table
community_reports.parquet – community reports (used for global search)
documents.parquet – original document
entities.parquet – entity table
relationships.parquet – relationship table
text_units.parquet – text‑chunk table
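A quick way to sanity-check these tables is to verify that every relationship endpoint refers to a known entity. A minimal sketch with pandas, shown here on tiny hand-made stand-in frames rather than the real parquet files (the column names title/source/target follow GraphRAG's output schema; adjust them if your version differs):

```python
import pandas as pd

# Stand-in frames mirroring the entities.parquet / relationships.parquet columns
entities_df = pd.DataFrame({"title": ["大数据", "舍恩伯格"], "type": ["CONCEPT", "PERSON"]})
relationships_df = pd.DataFrame({"source": ["舍恩伯格"], "target": ["大数据"]})

# Every relationship endpoint should be a known entity
known = set(entities_df["title"])
dangling = relationships_df[
    ~(relationships_df["source"].isin(known) & relationships_df["target"].isin(known))
]
print(len(dangling))  # 0 → the graph is consistent
```

Swapping the stand-in frames for pd.read_parquet('openl/output/...') runs the same check against a real index.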
5. Inspecting the Graph (Jupyter Notebook)
Read the parquet files with pandas and visualise the graph using yfiles_jupyter_graphs:
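The snippet below relies on two helper converters that turn the DataFrames into the node/edge dictionaries yfiles_jupyter_graphs expects. A minimal sketch of what they might look like (the exact property set is a choice; column names follow GraphRAG's output schema):

```python
import pandas as pd

def convert_entities_to_dicts(df: pd.DataFrame) -> list[dict]:
    """Map each entity row to a yfiles node dict: an id plus display properties."""
    return [
        {"id": row["title"], "properties": {"label": row["title"], "type": row.get("type", "")}}
        for _, row in df.iterrows()
    ]

def convert_relationships_to_dicts(df: pd.DataFrame) -> list[dict]:
    """Map each relationship row to a yfiles edge dict with start/end node ids."""
    return [
        {"start": row["source"], "end": row["target"], "properties": {"label": row.get("description", "")}}
        for _, row in df.iterrows()
    ]
```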
import pandas as pd
entities_df = pd.read_parquet('openl/output/entities.parquet')
relationships_df = pd.read_parquet('openl/output/relationships.parquet')
from yfiles_jupyter_graphs import GraphWidget
w = GraphWidget()
w.directed = True
w.nodes = convert_entities_to_dicts(entities_df)
w.edges = convert_relationships_to_dicts(relationships_df)
# converter and colour‑mapping helper functions omitted for brevity
w.circular_layout()
display(w)
6. Command‑Line Querying
After indexing, answer questions directly:
graphrag query --root D:\Learning\大模型\GraphRAG\openl --method local --query "请介绍《大数据时代》"
Switch to global search by changing --method local to --method global. The results are displayed in the console.
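The same CLI call can also be scripted from Python, which is handy for batch evaluation over many questions. A small sketch (build_query_cmd is a hypothetical helper written for this article, not part of GraphRAG):

```python
import subprocess

def build_query_cmd(root: str, method: str, query: str) -> list[str]:
    """Assemble a graphrag query invocation; method is 'local' or 'global'."""
    if method not in ("local", "global"):
        raise ValueError("method must be 'local' or 'global'")
    return ["graphrag", "query", "--root", root, "--method", method, "--query", query]

cmd = build_query_cmd("./openl", "global", "请介绍《大数据时代》")
# subprocess.run(cmd, capture_output=True, text=True)  # uncomment to run against a built index
print(cmd[:2])  # ['graphrag', 'query']
```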
7. Python API – Global Search
Import GraphRAG modules and build a global search engine:
from graphrag.config.models.graph_rag_config import GraphRagConfig
from graphrag.query.factory import get_global_search_engine
from graphrag.utils.api import load_search_prompt
# Load configuration
import yaml, pathlib
with open('settings.yaml', 'r', encoding='utf-8') as f:
    cfg = yaml.safe_load(f)
cfg['root_dir'] = str(pathlib.Path('openl').resolve())
config = GraphRagConfig(**cfg)
# Load parquet tables
import pandas as pd
entities = pd.read_parquet('openl/output/entities.parquet')
communities = pd.read_parquet('openl/output/communities.parquet')
community_reports = pd.read_parquet('openl/output/community_reports.parquet')
# Load prompts
map_prompt = load_search_prompt(config.root_dir, config.global_search.map_prompt)
reduce_prompt = load_search_prompt(config.root_dir, config.global_search.reduce_prompt)
knowledge_prompt = load_search_prompt(config.root_dir, config.global_search.knowledge_prompt)
# Build engine (the read_indexer_* adapters convert the parquet tables into search objects)
from graphrag.query.indexer_adapters import (
    read_indexer_communities,
    read_indexer_entities,
    read_indexer_reports,
)
engine = get_global_search_engine(
config,
reports=read_indexer_reports(community_reports, communities),
entities=read_indexer_entities(entities, communities),
communities=read_indexer_communities(communities, community_reports),
response_type='Single Paragraph',
map_system_prompt=map_prompt,
reduce_system_prompt=reduce_prompt,
general_knowledge_inclusion_prompt=knowledge_prompt,
)
# Note: in recent GraphRAG versions search() is a coroutine; if so, wrap it with asyncio.run(...)
response = engine.search(query='请介绍《大数据时代》')
print(response)
8. Python API – Local Search
Local search retrieves from the text‑unit, entity, and relationship tables together with an entity‑description embedding store:
from graphrag.query.factory import get_local_search_engine
from graphrag.query.indexer_adapters import (
    read_indexer_entities,
    read_indexer_relationships,
    read_indexer_reports,
    read_indexer_text_units,
)
text_units_df = pd.read_parquet('openl/output/text_units.parquet')
# Prepare the vector store (embedding_store) – omitted for brevity
engine = get_local_search_engine(
config,
reports=read_indexer_reports(community_reports, communities),
text_units=read_indexer_text_units(text_units_df),
entities=read_indexer_entities(entities_df, communities),
relationships=read_indexer_relationships(relationships_df),
description_embedding_store=embedding_store,
response_type='Single Paragraph',
system_prompt=load_search_prompt(config.root_dir, config.local_search.prompt),
)
response = engine.search(query='请介绍《大数据时代》')
print(response)
9. Summary
This guide walked through the full GraphRAG workflow: reviewing the pipeline, setting up the environment, configuring the project, constructing the knowledge graph, visualising it, and querying from both the command line and the Python API. With these steps, developers can integrate GraphRAG into their own applications and improve answer quality on corpus‑level questions where plain vector retrieval falls short.
Fun with Large Models
Master's graduate from Beijing Institute of Technology, published four top‑journal papers, previously worked as a developer at ByteDance and Alibaba. Currently researching large models at a major state‑owned enterprise. Committed to sharing concise, practical AI large‑model development experience, believing that AI large models will become as essential as PCs in the future. Let's start experimenting now!