Boost LLM Answers with GraphRAG: Build Knowledge Graphs Using Neo4j
This guide shows how to enhance large language model outputs by building a GraphRAG pipeline: transform relational data into a Neo4j knowledge graph, create nodes and relationships with Python, and use LangChain's GraphCypherQAChain for accurate, context‑aware retrieval and generation.
Retrieval‑augmented generation (RAG) improves large language model (LLM) responses by grounding them in external knowledge, typically using vector similarity. GraphRAG offers an alternative by storing knowledge in a graph database, enabling richer context and more precise reasoning.
Fundamentals of GraphRAG
A graph consists of nodes (entities) and edges (relationships). Graph databases such as Neo4j, Amazon Neptune, OrientDB, and TigerGraph excel at handling highly connected data, providing flexible modeling, efficient multi‑hop queries, and scalability.
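The multi‑hop advantage can be illustrated without a database: a toy in‑memory graph and a breadth‑first traversal answer "what is reachable within N hops" directly, where a flat table would need repeated joins. This is only a conceptual sketch; the entities and relationship names below are invented for illustration:

```python
from collections import deque

# Toy graph: each node maps to its outgoing (relationship, target) edges.
# Entities and relationship names are invented for illustration.
graph = {
    "Customer:Alice":   [("ORDERED", "Order:1001")],
    "Order:1001":       [("CONTAINS", "Product:Laptop")],
    "Product:Laptop":   [("SUPPLIED_BY", "Supplier:Acme")],
    "Supplier:Acme":    [],
}

def reachable(start, max_hops):
    """Collect every node reachable from `start` within `max_hops` edges."""
    seen, frontier = set(), deque([(start, 0)])
    while frontier:
        node, depth = frontier.popleft()
        if depth == max_hops:
            continue
        for _, target in graph.get(node, []):
            if target not in seen:
                seen.add(target)
                frontier.append((target, depth + 1))
    return seen

print(sorted(reachable("Customer:Alice", 2)))
# ['Order:1001', 'Product:Laptop']
```

A graph database such as Neo4j performs this kind of traversal natively via index‑free adjacency, which is what makes multi‑hop queries cheap compared with chained relational joins.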
From Structured Data to a Knowledge Graph
We start with relational tables in MySQL (orders, customers, products, salespersons, departments). The tutorial shows how to fetch each table into a Pandas DataFrame, then convert rows into Neo4j nodes using the Neo4j Python driver.
import pandas as pd
import mysql.connector
from neo4j import GraphDatabase
def fetch_table_data(table_name):
    cnx = mysql.connector.connect(
        host='YOUR_HOST',
        user='YOUR_USER',
        password='******',
        database='sales'
    )
    cursor = cnx.cursor()
    query = f"SELECT * FROM {table_name}"
    cursor.execute(query)
    rows = cursor.fetchall()
    column_names = [desc[0] for desc in cursor.description]
    df = pd.DataFrame(rows, columns=column_names)
    cursor.close()
    cnx.close()
    return df
orders_df = fetch_table_data('orders')
customers_df = fetch_table_data('customers')
products_df = fetch_table_data('products')
salespersons_df = fetch_table_data('salespersons')
departs_df = fetch_table_data('departs')

Node creation ensures uniqueness by checking for an existing node before issuing a CREATE statement:
def create_unique_nodes_from_dataframe(df, label, unique_id_property):
    driver = GraphDatabase.driver('bolt://localhost:7687', auth=('neo4j', '******'))
    with driver.session() as session:
        for _, row in df.iterrows():
            unique_id_value = row[unique_id_property]
            # Skip rows whose node already exists in the graph
            query = f"MATCH (n:{label} {{{unique_id_property}: '{unique_id_value}'}}) RETURN n"
            if session.run(query).single():
                continue
            # Build a property map from the row and create the node
            properties = ", ".join([f"{k}: '{v}'" for k, v in row.to_dict().items()])
            create_query = f"CREATE (n:{label} {{ {properties} }})"
            session.run(create_query)
    driver.close()
create_unique_nodes_from_dataframe(orders_df, 'Order', 'order_id')
# repeat for other tables …

Relationships are then established with Cypher MERGE statements to avoid duplicates:
def create_relationships():
    driver = GraphDatabase.driver('bolt://localhost:7687', auth=('neo4j', '******'))
    with driver.session() as session:
        session.run("MATCH (c:Customer), (o:Order) WHERE c.customer_id = o.customer_id MERGE (c)-[:ORDERED]->(o)")
        session.run("MATCH (s:Salesperson), (o:Order) WHERE s.salesperson_id = o.salesperson_id MERGE (s)-[:CREATED]->(o)")
        session.run("MATCH (p:Product), (o:Order) WHERE p.product_id = o.product_id MERGE (p)-[:IS_ORDERED_IN]->(o)")
        session.run("MATCH (s:Salesperson), (d:Depart) WHERE s.depart_id = d.depart_id MERGE (s)-[:BELONGS]->(d)")
    driver.close()
create_relationships()

Querying the Graph with LangChain
LangChain’s GraphCypherQAChain converts natural‑language questions into Cypher queries (via an LLM) and then formats the query results into human‑readable answers.
from langchain.prompts import PromptTemplate
from langchain_community.graphs import Neo4jGraph
from langchain.chains import GraphCypherQAChain
from langchain_openai import ChatOpenAI
cypher_template = """
Task: Generate a Cypher query for a Neo4j graph.
Schema:
{schema}
Question:
{question}
"""
cypher_prompt = PromptTemplate(input_variables=["schema", "question"], template=cypher_template)
qa_template = """
You are an assistant that turns Cypher results into a concise answer.
Context:
{context}
Question:
{question}
"""
qa_prompt = PromptTemplate(input_variables=["context", "question"], template=qa_template)
graph = Neo4jGraph(url='bolt://localhost:7687', username='neo4j', password='******')
graph.refresh_schema()
chain = GraphCypherQAChain.from_llm(
    cypher_llm=ChatOpenAI(model='gpt-4o'),
    qa_llm=ChatOpenAI(model='gpt-4o'),
    graph=graph,
    cypher_prompt=cypher_prompt,
    qa_prompt=qa_prompt,
    validate_cypher=True,
    top_k=100,
    verbose=True
)
response = chain.invoke("Which products did 黄彬伟 order and what is the total amount?")
print(response['result'])

Running the chain prints the generated Cypher statement, the raw query results, and the LLM‑generated answer, demonstrating how GraphRAG supports more accurate multi‑hop reasoning than pure vector retrieval, especially on large, highly connected datasets.
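For intuition about what happens under the hood, the generated Cypher for the question above would plausibly traverse the ORDERED and IS_ORDERED_IN relationships and aggregate order amounts. The query text, property names (product_name, amount), and result rows below are hypothetical, and the aggregation step is simulated in plain Python rather than run against Neo4j:

```python
# Hypothetical Cypher of the kind GraphCypherQAChain might generate;
# the actual output depends on the LLM and the refreshed graph schema.
generated_cypher = """
MATCH (c:Customer {name: '黄彬伟'})-[:ORDERED]->(o:Order)<-[:IS_ORDERED_IN]-(p:Product)
RETURN p.product_name AS product, sum(o.amount) AS total_amount
"""

# Simulate the aggregation over hypothetical result rows to show the
# kind of context the qa_llm receives before composing the answer.
rows = [
    {"product": "Laptop", "amount": 5999.0},
    {"product": "Monitor", "amount": 1299.0},
    {"product": "Laptop", "amount": 5999.0},
]

totals = {}
for r in rows:
    totals[r["product"]] = totals.get(r["product"], 0.0) + r["amount"]

print(totals)                 # per-product totals
print(sum(totals.values()))   # grand total: 13297.0
```

With validate_cypher=True, the chain checks relationship directions in the generated query against the schema before executing it, which catches a common class of LLM mistakes.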
Takeaways
By converting relational data into a graph, leveraging Neo4j’s powerful multi‑hop queries, and integrating LangChain’s LLM‑driven query generation, developers can build robust GraphRAG pipelines that reduce hallucinations and improve answer relevance. The next step is to extend this workflow to unstructured text, extracting entities into the knowledge graph for a complete GraphRAG solution.
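As a starting point for that extension, entity extraction can be prototyped without an LLM: the pattern‑based extractor below turns simple sentences into (subject, relation, object) triples and renders them as MERGE statements of the same shape used above. In a real pipeline this step would typically be handled by an LLM‑based extractor; the sentence pattern, names, and labels here are simplified assumptions for illustration only:

```python
import re

# Simplified pattern: "<Person> works in <Department>". A production
# pipeline would use an LLM-based extractor instead of a regex.
PATTERN = re.compile(r"(\w[\w\s]*?) works in (\w[\w\s]*)")

def extract_triples(text):
    """Extract (subject, 'BELONGS', object) triples from matching sentences."""
    triples = []
    for sentence in text.split("."):
        m = PATTERN.search(sentence.strip())
        if m:
            triples.append((m.group(1).strip(), "BELONGS", m.group(2).strip()))
    return triples

def to_merge_statements(triples):
    """Render triples as Cypher MERGE statements (same shape as earlier)."""
    stmts = []
    for subj, rel, obj in triples:
        stmts.append(
            f"MERGE (s:Salesperson {{name: '{subj}'}}) "
            f"MERGE (d:Depart {{name: '{obj}'}}) "
            f"MERGE (s)-[:{rel}]->(d)"
        )
    return stmts

triples = extract_triples("Li Lei works in Sales. Han Meimei works in Marketing.")
for stmt in to_merge_statements(triples):
    print(stmt)
```

The resulting statements can be executed with the same Neo4j driver session shown earlier, letting extracted entities land in the same graph as the relational data.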
AI Large Model Application Practice
Focused on deep research and development of large-model applications. Authors of "RAG Application Development and Optimization Based on Large Models" and "MCP Principles Unveiled and Development Guide". Primarily B2B, with B2C as a supplement.
