Boost LLM Answers with GraphRAG: Build Knowledge Graphs Using Neo4j
This guide shows how to enhance large language model outputs by building a GraphRAG pipeline: transform relational data into a Neo4j knowledge graph, create nodes and relationships with Python, and use LangChain's GraphCypherQAChain for accurate, context‑aware retrieval and generation.
Retrieval‑augmented generation (RAG) improves large language model (LLM) responses by grounding them in external knowledge, typically using vector similarity. GraphRAG offers an alternative by storing knowledge in a graph database, enabling richer context and more precise reasoning.
Fundamentals of GraphRAG
A graph consists of nodes (entities) and edges (relationships). Graph databases such as Neo4j, Amazon Neptune, OrientDB, and TigerGraph excel at handling highly connected data, providing flexible modeling, efficient multi‑hop queries, and scalability.
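The multi‑hop advantage can be illustrated without a database: a toy in‑memory graph and a breadth‑first traversal answer "what is reachable within N hops" directly, where a flat table would need repeated joins. This is only a conceptual sketch; the entities and relationship names below are invented for illustration:

```python
from collections import deque

# Toy graph: each node maps to its outgoing (relationship, target) edges.
# Entities and relationship names are invented for illustration.
graph = {
    "Customer:Alice":   [("ORDERED", "Order:1001")],
    "Order:1001":       [("CONTAINS", "Product:Laptop")],
    "Product:Laptop":   [("SUPPLIED_BY", "Supplier:Acme")],
    "Supplier:Acme":    [],
}

def reachable(start, max_hops):
    """Collect every node reachable from `start` within `max_hops` edges."""
    seen, frontier = set(), deque([(start, 0)])
    while frontier:
        node, depth = frontier.popleft()
        if depth == max_hops:
            continue
        for _, target in graph.get(node, []):
            if target not in seen:
                seen.add(target)
                frontier.append((target, depth + 1))
    return seen

print(sorted(reachable("Customer:Alice", 2)))
# ['Order:1001', 'Product:Laptop']
```

A graph database such as Neo4j performs this kind of traversal natively via index‑free adjacency, which is what makes multi‑hop queries cheap compared with chained relational joins.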
From Structured Data to a Knowledge Graph
We start with relational tables in MySQL (orders, customers, products, salespersons, departments). The tutorial shows how to fetch each table into a Pandas DataFrame, then convert rows into Neo4j nodes using the Neo4j Python driver.
import pandas as pd
import mysql.connector
from neo4j import GraphDatabase
def fetch_table_data(table_name):
    cnx = mysql.connector.connect(
        host='YOUR_HOST',
        user='YOUR_USER',
        password='******',
        database='sales'
    )
    cursor = cnx.cursor()
    query = f"SELECT * FROM {table_name}"
    cursor.execute(query)
    rows = cursor.fetchall()
    column_names = [desc[0] for desc in cursor.description]
    df = pd.DataFrame(rows, columns=column_names)
    cursor.close()
    cnx.close()
    return df
orders_df = fetch_table_data('orders')
customers_df = fetch_table_data('customers')
products_df = fetch_table_data('products')
salespersons_df = fetch_table_data('salespersons')
departs_df = fetch_table_data('departs')

Node creation ensures uniqueness by checking for an existing node before issuing a CREATE statement:
def create_unique_nodes_from_dataframe(df, label, unique_id_property):
    driver = GraphDatabase.driver('bolt://localhost:7687', auth=('neo4j', '******'))
    with driver.session() as session:
        for _, row in df.iterrows():
            unique_id_value = row[unique_id_property]
            # Skip rows whose node already exists in the graph
            query = f"MATCH (n:{label} {{{unique_id_property}: '{unique_id_value}'}}) RETURN n"
            if session.run(query).single():
                continue
            # Build a property map from the row and create the node
            properties = ", ".join([f"{k}: '{v}'" for k, v in row.to_dict().items()])
            create_query = f"CREATE (n:{label} {{ {properties} }})"
            session.run(create_query)
    driver.close()
create_unique_nodes_from_dataframe(orders_df, 'Order', 'order_id')
# repeat for other tables …

Relationships are then established with Cypher MERGE statements to avoid duplicates:
def create_relationships():
    driver = GraphDatabase.driver('bolt://localhost:7687', auth=('neo4j', '******'))
    with driver.session() as session:
        session.run("MATCH (c:Customer), (o:Order) WHERE c.customer_id = o.customer_id MERGE (c)-[:ORDERED]->(o)")
        session.run("MATCH (s:Salesperson), (o:Order) WHERE s.salesperson_id = o.salesperson_id MERGE (s)-[:CREATED]->(o)")
        session.run("MATCH (p:Product), (o:Order) WHERE p.product_id = o.product_id MERGE (p)-[:IS_ORDERED_IN]->(o)")
        session.run("MATCH (s:Salesperson), (d:Depart) WHERE s.depart_id = d.depart_id MERGE (s)-[:BELONGS]->(d)")
    driver.close()
create_relationships()

Querying the Graph with LangChain
LangChain’s GraphCypherQAChain converts natural‑language questions into Cypher queries (via an LLM) and then formats the query results into human‑readable answers.
from langchain.prompts import PromptTemplate
from langchain_community.graphs import Neo4jGraph
from langchain.chains import GraphCypherQAChain
from langchain_openai import ChatOpenAI
cypher_template = """
Task: Generate a Cypher query for a Neo4j graph.
Schema:
{schema}
Question:
{question}
"""
cypher_prompt = PromptTemplate(input_variables=["schema", "question"], template=cypher_template)
qa_template = """
You are an assistant that turns Cypher results into a concise answer.
Context:
{context}
Question:
{question}
"""
qa_prompt = PromptTemplate(input_variables=["context", "question"], template=qa_template)
graph = Neo4jGraph(url='bolt://localhost:7687', username='neo4j', password='******')
graph.refresh_schema()
chain = GraphCypherQAChain.from_llm(
    cypher_llm=ChatOpenAI(model='gpt-4o'),
    qa_llm=ChatOpenAI(model='gpt-4o'),
    graph=graph,
    cypher_prompt=cypher_prompt,
    qa_prompt=qa_prompt,
    validate_cypher=True,
    top_k=100,
    verbose=True
)
response = chain.invoke("Which products did 黄彬伟 order and what is the total amount?")
print(response['result'])

Running the chain prints the generated Cypher statement, the raw query results, and the LLM‑generated answer, demonstrating how GraphRAG supports more accurate multi‑hop reasoning than pure vector retrieval, especially on large, highly connected datasets.
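For intuition about what happens under the hood, the generated Cypher for the question above would plausibly traverse the ORDERED and IS_ORDERED_IN relationships and aggregate order amounts. The query text, property names (product_name, amount), and result rows below are hypothetical, and the aggregation step is simulated in plain Python rather than run against Neo4j:

```python
# Hypothetical Cypher of the kind GraphCypherQAChain might generate;
# the actual output depends on the LLM and the refreshed graph schema.
generated_cypher = """
MATCH (c:Customer {name: '黄彬伟'})-[:ORDERED]->(o:Order)<-[:IS_ORDERED_IN]-(p:Product)
RETURN p.product_name AS product, sum(o.amount) AS total_amount
"""

# Simulate the aggregation over hypothetical result rows to show the
# kind of context the qa_llm receives before composing the answer.
rows = [
    {"product": "Laptop", "amount": 5999.0},
    {"product": "Monitor", "amount": 1299.0},
    {"product": "Laptop", "amount": 5999.0},
]

totals = {}
for r in rows:
    totals[r["product"]] = totals.get(r["product"], 0.0) + r["amount"]

print(totals)                 # per-product totals
print(sum(totals.values()))   # grand total: 13297.0
```

With validate_cypher=True, the chain checks relationship directions in the generated query against the schema before executing it, which catches a common class of LLM mistakes.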
Takeaways
By converting relational data into a graph, leveraging Neo4j’s powerful multi‑hop queries, and integrating LangChain’s LLM‑driven query generation, developers can build robust GraphRAG pipelines that reduce hallucinations and improve answer relevance. The next step is to extend this workflow to unstructured text, extracting entities into the knowledge graph for a complete GraphRAG solution.
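As a starting point for that extension, entity extraction can be prototyped without an LLM: the pattern‑based extractor below turns simple sentences into (subject, relation, object) triples and renders them as MERGE statements of the same shape used above. In a real pipeline this step would typically be handled by an LLM‑based extractor; the sentence pattern, names, and labels here are simplified assumptions for illustration only:

```python
import re

# Simplified pattern: "<Person> works in <Department>". A production
# pipeline would use an LLM-based extractor instead of a regex.
PATTERN = re.compile(r"(\w[\w\s]*?) works in (\w[\w\s]*)")

def extract_triples(text):
    """Extract (subject, 'BELONGS', object) triples from matching sentences."""
    triples = []
    for sentence in text.split("."):
        m = PATTERN.search(sentence.strip())
        if m:
            triples.append((m.group(1).strip(), "BELONGS", m.group(2).strip()))
    return triples

def to_merge_statements(triples):
    """Render triples as Cypher MERGE statements (same shape as earlier)."""
    stmts = []
    for subj, rel, obj in triples:
        stmts.append(
            f"MERGE (s:Salesperson {{name: '{subj}'}}) "
            f"MERGE (d:Depart {{name: '{obj}'}}) "
            f"MERGE (s)-[:{rel}]->(d)"
        )
    return stmts

triples = extract_triples("Li Lei works in Sales. Han Meimei works in Marketing.")
for stmt in to_merge_statements(triples):
    print(stmt)
```

The resulting statements can be executed with the same Neo4j driver session shown earlier, letting extracted entities land in the same graph as the relational data.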
AI Large Model Application Practice
Focused on deep research and development of large-model applications. Authors of "RAG Application Development and Optimization Based on Large Models" and "MCP Principles Unveiled and Development Guide". Primarily B2B, with B2C as a supplement.
