How BAML Turns a 25% Success Rate into 99%+ for Knowledge‑Graph Extraction with Small LLMs

This article studies knowledge-graph extraction from unstructured news articles with small quantized LLMs. It shows how brittle LangChain's JSON-based pipeline is, tests a prompt-engineering fix, and then introduces BAML, whose fuzzy parsing and concise schema raise the extraction success rate from roughly 25% on a 20-article sample to over 99% on the full 344-article dataset.

Data Party THU

Background

Retrieval‑augmented generation (RAG) systems and intelligent agents need to extract entities and relationships from raw text. Small quantized local LLMs often produce malformed JSON, causing high failure rates in downstream pipelines.

Evaluation Dataset Construction and Token Counting

A public news-article dataset from the tomasonjo/blog-datasets GitHub repository is loaded into a pandas.DataFrame. A new tokens column holds the token count of each article (title plus text), computed with tiktoken using the encoding for gpt-4o.

import pandas as pd, tiktoken
news = pd.read_csv("https://raw.githubusercontent.com/tomasonjo/blog-datasets/main/news_articles.csv")

def num_tokens_from_string(s: str, model: str = "gpt-4o") -> int:
    enc = tiktoken.encoding_for_model(model)
    return len(enc.encode(s))

news["tokens"] = [num_tokens_from_string(f"{r['title']} {r['text']}") for _, r in news.iterrows()]

Baseline Evaluation with LangChain LLMGraphTransformer

The standard LLMGraphTransformer from langchain_experimental is instantiated with an Ollama-served llama3 model. Processing the first 20 articles with ten worker threads yields GraphDocument objects that are empty for 75% of inputs, which points to strict JSON parsing as the bottleneck.

from langchain_experimental.graph_transformers import LLMGraphTransformer
from langchain_community.graphs.graph_document import GraphDocument, Node, Relationship
from langchain_core.documents import Document
from langchain_ollama import ChatOllama
from concurrent.futures import ThreadPoolExecutor, as_completed
from tqdm import tqdm

llm = ChatOllama(model="llama3", temperature=0.001)
llm_transformer = LLMGraphTransformer(llm=llm, node_properties=["description"], relationship_properties=["description"])

def process_text(text: str) -> GraphDocument:
    doc = Document(page_content=text)
    return llm_transformer.convert_to_graph_documents([doc])[0]

graph_documents = []
with ThreadPoolExecutor(max_workers=10) as executor:
    futures = [executor.submit(process_text, f"{row['title']} {row['text']}") for _, row in news.head(20).iterrows()]
    for f in tqdm(as_completed(futures), total=len(futures), desc="Processing"):
        graph_documents.append(f.result())
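The 75% figure comes from counting graph documents that contain no nodes. The following is a minimal sketch of that check; FakeGraphDoc is a hypothetical stand-in for GraphDocument so the snippet runs without an LLM, and the name empty_rate and the sample data are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Sequence


@dataclass
class FakeGraphDoc:
    """Hypothetical stand-in for GraphDocument: only the fields the check needs."""
    nodes: List[str] = field(default_factory=list)
    relationships: List[str] = field(default_factory=list)


def empty_rate(docs: Sequence) -> float:
    """Return the percentage of documents whose node list is empty."""
    if not docs:
        return 0.0
    empty = sum(1 for d in docs if not d.nodes)
    return 100.0 * empty / len(docs)


# Made-up sample: 3 of 4 documents came back empty.
sample = [FakeGraphDoc(), FakeGraphDoc(nodes=["Tesla"]), FakeGraphDoc(), FakeGraphDoc()]
print(f"Empty graph documents: {empty_rate(sample):.1f}%")
```

The same function can be applied to the real graph_documents list, since it only reads the nodes attribute.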

Prompt‑Engineering Attempt

A custom ChatPromptTemplate with a Pydantic schema and JsonOutputParser is used on the same 20‑article sample. The failure rate drops to 62 % but parsing errors remain frequent, showing that prompt tuning alone cannot overcome the strict JSON requirement.

# langchain_core.pydantic_v1 is deprecated in recent langchain-core releases; import from pydantic directly.
from pydantic import BaseModel, Field
from typing import List
from langchain_core.output_parsers.json import JsonOutputParser
from langchain_core.prompts import ChatPromptTemplate

class Node(BaseModel):
    id: str = Field(description="Unique identifier")
    type: str = Field(description="Entity type")

class Relationship(BaseModel):
    source: Node = Field(description="Source node")
    target: Node = Field(description="Target node")
    type: str = Field(description="Relationship type")

class KnowledgeGraph(BaseModel):
    nodes: List[Node] = Field(description="List of nodes")
    relationships: List[Relationship] = Field(description="List of relationships")

parser = JsonOutputParser(pydantic_object=KnowledgeGraph)

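# Keep the format instructions out of an f-string: they contain literal braces
# that the prompt template would otherwise read as input variables.
template = """
Extract a knowledge graph from the input text and output a JSON object matching the schema below.
{format_instructions}
Input text:
{text}
"""
prompt = ChatPromptTemplate.from_template(template).partial(
    format_instructions=parser.get_format_instructions()
)
chain = prompt | llm | parser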

results = []
for _, row in news.head(20).iterrows():
    try:
        out = chain.invoke({"text": f"{row['title']} {row['text']}"})
        results.append(out)
    except Exception:
        results.append({})
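To see why strict parsing is so unforgiving, it helps to feed a strict parser the kinds of output a small model tends to produce. The sketch below uses Python's standard json.loads rather than LangChain's parser (which does some cleanup of its own), and the sample strings are made up for illustration.

```python
import json

# Made-up examples of output shapes that a strict JSON parser rejects.
raw_outputs = {
    "valid": '{"nodes": [{"id": "Tesla", "type": "Company"}], "relationships": []}',
    "trailing_comma": '{"nodes": [{"id": "Tesla", "type": "Company"},], "relationships": []}',
    "extra_prose": 'Here is the graph:\n{"nodes": [], "relationships": []}',
    "single_quotes": "{'nodes': [], 'relationships': []}",
}


def strict_parse_ok(text: str) -> bool:
    """Return True only if the whole string is valid JSON."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False


outcomes = {name: strict_parse_ok(text) for name, text in raw_outputs.items()}
print(outcomes)
```

Only the first string parses; each of the others differs from valid JSON by a single small slip, which is enough to lose the whole document.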

BAML (Basically, A Made‑up Language)

BAML replaces verbose JSON schemas with a concise TypeScript-like definition and provides a fuzzy parser that tolerates missing commas, stray quotes, or extra text around the payload. Definitions live in .baml files, and the baml-py package generates a Python client (baml_client) that calls them.

// SimpleNode definition
class SimpleNode {
  id string               // Unique identifier
  type string             // Entity type
  properties Properties   // Optional attributes
}

class Properties {
  description string?      // Optional description
}

class SimpleRelationship {
  source_node_id string
  source_node_type string
  target_node_id string
  target_node_type string
  type string
  properties Properties
}

class DynamicGraph {
  nodes SimpleNode[]
  relationships SimpleRelationship[]
}

// Assumes a client named Ollama is declared elsewhere in the BAML project.
function ExtractGraph(graph: string) -> DynamicGraph {
  client Ollama
  prompt #"
    Extract from this content:
    {{ ctx.output_format }}

    {{ graph }}
  "#
}
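The idea behind tolerant parsing can be illustrated in a few lines of Python. This is a toy sketch only and not BAML's actual parser, which handles far more cases; the function name tolerant_parse and the sample string are assumptions made for illustration.

```python
import json
import re


def tolerant_parse(text: str) -> dict:
    """Best-effort recovery of one JSON object from noisy model output."""
    start, end = text.find("{"), text.rfind("}")
    if start == -1 or end <= start:
        raise ValueError("no JSON object found")
    candidate = text[start:end + 1]
    # Drop commas that sit directly before a closing bracket or brace.
    # Toy rule: it would also alter such a sequence inside a string value.
    candidate = re.sub(r",\s*([\]}])", r"\1", candidate)
    return json.loads(candidate)


noisy = (
    "Sure! Here is the graph:\n"
    '{"nodes": [{"id": "Tesla", "type": "Company"},], "relationships": [],}\n'
    "Hope this helps."
)
graph = tolerant_parse(noisy)
print(graph["nodes"][0]["id"])
```

The input combines surrounding chatter with two trailing commas, both of which a strict parser rejects, yet the intended structure is fully recoverable.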

Integration with LangChain

Helper functions translate BAML output into LangChain GraphDocument objects. An asynchronous chain sends the input text to the BAML ExtractGraph function, receives a DynamicGraph, and maps nodes and relationships back to LangChain types.

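from typing import Any, Dict, List
import baml_client as client  # Python client generated from the .baml files
from langchain_community.graphs.graph_document import GraphDocument, Node, Relationship
from langchain_core.runnables import chain

def _props_to_dict(props: Any) -> Dict[str, Any]:
    # BAML returns a Properties object, while LangChain nodes expect a plain dict.
    if props is None:
        return {}
    return props.model_dump(exclude_none=True) if hasattr(props, "model_dump") else dict(props)

def _norm_id(value: Any) -> Any:
    # Title-case string ids so "elon musk" and "Elon Musk" resolve to the same node.
    return value.title() if isinstance(value, str) else value

def _norm_type(value: Any) -> Any:
    return value.capitalize() if value else None

def _format_nodes(nodes: List[Any]) -> List[Node]:
    return [Node(id=_norm_id(n.id), type=_norm_type(n.type),
                 properties=_props_to_dict(n.properties)) for n in nodes]

def map_to_base_relationship(rel: Any) -> Relationship:
    # Endpoints use the same casing rules as _format_nodes so relationship ids match node ids.
    source = Node(id=_norm_id(rel.source_node_id), type=_norm_type(rel.source_node_type))
    target = Node(id=_norm_id(rel.target_node_id), type=_norm_type(rel.target_node_type))
    return Relationship(source=source, target=target, type=rel.type,
                        properties=_props_to_dict(rel.properties))

@chain
async def get_graph(message):
    # message is expected to expose the article text as .content
    graph = await client.b.ExtractGraph(graph=message.content)
    return graph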

Large‑Scale Validation with BAML

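The BAML pipeline processes all 344 news articles in chunks of four, with the articles in each chunk sent concurrently through asynchronous calls. Only 0.58% of the resulting GraphDocument objects are empty (2 of 344), which gives a 99.4% extraction success rate, where success means a non-empty graph.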

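import asyncio
from langchain_core.documents import Document
from langchain_core.messages import HumanMessage
from tqdm import tqdm

async def aprocess_response(document):
    # get_graph reads message.content, so wrap the article text in a message.
    resp = await get_graph.ainvoke(HumanMessage(content=document.page_content))
    return GraphDocument(nodes=_format_nodes(resp.nodes),
                         relationships=[map_to_base_relationship(r) for r in resp.relationships],
                         source=document)

async def aprocess_text(texts):
    docs = [Document(page_content=t) for t in texts]
    return await asyncio.gather(*[aprocess_response(d) for d in docs])

graph_documents_baml = []
chunk_size = 4  # number of articles sent to the model concurrently
# Top-level await works in a notebook; in a script, move this loop into an
# async function and run it with asyncio.run().
for i in tqdm(range(0, len(news), chunk_size), desc="Processing chunks"):
    chunk = [f"{row['title']} {row['text']}" for _, row in news.iloc[i:i+chunk_size].iterrows()]
    docs = await aprocess_text(chunk)
    graph_documents_baml.extend(docs)

# A document counts as missing when no nodes were extracted from it.
missing_pct = sum(1 for d in graph_documents_baml if not d.nodes) / len(graph_documents_baml) * 100
print(f"Percentage missing with BAML: {missing_pct:.2f}%")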

Import into Neo4j and Graph‑Science Analysis

The high‑quality graph is loaded into a Neo4j instance. Nodes receive the __Entity__ label and the original source text is retained. Token count correlates positively with extracted entity count, and the node‑degree distribution shows a long‑tail pattern with a few hub entities.

import os
from langchain_community.graphs import Neo4jGraph

os.environ["NEO4J_URI"] = "bolt://localhost:7687"
os.environ["NEO4J_USERNAME"] = "neo4j"
os.environ["NEO4J_PASSWORD"] = "your_password"
os.environ["DATABASE"] = "graphragdemo"

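# Neo4jGraph reads the NEO4J_* variables on its own, but not DATABASE, so pass it explicitly.
graph = Neo4jGraph(database=os.environ["DATABASE"])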
graph.add_graph_documents(graph_documents_baml, baseEntityLabel=True, include_source=True)
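The long-tail degree pattern mentioned above can be checked with a simple degree count. The sketch below works on a plain edge list rather than on Neo4j; the function names and the edge data are made up for illustration.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def node_degrees(edges: Iterable[Tuple[str, str]]) -> Counter:
    """Count how many relationships touch each node (direction ignored)."""
    degrees: Counter = Counter()
    for source, target in edges:
        degrees[source] += 1
        degrees[target] += 1
    return degrees


def degree_histogram(degrees: Counter) -> Dict[int, int]:
    """Map each degree value to the number of nodes that have it."""
    return dict(sorted(Counter(degrees.values()).items()))


# Made-up edge list with one hub ("Tesla") and several low-degree nodes.
edges = [
    ("Tesla", "Elon Musk"),
    ("Tesla", "Austin"),
    ("Tesla", "SpaceX"),
    ("Tesla", "Panasonic"),
    ("SpaceX", "NASA"),
]
degrees = node_degrees(edges)
print(degrees.most_common(1))
print(degree_histogram(degrees))
```

A long tail shows up as many nodes at degree 1 or 2 and only a handful at high degree, which is the shape this toy histogram has in miniature.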

Vector embeddings for each entity are generated with the local llama3 model via OllamaEmbeddings and stored in Neo4j using Neo4jVector. A k‑nearest‑neighbors (kNN) step creates SIMILAR relationships for pairs whose cosine similarity exceeds 0.95.

from langchain_ollama import OllamaEmbeddings
from langchain_community.vectorstores import Neo4jVector

embeddings = OllamaEmbeddings(model="llama3")
vector = Neo4jVector.from_existing_graph(
    embeddings,
    node_label="__Entity__",
    text_node_properties=["id", "description"],
    embedding_node_property="embedding",
    database=os.environ["DATABASE"]
)
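The similarity cutoff behind the SIMILAR relationships can be sketched without a database. This is a brute-force illustration that compares every pair, not the kNN procedure itself, which avoids the quadratic cost; the function names and the three-dimensional toy vectors are assumptions.

```python
import math
from itertools import combinations
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def similar_pairs(
    embeddings: Dict[str, Sequence[float]], cutoff: float = 0.95
) -> List[Tuple[str, str, float]]:
    """Return every pair of entities whose similarity exceeds the cutoff."""
    pairs = []
    for (name_a, vec_a), (name_b, vec_b) in combinations(embeddings.items(), 2):
        score = cosine_similarity(vec_a, vec_b)
        if score > cutoff:
            pairs.append((name_a, name_b, score))
    return pairs


# Toy embeddings: two near-identical vectors and one unrelated vector.
toy = {
    "Elon Musk": [0.9, 0.1, 0.0],
    "Elonmusk": [0.88, 0.12, 0.01],
    "Austin": [0.0, 0.2, 0.9],
}
pairs = similar_pairs(toy)
print([(a, b) for a, b, _ in pairs])
```

Only the two spellings of the same name pass the 0.95 cutoff, which is the kind of pair the de-duplication step then inspects.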

Potential duplicate entities are identified via the SIMILAR edges and filtered with a Levenshtein distance ≤ 3. An LLM call decides which name to keep, and the duplicates are merged in Neo4j using apoc.refactor.mergeNodes.

# Example merge; the candidate lists are illustrative. Requires the APOC plugin.
merged_entities = [["David Van", "Davidvan"], ["Elon Musk", "Elonmusk"]]
graph.query("""
UNWIND $data AS candidates
CALL {
  WITH candidates
  MATCH (e:`__Entity__`) WHERE e.id IN candidates
  RETURN collect(e) AS nodes
}
// Map keys are backtick-escaped names, not quoted strings.
CALL apoc.refactor.mergeNodes(nodes, {properties: {`.*`: 'discard'}}) YIELD node
RETURN count(*)
""", params={"data": merged_entities})
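The edit-distance filter applied to the SIMILAR candidates can be sketched in plain Python. This is a standard dynamic-programming Levenshtein implementation; the function names and the candidate pairs are assumptions, and the comparison is case-sensitive.

```python
from typing import List, Tuple


def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, or substitutions."""
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


def likely_duplicates(
    pairs: List[Tuple[str, str]], max_distance: int = 3
) -> List[Tuple[str, str]]:
    """Keep only the pairs whose names differ by at most max_distance edits."""
    return [(a, b) for a, b in pairs if levenshtein(a, b) <= max_distance]


candidates = [("David Van", "Davidvan"), ("Elon Musk", "Elonmusk"), ("Tesla", "SpaceX")]
print(likely_duplicates(candidates))
```

The text filter guards against pairs that are close in embedding space but are clearly different names, so only near-identical spellings reach the LLM for a final decision.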

Leiden community detection is performed on an in‑memory projection that includes the embedding property. Multi‑level community IDs are written back to each entity, and explicit __Community__ nodes are created to represent the hierarchy.

from graphdatascience import GraphDataScience

gds = GraphDataScience(os.environ["NEO4J_URI"], auth=(os.environ["NEO4J_USERNAME"], os.environ["NEO4J_PASSWORD"]))
gds.set_database(os.environ["DATABASE"])

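# Leiden in GDS requires undirected relationships, so the projection sets the orientation
# and derives a weight by counting parallel relationships between each pair of entities.
G, _ = gds.graph.project(
    "communities",
    "__Entity__",
    {"_ALL_": {"type": "*", "orientation": "UNDIRECTED",
               "properties": {"weight": {"property": "*", "aggregation": "COUNT"}}}},
    nodeProperties=["embedding"]
)

gds.leiden.write(G, writeProperty="communities", includeIntermediateCommunities=True, relationshipWeightProperty="weight")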
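Turning the per-entity community lists into an explicit hierarchy can be sketched as follows. The sketch assumes the written communities property is a list of integers, one per level with the finest level first, and it uses a hypothetical "level-id" naming scheme for community nodes; neither detail is confirmed by the article.

```python
from typing import Dict, List, Set, Tuple


def build_hierarchy(
    entity_communities: Dict[str, List[int]]
) -> Tuple[Set[Tuple[str, str]], Set[Tuple[str, str]]]:
    """Derive entity-to-community and community-to-parent links."""
    membership: Set[Tuple[str, str]] = set()  # (entity, finest-level community)
    parent_of: Set[Tuple[str, str]] = set()   # (child community, parent community)
    for entity, levels in entity_communities.items():
        # Prefix each id with its level so ids from different levels cannot collide.
        ids = [f"{level}-{cid}" for level, cid in enumerate(levels)]
        if ids:
            membership.add((entity, ids[0]))
        for child, parent in zip(ids, ids[1:]):
            parent_of.add((child, parent))
    return membership, parent_of


# Made-up two-level assignment for four entities.
toy = {"Tesla": [3, 1], "Elon Musk": [3, 1], "NASA": [5, 1], "Austin": [7, 2]}
membership, parent_of = build_hierarchy(toy)
print(sorted(membership))
print(sorted(parent_of))
```

Each pair in membership would become a relationship from an entity to a __Community__ node, and each pair in parent_of a relationship between two __Community__ nodes at adjacent levels.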

Conclusion

The study shows that the primary obstacle for small quantized LLMs in knowledge-graph extraction is the strict JSON parsing required by existing LangChain tools. Replacing the rigid parser with BAML's tolerant parser and concise schema language raises extraction success from about 25% (measured on a 20-article sample) to over 99% (measured on all 344 articles), where success means a non-empty graph. The resulting graph can be enriched with vector embeddings, de-duplicated via similarity search, and organized into multi-scale communities using Leiden clustering, providing a production-ready pipeline for GraphRAG applications.

Full reproducible code is available at https://github.com/FareedKhan-dev/langchain-graphrag-baml

Written by

Data Party THU

Official platform of Tsinghua Big Data Research Center, sharing the team's latest research, teaching updates, and big data news.
