Building a Vector‑Free RAG System with Hierarchical Page Indexing
This guide explains how to build a retrieval‑augmented generation (RAG) system without embeddings: documents are converted into a hierarchical tree, and an LLM navigates that tree to locate relevant sections and generate answers. A full Python implementation and a GitHub repository are included.
Overview
This article describes a vector‑free, inference‑based Retrieval‑Augmented Generation (RAG) system that builds a hierarchical page index. The document is transformed into a tree of sections and subsections, allowing a Large Language Model (LLM) to navigate the tree level‑by‑level to locate the most relevant leaf node and use its raw text as context for answer generation. No embeddings or similarity search are required.
Overall Plan
Parse the document into a hierarchical tree – The document is sent to an LLM, which splits it into top‑level sections. Sections longer than a configurable threshold (default 300 words) are recursively split into subsections, producing a multi‑level tree where short sections become leaf nodes.
Summarize each node bottom‑up – A post‑order traversal generates a concise summary for every leaf using the LLM, then internal nodes build their summaries from the summaries of their children, ending with a root‑level summary of the whole document.
Serialize the index – The tree is saved as a JSON file, allowing the index to be built once and reused for many queries.
Retrieve by tree navigation – Starting at the root, the LLM is shown the summaries of the current node’s children and asked which branch likely contains the answer. The process repeats until a leaf is reached, and the leaf’s original text is returned as context.
Generate the final answer – The retrieved context and the user query are sent to the LLM, which produces the answer.
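Steps 2 and 4 can be sketched without any LLM calls: build a tiny two‑level tree by hand and navigate it with a stand‑in chooser. The keyword‑overlap heuristic and all names below are illustrative, not part of the repository code.

```python
# Minimal sketch of the index-and-navigate idea, with a keyword-overlap
# chooser standing in for the LLM. Illustrative only.

class Node:
    def __init__(self, title, summary, content="", children=None):
        self.title = title
        self.summary = summary
        self.content = content
        self.children = children or []

    def is_leaf(self):
        return not self.children

def pick_child(query, node):
    # Stand-in for the LLM: pick the child whose summary shares
    # the most words with the query.
    words = set(query.lower().split())
    return max(node.children,
               key=lambda c: len(words & set(c.summary.lower().split())))

def retrieve(query, root):
    node = root
    while not node.is_leaf():
        node = pick_child(query, node)
    return node.content

root = Node("root", "company handbook", children=[
    Node("Billing", "invoices payments refunds",
         content="Refunds are issued within 14 days."),
    Node("Security", "passwords encryption access",
         content="Passwords are hashed with bcrypt."),
])

print(retrieve("How long do refunds take?", root))
# → Refunds are issued within 14 days.
```

In the real system the chooser is an LLM prompt over child summaries, but the control flow is exactly this loop.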
Repository Structure
Source code is available at https://github.com/vixhal-baraiya/pageindex-rag
pageindex-rag/
    pageindex/
        __init__.py
        node.py
        parser.py
        indexer.py
        retriever.py
        storage.py
    main.py
    document.md

Node Definition (pageindex/node.py)
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class PageNode:
    title: str
    content: str    # raw text for leaf nodes
    summary: str    # filled by the indexer
    depth: int      # 0 = root, 1 = section, 2 = subsection
    children: list = field(default_factory=list)
    parent: Optional["PageNode"] = None

    def is_leaf(self) -> bool:
        return len(self.children) == 0

Document Parsing (pageindex/parser.py)
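The `_segment` helper below asks the model for JSON in a fixed shape: an object with a `"sections"` list of `{"title", "content"}` items. A hand‑written example of that shape (illustrative, not real model output):

```python
import json

# The shape _segment expects back from the model. This sample is
# hand-written for illustration; real output comes from the LLM.
raw = """{
  "sections": [
    {"title": "Introduction", "content": "This report covers the Q3 results."},
    {"title": "Revenue", "content": "Revenue grew twelve percent year over year."}
  ]
}"""

sections = json.loads(raw).get("sections", [])
for s in sections:
    print(s["title"], "-", len(s["content"].split()), "words")
```

Using `.get("sections", [])` means a malformed reply degrades to an empty list rather than raising.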
import json, openai

from .node import PageNode

client = openai.OpenAI()
SUBSECTION_THRESHOLD = 300  # words

def _segment(text: str) -> list:
    prompt = f"""Split the following text into logical sections.
Return a JSON object with a "sections" key. Each item has:
- "title": short title (5 words or less)
- "content": the text belonging to this section

Text:
{text[:8000]}"""
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}],
        max_completion_tokens=3000,
        response_format={"type": "json_object"},
    )
    parsed = json.loads(response.choices[0].message.content)
    return parsed.get("sections", [])
def parse_document(text: str) -> PageNode:
    root = PageNode(title="root", content="", summary="", depth=0)
    for item in _segment(text):
        title = item.get("title", "Section")
        content = item.get("content", "")
        node = PageNode(title=title, content="", summary="", depth=1)
        node.parent = root
        word_count = len(content.split())
        if word_count > SUBSECTION_THRESHOLD:
            subsections = _segment(content)
            if len(subsections) > 1:
                for sub in subsections:
                    child = PageNode(
                        title=sub.get("title", "Subsection"),
                        content=sub.get("content", ""),
                        summary="",
                        depth=2,
                    )
                    child.parent = node
                    node.children.append(child)
            else:
                node.content = content
        else:
            node.content = content
        root.children.append(node)
    return root

Summary Construction (pageindex/indexer.py)
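The indexer below is a post‑order traversal: children are summarized before their parent, so a parent can always build its summary from finished child summaries. The ordering in isolation, with a stub that records visit order instead of calling an LLM (illustrative):

```python
# Post-order traversal sketch: leaves are visited before internal nodes.
# The list stands in for the per-node summarization calls.

class N:
    def __init__(self, title, children=None):
        self.title = title
        self.children = children or []

visited = []

def build(node):
    for child in node.children:
        build(child)
    visited.append(node.title)  # a real summary would be computed here

tree = N("root", [N("A", [N("A.1"), N("A.2")]), N("B")])
build(tree)
print(visited)  # → ['A.1', 'A.2', 'A', 'B', 'root']
```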
import openai

from .node import PageNode

client = openai.OpenAI()

def _summarize(text: str, section_name: str = "") -> str:
    hint = f"This is the section titled: {section_name}.\n" if section_name else ""
    prompt = f"""{hint}Summarize the following in 2-3 sentences. Be specific and factual. Do not add anything not in the text.

{text[:3000]}"""
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
        max_completion_tokens=150,
    )
    return response.choices[0].message.content.strip()

def build_summaries(node: PageNode):
    for child in node.children:
        build_summaries(child)
    if node.is_leaf():
        if node.content.strip():
            node.summary = _summarize(node.content, node.title)
        else:
            node.summary = "(empty section)"
    else:
        children_text = "\n".join(f"[{c.title}]: {c.summary}" for c in node.children)
        node.summary = _summarize(children_text, node.title)

Persistence (pageindex/storage.py)
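The storage module below is a plain JSON round trip over nested node dictionaries. The same idea in miniature, with a throwaway file (paths and field values here are illustrative):

```python
import json, os, tempfile

# A nested node dictionary survives a JSON round trip unchanged,
# which is all the storage module relies on.
index = {
    "title": "root", "content": "", "summary": "whole document", "depth": 0,
    "children": [
        {"title": "Intro", "content": "Some text.", "summary": "overview",
         "depth": 1, "children": []},
    ],
}

path = os.path.join(tempfile.mkdtemp(), "index.json")
with open(path, "w") as f:
    json.dump(index, f, indent=2)
with open(path) as f:
    loaded = json.load(f)

print(loaded == index)  # → True
```

Parent pointers are deliberately left out of the serialized form (they would make the structure cyclic) and are rebuilt on load.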
import json

from .node import PageNode

def save(node: PageNode, path: str):
    def to_dict(n: PageNode) -> dict:
        return {
            "title": n.title,
            "content": n.content,
            "summary": n.summary,
            "depth": n.depth,
            "children": [to_dict(c) for c in n.children],
        }
    with open(path, "w") as f:
        json.dump(to_dict(node), f, indent=2)

def load(path: str) -> PageNode:
    def from_dict(d: dict) -> PageNode:
        node = PageNode(
            title=d["title"],
            content=d["content"],
            summary=d["summary"],
            depth=d["depth"],
        )
        for child_dict in d["children"]:
            child = from_dict(child_dict)
            child.parent = node
            node.children.append(child)
        return node
    with open(path) as f:
        return from_dict(json.load(f))

Retrieval (pageindex/retriever.py)
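The retriever below asks the model for a bare number and falls back to the first child when the reply does not parse or points outside the list. A guard like that in isolation (stand‑in function, no API calls; all values illustrative):

```python
# Parse a 1-based child number from a model reply, falling back to the
# first option when the reply is malformed or out of range.

def choose(reply: str, options: list):
    try:
        index = int(reply.strip()) - 1
        if 0 <= index < len(options):
            return options[index]
    except ValueError:
        pass
    return options[0]

children = ["Billing", "Security", "Returns"]
print(choose("2", children))            # → Security
print(choose("the 2nd one", children))  # → Billing (not a bare number)
print(choose("7", children))            # → Billing (out of range)
```

Capping `max_completion_tokens` at 5 in the real call nudges the model toward replying with just the number.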
import openai

from .node import PageNode

client = openai.OpenAI()

def _pick_child(query: str, node: PageNode) -> PageNode:
    options = "\n".join(
        f"{i + 1}. [{c.title}]: {c.summary}" for i, c in enumerate(node.children)
    )
    prompt = f"""You are navigating a document tree to find the answer to a question.
Current section: "{node.title}"
Question: {query}

Children of this section:
{options}

Which child section most likely contains the answer? Reply with only the number."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
        max_completion_tokens=5,
    )
    try:
        index = int(response.choices[0].message.content.strip()) - 1
        if 0 <= index < len(node.children):
            return node.children[index]
    except ValueError:
        pass
    # Fall back to the first child on a malformed or out-of-range reply.
    return node.children[0]

def retrieve(query: str, root: PageNode) -> str:
    node = root
    while not node.is_leaf():
        node = _pick_child(query, node)
    return node.content

Main Orchestration (main.py)
import os

import openai

from pageindex import storage
from pageindex.indexer import build_summaries
from pageindex.parser import parse_document
from pageindex.retriever import retrieve

client = openai.OpenAI()
INDEX_PATH = "index.json"

def build_index(doc_path: str):
    print("Parsing document...")
    with open(doc_path) as f:
        text = f.read()
    tree = parse_document(text)
    print("Building summaries (this makes LLM calls)...")
    build_summaries(tree)
    print(f"Saving index to {INDEX_PATH}")
    storage.save(tree, INDEX_PATH)
    return tree

def ask(query: str) -> str:
    if not os.path.exists(INDEX_PATH):
        raise FileNotFoundError("Index not found. Run build_index() first.")
    tree = storage.load(INDEX_PATH)
    context = retrieve(query, tree)
    prompt = f"Answer using only the context below.\n\nContext:\n{context}\n\nQuestion: {query}"
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}],
        max_completion_tokens=500,
    )
    return response.choices[0].message.content.strip()

if __name__ == "__main__":
    build_index("document.md")
    print(ask("Your Question"))

Key Takeaways
Hierarchical index: Documents are transformed into a tree that mimics human navigation via a table of contents.
LLM reasoning: The LLM walks the tree level by level, selecting the most promising branch based on node summaries.
Bottom‑up summarization: Summaries are generated from leaves upward, guaranteeing that every node has a concise description.
One‑time build, reuse: The index is serialized to JSON and can be loaded for many queries without rebuilding.
This approach eliminates the need for embedding models and vector databases, making it especially suitable for structured documents that require precise retrieval.