Building a Vector‑Free RAG System with Hierarchical Page Indexing

This guide shows how to build a retrieval‑augmented generation (RAG) system without embeddings: documents are converted into a hierarchical tree that an LLM navigates to summarize sections and retrieve answers. A full Python implementation and a GitHub repository are included.


Overview

This article describes a vector‑free, inference‑based Retrieval‑Augmented Generation (RAG) system that builds a hierarchical page index. The document is transformed into a tree of sections and subsections, allowing a Large Language Model (LLM) to navigate the tree level‑by‑level to locate the most relevant leaf node and use its raw text as context for answer generation. No embeddings or similarity search are required.

Overall Plan

Parse the document into a hierarchical tree – The document is sent to an LLM, which splits it into top‑level sections. Sections longer than a configurable threshold (default 300 words) are recursively split into subsections, producing a multi‑level tree where short sections become leaf nodes.

Summarize each node bottom‑up – A post‑order traversal generates a concise summary for every leaf using the LLM, then internal nodes build their summaries from the summaries of their children, ending with a root‑level summary of the whole document.

Serialize the index – The tree is saved as a JSON file, allowing the index to be built once and reused for many queries.

Retrieve by tree navigation – Starting at the root, the LLM is shown the summaries of the current node’s children and asked which branch likely contains the answer. The process repeats until a leaf is reached, and the leaf’s original text is returned as context.

Generate the final answer – The retrieved context and the user query are sent to the LLM, which produces the answer.
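Before diving into the implementation, the navigation idea can be sketched without any LLM at all. The toy below replaces the LLM's branch choice with a naive keyword-overlap score (an assumption made purely for illustration; the real system below uses a model for this step), but the control flow — descend from the root, pick a child, stop at a leaf — is the same:

```python
# Toy tree navigation: pick the child whose summary shares the most words
# with the query, descend until a leaf, and return its raw text as context.
class Node:
    def __init__(self, title, summary, content="", children=None):
        self.title, self.summary, self.content = title, summary, content
        self.children = children or []

def pick_child(query, node):
    query_words = set(query.lower().split())
    return max(node.children,
               key=lambda c: len(query_words & set(c.summary.lower().split())))

def retrieve(query, root):
    node = root
    while node.children:
        node = pick_child(query, node)
    return node.content

root = Node("root", "whole document", children=[
    Node("Intro", "project goals and scope", "We aim to index documents."),
    Node("Pricing", "pro plan cost and billing", "The pro plan costs $20."),
])
context = retrieve("how much does the pro plan cost", root)  # → "The pro plan costs $20."
```

The real system swaps the overlap heuristic for an LLM prompt over child summaries, which handles paraphrase and ambiguity that word overlap cannot.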

Repository Structure

Source code is available at https://github.com/vixhal-baraiya/pageindex-rag

pageindex-rag/
    pageindex/
        __init__.py
        node.py
        parser.py
        indexer.py
        retriever.py
        storage.py
    main.py
    document.md

Node Definition (pageindex/node.py)

from dataclasses import dataclass, field
from typing import Optional

@dataclass
class PageNode:
    title: str
    content: str  # raw text for leaf nodes
    summary: str  # filled by the indexer
    depth: int    # 0=root, 1=section, 2=subsection
    children: list["PageNode"] = field(default_factory=list)
    parent: Optional["PageNode"] = None

    def is_leaf(self) -> bool:
        return len(self.children) == 0

Document Parsing (pageindex/parser.py)

import json, openai
from .node import PageNode

client = openai.OpenAI()
SUBSECTION_THRESHOLD = 300  # words

def _segment(text: str) -> list:
    prompt = f"""Split the following text into logical sections.
Return a JSON object with a \"sections\" key. Each item has:
- \"title\": short title (5 words or less)
- \"content\": the text belonging to this section

Text:
{text[:8000]}"""
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}],
        max_completion_tokens=3000,
        response_format={"type": "json_object"},
    )
    parsed = json.loads(response.choices[0].message.content)
    return parsed.get("sections", [])

def parse_document(text: str) -> PageNode:
    root = PageNode(title="root", content="", summary="", depth=0)
    for item in _segment(text):
        title = item.get("title", "Section")
        content = item.get("content", "")
        node = PageNode(title=title, content="", summary="", depth=1)
        node.parent = root
        word_count = len(content.split())
        if word_count > SUBSECTION_THRESHOLD:
            subsections = _segment(content)
            if len(subsections) > 1:
                for sub in subsections:
                    child = PageNode(
                        title=sub.get("title", "Subsection"),
                        content=sub.get("content", ""),
                        summary="",
                        depth=2,
                    )
                    child.parent = node
                    node.children.append(child)
            else:
                node.content = content
        else:
            node.content = content
        root.children.append(node)
    return root
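Even with `response_format={"type": "json_object"}`, the model can return items that are missing keys or have the wrong types. A defensive filter before building nodes is cheap insurance; the sketch below (the helper name `clean_sections` is mine, not part of the repository) keeps only well-formed items:

```python
# Defensive filter for LLM-produced section lists: drop malformed items
# before turning them into PageNode objects.
def clean_sections(sections: list) -> list:
    cleaned = []
    for item in sections:
        if not isinstance(item, dict):
            continue
        title = item.get("title")
        content = item.get("content")
        if isinstance(title, str) and isinstance(content, str) and content.strip():
            cleaned.append({"title": title.strip() or "Section", "content": content})
    return cleaned

raw = [{"title": "Intro", "content": "Some text."}, "garbage", {"title": None, "content": "x"}]
print(clean_sections(raw))  # only the well-formed first item survives
```

`parse_document` could wrap each `_segment(...)` call with this filter so one bad item does not derail the whole tree build.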

Summary Construction (pageindex/indexer.py)

import openai
from .node import PageNode

client = openai.OpenAI()

def _summarize(text: str, section_name: str = "") -> str:
    hint = f"This is the section titled: {section_name}.\n" if section_name else ""
    prompt = f"""{hint}Summarize the following in 2-3 sentences. Be specific and factual. Do not add anything not in the text.

{text[:3000]}"""
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
        max_completion_tokens=150,
    )
    return response.choices[0].message.content.strip()

def build_summaries(node: PageNode):
    for child in node.children:
        build_summaries(child)
    if node.is_leaf():
        if node.content.strip():
            node.summary = _summarize(node.content, node.title)
        else:
            node.summary = "(empty section)"
    else:
        children_text = "\n\n".join([f"[{c.title}]: {c.summary}" for c in node.children])
        node.summary = _summarize(children_text, node.title)
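The recursion in `build_summaries` is a post-order traversal: every child is summarized before its parent, so a parent always has child summaries to aggregate. A stripped-down version with the LLM call stubbed out (a sketch for illustration only) makes the ordering visible:

```python
# Post-order summarization with the LLM call replaced by a recording stub.
class Node:
    def __init__(self, title, children=None):
        self.title, self.summary = title, ""
        self.children = children or []

order = []

def summarize(node):
    for child in node.children:
        summarize(child)
    order.append(node.title)  # stand-in for the real LLM summarization call
    node.summary = f"summary of {node.title}"

root = Node("root", [Node("s1", [Node("s1.a"), Node("s1.b")]), Node("s2")])
summarize(root)
print(order)  # children always appear before their parents
```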

Persistence (pageindex/storage.py)

import json
from .node import PageNode

def save(node: PageNode, path: str):
    def to_dict(n: PageNode) -> dict:
        return {
            "title": n.title,
            "content": n.content,
            "summary": n.summary,
            "depth": n.depth,
            "children": [to_dict(c) for c in n.children],
        }
    with open(path, "w") as f:
        json.dump(to_dict(node), f, indent=2)

def load(path: str) -> PageNode:
    def from_dict(d: dict) -> PageNode:
        node = PageNode(
            title=d["title"],
            content=d["content"],
            summary=d["summary"],
            depth=d["depth"],
        )
        for child_dict in d["children"]:
            child = from_dict(child_dict)
            child.parent = node
            node.children.append(child)
        return node
    with open(path) as f:
        return from_dict(json.load(f))
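Because the tree serializes to plain nested dicts, a round trip through JSON is easy to sanity-check. The sketch below writes a hand-built two-level index to disk in the same shape `save` produces and verifies the structure survives:

```python
import json, os, tempfile

# A two-level index as it appears on disk: nested dicts mirror the tree.
index = {
    "title": "root", "content": "", "summary": "whole document", "depth": 0,
    "children": [
        {"title": "Intro", "content": "Intro text.", "summary": "the intro",
         "depth": 1, "children": []},
    ],
}

path = os.path.join(tempfile.mkdtemp(), "index.json")
with open(path, "w") as f:
    json.dump(index, f, indent=2)
with open(path) as f:
    loaded = json.load(f)

assert loaded == index  # lossless round trip
```

Note that `parent` pointers are deliberately omitted from the JSON (they would create cycles); `load` rebuilds them while walking the dicts.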

Retrieval (pageindex/retriever.py)

import openai
from .node import PageNode

client = openai.OpenAI()

def _pick_child(query: str, node: PageNode) -> PageNode:
    options = "\n".join([f"{i+1}. [{c.title}]: {c.summary}" for i, c in enumerate(node.children)])
    prompt = f"""You are navigating a document tree to find the answer to a question.

Current section: \"{node.title}\"
Question: {query}

Children of this section:
{options}

Which child section most likely contains the answer? Reply with only the number."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
        max_completion_tokens=5,
    )
    try:
        index = int(response.choices[0].message.content.strip()) - 1
        return node.children[index]
    except (ValueError, IndexError):
        return node.children[0]

def retrieve(query: str, root: PageNode) -> str:
    node = root
    while not node.is_leaf():
        if not node.children:
            break
        node = _pick_child(query, node)
    return node.content
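The `except` fallback in `_pick_child` silently takes the first child whenever the model replies with anything other than a bare number ("2.", "Option 2", and so on). A slightly more forgiving parser (a sketch; `parse_choice` is a name I'm introducing, not part of the repository) extracts the first integer and clamps it to the valid range:

```python
import re

def parse_choice(reply: str, n_children: int) -> int:
    """Parse a 0-based child index from an LLM reply; default to 0 if no number found."""
    match = re.search(r"\d+", reply)
    if not match:
        return 0
    index = int(match.group()) - 1  # the prompt numbers children from 1
    return min(max(index, 0), n_children - 1)

print(parse_choice("Option 3.", 3))  # → 2
```

`_pick_child` could then end with `return node.children[parse_choice(reply, len(node.children))]` instead of the try/except.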

Main Orchestration (main.py)

import os
from pageindex.parser import parse_document
from pageindex.indexer import build_summaries
from pageindex.retriever import retrieve
from pageindex import storage
import openai

client = openai.OpenAI()
INDEX_PATH = "index.json"

def build_index(doc_path: str):
    print("Parsing document...")
    with open(doc_path) as f:
        text = f.read()
    tree = parse_document(text)
    print("Building summaries (this makes LLM calls)...")
    build_summaries(tree)
    print(f"Saving index to {INDEX_PATH}")
    storage.save(tree, INDEX_PATH)
    return tree

def ask(query: str) -> str:
    if not os.path.exists(INDEX_PATH):
        raise FileNotFoundError("Index not found. Run build_index() first.")
    tree = storage.load(INDEX_PATH)
    context = retrieve(query, tree)
    prompt = f"""Answer using only the context below.

Context:
{context}

Question: {query}"""
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}],
        max_completion_tokens=500,
    )
    return response.choices[0].message.content.strip()

if __name__ == "__main__":
    build_index("document.md")
    print(ask("Your Question"))

Key Takeaways

Hierarchical index: Documents are transformed into a tree that mimics human navigation via a table of contents.

LLM reasoning: The LLM walks the tree level by level, selecting the most promising branch based on node summaries.

Bottom‑up summarization: Summaries are generated from leaves upward, guaranteeing that every node has a concise description.

One‑time build, reuse: The index is serialized to JSON and can be loaded for many queries without rebuilding.

This approach eliminates the need for embedding models and vector databases, making it especially suitable for structured documents that require precise retrieval.

Written by

Java One

Sharing common backend development knowledge.
