BookRAG: A Tree‑Graph Fusion RAG Framework for Hierarchical Documents
BookRAG introduces a tree‑graph fused Retrieval‑Augmented Generation framework that builds a native document index combining hierarchical layout trees with fine‑grained knowledge graphs, and employs an Information‑Foraging‑Theory‑inspired agent to dynamically navigate queries across complex, multi‑section documents.
Limitations of Existing RAG Methods
Current RAG systems, whether text‑first or layout‑first, often fail on hierarchical documents because they either lose structural context or cannot flexibly associate content blocks, and their rigid pipelines cannot handle the wide variance between simple definition lookups and multi‑chapter comparative analyses.
BookRAG: Tree + Graph + Link + Agent
BookRAG is a RAG framework designed specifically for hierarchical documents. Its core idea is to construct a native document index called BookIndex that integrates the layout‑derived tree with a fine‑grained entity knowledge graph via a graph‑tree mapping, and to use an agent inspired by Information Foraging Theory to retrieve information adaptively.
Building BookIndex
BookIndex is built in two phases. First, layout parsing (implemented with MinerU) splits a PDF into independent blocks, each annotated with type, font size, position, and other layout metadata. A language model then validates suspected headings and assigns hierarchical levels, linking blocks into a tree that forms the structural backbone of BookIndex.
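The heading-linking step can be pictured as a standard level-stack algorithm. The sketch below is illustrative (the `Node` shape and block format are assumptions, not MinerU's actual output): blocks arrive as (text, level) pairs, and each new heading is attached to the nearest shallower heading on the stack.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    """One validated heading block promoted into the tree (illustrative sketch)."""
    text: str
    level: int                      # 0 = root, 1 = chapter, 2 = section, ...
    children: list = field(default_factory=list)

def build_tree(blocks):
    """Link (text, level) heading blocks into a tree using a level stack."""
    root = Node("ROOT", 0)
    stack = [root]                  # invariant: levels strictly increase down the stack
    for text, level in blocks:
        while stack[-1].level >= level:
            stack.pop()             # close sections at equal or deeper levels
        node = Node(text, level)
        stack[-1].children.append(node)
        stack.append(node)
    return root

tree = build_tree([("Ch 1", 1), ("Sec 1.1", 2), ("Sec 1.2", 2), ("Ch 2", 1)])
```

In the real system the level assignments come from the LLM validation pass; here they are given directly to keep the example self-contained.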
Second, each node undergoes entity and relation extraction. Text blocks are processed by LLMs, image blocks by multimodal models, and tables/formulas by specialized logic. Extracted entities are linked back to their source nodes via "ContainedIn" relations. A gradient‑based entity resolution step merges local sub‑graphs into a global knowledge graph by detecting sharp drops in similarity scores and either directly merging high‑confidence candidates or invoking an LLM to select a canonical entity.
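The per-modality extraction and "ContainedIn" linking amounts to a dispatch-by-block-type step. In this minimal sketch the extractors are hypothetical stubs standing in for the LLM, multimodal model, and table/formula logic described above:

```python
def extract_entities(node):
    """Route a block to a modality-specific extractor and record ContainedIn.

    The extractor functions are hypothetical stubs, not the paper's models.
    """
    extractors = {
        "text":  lambda n: ["ent:" + w for w in n["content"].split()],
        "image": lambda n: ["ent:figure"],
        "table": lambda n: ["ent:table"],
    }
    entities = extractors[node["type"]](node)
    # Link every extracted entity back to its source node via a ContainedIn relation
    return [(e, "ContainedIn", node["id"]) for e in entities]

triples = extract_entities({"id": "n7", "type": "text", "content": "BERT GPT"})
```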
The GT‑Link component creates bidirectional bridges between the tree and graph, mapping entities to their originating tree nodes, thus tightly coupling structure and semantics.
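A bidirectional bridge of this kind can be sketched as two mirrored index maps; the class and field names below are assumptions for illustration, not the paper's API:

```python
from collections import defaultdict

class GTLink:
    """Minimal sketch of a bidirectional entity <-> tree-node bridge."""
    def __init__(self):
        self.entity_to_nodes = defaultdict(set)   # graph side -> tree side
        self.node_to_entities = defaultdict(set)  # tree side -> graph side

    def link(self, entity, node_id):
        # Record the mapping in both directions so either index can seed retrieval
        self.entity_to_nodes[entity].add(node_id)
        self.node_to_entities[node_id].add(entity)

link = GTLink()
link.link("Transformer", "sec-2.1")
link.link("Transformer", "sec-3.4")
```

Keeping both directions lets a graph hit jump straight to its originating sections, and a tree node expose the entities it contains.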
Gradient‑Based Entity Resolution
Instead of exhaustive pairwise comparison, BookRAG incrementally looks up each new entity: it retrieves candidate entities from a vector database, ranks them with a scoring model, and checks for a sudden score decline. If a clear drop is found, a high‑confidence candidate set is isolated; a single candidate is merged directly, while multiple candidates are resolved by an LLM before merging. This approach avoids quadratic cost while keeping the graph compact, e.g., merging "LLM" and "Large Language Model" into one node.
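The "sudden score decline" check can be sketched as finding the largest gap in a descending score list. The drop threshold below is an assumption for illustration; the paper does not specify this value:

```python
def split_by_score_drop(candidates, min_drop=0.25):
    """Isolate the high-confidence prefix before the sharpest score decline.

    `candidates` is a list of (entity, score) pairs sorted by descending score;
    `min_drop` is an assumed threshold, not a value from the paper.
    """
    if len(candidates) < 2:
        return []
    drops = [candidates[i][1] - candidates[i + 1][1] for i in range(len(candidates) - 1)]
    if max(drops) < min_drop:
        return []                       # no clear cliff: treat the entity as new
    cut = drops.index(max(drops)) + 1   # cut just after the steepest drop
    return candidates[:cut]

ranked = [("Large Language Model", 0.95), ("LLM", 0.93), ("LLVM", 0.41)]
keep = split_by_score_drop(ranked)      # isolates the two high-confidence aliases
```

A single survivor would be merged directly; multiple survivors (as here) would be passed to an LLM to pick the canonical entity.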
Agent‑Driven Adaptive Retrieval
The agent classifies queries (single‑hop, multi‑hop, global aggregation) and plans a dynamic sequence of modular operators (Formulator, Selector, Reasoner, Synthesizer). For a single‑hop factual query, the agent uses Extract to identify entities, Select_by_Entity to prune the tree from 134 to 24 nodes, then applies Graph_Reasoning and Text_Reasoning to assign importance scores, finally selecting eight high‑confidence nodes with Skyline_Ranker to generate the answer.
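The planned operator sequence can be viewed as a pipeline of state-transforming steps. The sketch below uses toy stand-ins for Select_by_Entity and Skyline_Ranker (the pruning numbers mirror the example above; the state shape is an assumption):

```python
def run_plan(query, operators):
    """Apply a planned operator sequence to a shared state dict (illustrative)."""
    state = {"query": query, "nodes": list(range(134))}  # toy 134-node candidate pool
    for op in operators:
        state = op(state)
    return state

# Hypothetical stand-ins for the paper's operators:
select_by_entity = lambda s: {**s, "nodes": s["nodes"][:24]}  # prune 134 -> 24 nodes
skyline_ranker   = lambda s: {**s, "nodes": s["nodes"][:8]}   # keep 8 high-confidence nodes

state = run_plan("Who proposed X?", [select_by_entity, skyline_ranker])
```

Because the plan is just a list of operators, the agent can compose different sequences per query class rather than running one rigid pipeline.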
For a global aggregation query counting images on pages 1‑10, the agent runs Filter_Range and Filter_Modal to isolate image blocks, then uses Map and Reduce to perform the COUNT operation.
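That aggregation plan can be sketched as filters composed with a map-then-reduce count; the block schema and function names are illustrative stand-ins for the operators named above:

```python
blocks = [
    {"page": 1, "modal": "image"}, {"page": 2, "modal": "text"},
    {"page": 4, "modal": "image"}, {"page": 12, "modal": "image"},
]

def filter_range(bs, lo, hi):   # stand-in for Filter_Range
    return [b for b in bs if lo <= b["page"] <= hi]

def filter_modal(bs, modal):    # stand-in for Filter_Modal
    return [b for b in bs if b["modal"] == modal]

# Map each surviving block to 1, then Reduce by summing -> COUNT
count = sum(map(lambda b: 1, filter_modal(filter_range(blocks, 1, 10), "image")))
```

The page-12 image is correctly excluded by the range filter, so only the two in-range image blocks are counted.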
For multi‑hop comparative queries, the agent first invokes Decompose to split the problem into sub‑questions, retrieves answers for each sub‑question, and finally synthesizes a combined response.
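The multi-hop control flow reduces to decompose, answer each sub-question, then synthesize. The sketch below shows only that control flow; the lambdas are hypothetical stubs for the LLM-backed operators:

```python
def answer_multi_hop(question, decompose, answer_one, synthesize):
    """Decompose -> per-sub-question retrieval -> synthesis (control flow only)."""
    subs = decompose(question)                 # split into sub-questions
    partials = [answer_one(q) for q in subs]   # retrieve an answer per sub-question
    return synthesize(question, partials)      # combine into one response

result = answer_multi_hop(
    "Compare A and B",
    decompose=lambda q: ["What is A?", "What is B?"],
    answer_one=lambda q: f"answer({q})",
    synthesize=lambda q, ps: " | ".join(ps),
)
```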
Evaluation
Experiments on benchmarks such as MMLongBench and Qasper demonstrate BookRAG's superior answer accuracy, higher retrieval coverage, and lower latency compared with traditional text‑first and layout‑first pipelines. Full evaluation data are available in the original paper.
Conclusion and Future Directions
BookRAG provides a validated design for complex long‑document QA, unifying hierarchical trees, knowledge graphs, and agent‑driven navigation. A current limitation is that entity resolution operates only within a single document, which poses challenges for enterprise scenarios with thousands of documents. Future work includes extending BookIndex to a cross‑document knowledge layer and exploring learnable, reinforcement‑learning‑based operator planning to further optimize efficiency and expressiveness.
This article has been distilled and summarized from source material, then republished for learning and reference.
