Stop RAG, Navigate Enterprise Knowledge Directly with CORPUS2SKILL
The article critiques traditional RAG's blind spots and introduces CORPUS2SKILL, a two‑stage "offline compile, online navigate" architecture that builds a hierarchical topic tree with progressive‑disclosure skill files. WixQA benchmarks show the approach outperforming dense retrieval and Agentic RAG on F1, factuality and recall, while the authors highlight cost and hierarchy‑quality trade‑offs.
RAG’s Structural Blind Spot
Traditional retrieval‑augmented generation treats the LLM as a passive consumer that only sees the top‑k retrieved passages, lacking a global view of the corpus and unaware of missed information. For complex cross‑topic queries such as "how to convert a sole proprietorship to an LLC", flat retrieval may return only surface matches like "sole proprietorship" or "LLC" and miss the critical documents (e.g., on Wix account types) explaining that the change requires contacting support.
Agentic RAG allows the model to issue multiple search requests, but without a “map” each query is “shooting in the dark”. Hierarchical methods such as RAPTOR and GraphRAG enrich candidates with clustering and summarisation, yet they still flatten the tree into a vector index, so the model cannot see the forest.
Core insight: Instead of making the model search the hierarchy, let it browse the hierarchy directly.
CORPUS2SKILL – Compile to Navigate
CORPUS2SKILL adopts a two‑stage “offline compile, online navigate” architecture.
Compilation stage
Embed all documents and iteratively apply bottom‑up K‑Means clustering to build a multi‑level topic tree.
For each cluster, an LLM generates a routing summary describing the topic scope, answer type and key terms.
The tree is materialised as a file system: the root node is a skill directory (SKILL.md), intermediate nodes are index directories (INDEX.md), and leaf nodes store document IDs.
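The bottom‑up clustering step can be sketched as follows. This is a minimal illustration, not the paper's implementation: the branching factor, termination condition and use of scikit‑learn's KMeans are assumptions, and the LLM pass that attaches a routing summary to each node is omitted.

```python
import numpy as np
from sklearn.cluster import KMeans

def build_levels(embeddings, branching=8):
    """Bottom-up tree construction: cluster the document embeddings,
    then cluster the resulting centroids, repeating until the top
    layer is small enough. Returns one label array per level,
    leaves first (an LLM would then summarise each cluster)."""
    levels = []
    points = embeddings
    while len(points) > branching:
        # Assumed heuristic: shrink each level by ~the branching factor.
        k = max(branching, len(points) // branching)
        km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(points)
        levels.append(km.labels_)       # this level's cluster assignments
        points = km.cluster_centers_    # next level clusters the centroids
    return levels
```

Each successive level has roughly a branching‑factor fewer nodes, which is what keeps the resulting tree shallow.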
This design enables progressive disclosure: an agent initially sees only six skill names and a short description (~200 tokens). When a skill is selected, the full SKILL.md is loaded; further drilling reveals the corresponding INDEX.md and finally the full document via get_document(doc_id). The navigation files consume far fewer tokens than reading the raw documents.
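Progressive disclosure can be pictured as three increasingly expensive reads over the compiled file tree. The directory layout and helper names below are illustrative assumptions, not the paper's exact format:

```python
from pathlib import Path

def list_skills(root):
    """Cheapest view: each skill's name plus only the first line
    of its SKILL.md (the ~200-token overview the agent sees first)."""
    entries = []
    for skill_dir in sorted(Path(root).iterdir()):
        first_line = (skill_dir / "SKILL.md").read_text().splitlines()[0]
        entries.append((skill_dir.name, first_line))
    return entries

def open_skill(root, skill):
    """Second step: load the full SKILL.md of one chosen skill."""
    return (Path(root) / skill / "SKILL.md").read_text()

def open_index(root, skill, topic):
    """Third step: drill into one sub-topic's INDEX.md, which lists
    document IDs for get_document(doc_id) to fetch."""
    return (Path(root) / skill / topic / "INDEX.md").read_text()
```

The point is that each step loads only the navigation file for the branch actually chosen, so token spend grows with the path taken rather than the corpus size.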
Service stage
The agent is equipped with two tools—code execution (to browse the hierarchical files) and document retrieval (to fetch full text by ID). Because the hierarchy is explicitly visible, the agent can perform directed backtracking (abandoning dead‑end branches) and cross‑branch synthesis (combining evidence from multiple sub‑topics).
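Directed backtracking and cross‑branch synthesis can be sketched as a pruned tree walk. Here `score()` stands in for the LLM judging a routing summary's relevance, and the node structure mirrors the compiled tree; both are assumptions for illustration:

```python
def navigate(root, score, get_document, threshold=0.5):
    """Descend the topic tree, abandoning branches whose routing
    summary scores below threshold (directed backtracking) and
    collecting documents from every promising leaf, possibly in
    different branches (cross-branch synthesis)."""
    found = []
    stack = [root]
    while stack:
        node = stack.pop()
        if "doc_ids" in node:  # leaf: fetch full text by ID
            found.extend(get_document(d) for d in node["doc_ids"])
            continue
        for child in node["children"]:
            if score(child["summary"]) >= threshold:
                stack.append(child)  # promising branch: keep exploring
            # else: dead end, skipped without ever loading its documents
    return found
```

Because pruning happens on summaries rather than full documents, a dead‑end branch costs one summary read instead of a retrieval round.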
Two typical interaction patterns are shown: a four‑step direct path to the target document, and a cross‑branch pattern that merges information from “online courses” and “billing documents” within the same skill to produce a complete answer.
In terms of complexity, the traversal depth is shallow; on the WixQA benchmark (6,221 documents) only about 30 summaries are needed to locate the target among thousands of documents.
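A back‑of‑envelope model makes the shallowness concrete. Assuming a uniform branching factor b (which the paper's tree need not have), the agent reads about b child summaries at each of roughly log_b(N) levels:

```python
import math

def summaries_read(n_docs, branching):
    """Summaries scanned if the agent reads all `branching` child
    summaries at each of the ceil(log_b N) levels on the way to a leaf.
    Uniform branching is an assumption, not the paper's guarantee."""
    depth = math.ceil(math.log(n_docs, branching))
    return depth * branching

# e.g. summaries_read(6221, 8) -> 40: tens of summaries, not thousands
```

The logarithmic depth is why corpus growth barely increases the per‑query reading load.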
Offline Investment, Online Returns
On the WixQA enterprise‑support benchmark, CORPUS2SKILL tops all six evaluated metrics. Token‑based F1 reaches 0.460 (27 % higher than dense retrieval and 19 % higher than Agentic RAG). Factuality is 0.729 and Context Recall 0.652, both substantially above RAPTOR’s 0.616.
Ablation studies reveal interesting trade‑offs:
Hierarchy shape: A narrow, deep tree yields slightly better quality than a default wide tree because finer‑grained topic splits reduce top‑level routing errors; a shallow wide tree suffers a 21 % F1 drop due to overly generic summaries.
Exploration budget: Even with only five interaction rounds, F1 remains 0.453, indicating that the hierarchical structure alone is highly efficient.
Service model: At the default configuration each query costs $0.17, about 14× RAPTOR, mainly because navigation files accumulate tokens over multiple turns. Switching to the cheaper Claude Haiku cuts per‑query cost to $0.088 while preserving or improving context recall, demonstrating that hierarchy quality matters more than model intelligence.
The dominant failure mode (61 % of errors) stems from hard clustering, which forces each document into a single branch, limiting cross‑topic retrieval. Future work will explore incremental compilation and prompt caching to reduce online token costs.
One‑Sentence Summary
Transform the runtime cost of vector‑database queries into a one‑time offline‑compiled hierarchical knowledge map, turning the agent from a “reader of retrieval results” into an “explorer of a knowledge forest”.