MCompassRAG: Using Topic Metadata as a Semantic Compass to Accelerate RAG Retrieval

MCompassRAG attaches topic metadata to coarse chunks as a semantic compass, which removes the need for fine‑grained splitting, reranking, or LLM calls at inference time. Across six complex retrieval benchmarks it reports an average 8.24% gain in information efficiency and a latency reduction of more than 5×.

PaperAgent

Topic Metadata as a Semantic Compass

The core idea of MCompassRAG is to keep chunk granularity unchanged while giving each chunk a directional cue: a topic vector that acts as a semantic compass.

Offline Pre‑computation

A topic‑model encoder maps documents and their chunks into the same embedding space as the retriever, producing a topic distribution θ_c∈ℝ^K for each chunk. These distributions are stored in a corpus‑level metadata bank, forming an offline map of the corpus’s topical structure. Because chunks, unlike queries, are long enough to yield reliable topic distributions and do not change between queries, their topic vectors can be computed once and cached.
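The offline step can be sketched as follows. This is a minimal illustration, not the paper's encoder: `topic_distribution` is a hypothetical stand-in that takes a softmax over dot products between a chunk embedding and K topic embeddings, and the bank layout (a dict keyed by chunk id) is likewise an assumption. Plain Python lists are used for clarity; an array library would be the usual choice in practice.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def topic_distribution(chunk_emb, topic_embs):
    """Hypothetical stand-in for the topic-model encoder: returns theta_c in R^K."""
    return softmax([dot(chunk_emb, t) for t in topic_embs])

def build_metadata_bank(chunk_embs, topic_embs):
    """Compute theta_c once per chunk and cache it next to the chunk embedding."""
    return {
        cid: {"embedding": emb, "theta": topic_distribution(emb, topic_embs)}
        for cid, emb in chunk_embs.items()
    }

# Toy data (assumed): 2-d retriever embeddings and K = 2 topics.
topic_embs = [[1.0, 0.0], [0.0, 1.0]]
chunk_embs = {"c1": [2.0, 0.0], "c2": [0.0, 2.0], "c3": [1.0, 1.0]}
bank = build_metadata_bank(chunk_embs, topic_embs)
```

Once built, the bank is only read at query time, so its cost is paid once per corpus rather than once per query.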

Query‑time Topic Selection

Queries are too short to yield reliable topic distributions. MCompassRAG therefore does not use the query’s own distribution; instead, a selection policy compares the query embedding with entries in the metadata bank and picks the most relevant topic distributions. This similarity search is performed entirely within the retriever’s embedding space, requiring no LLM.
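A minimal sketch of such a selection policy, assuming each bank entry stores a retriever-space embedding alongside its topic distribution and that selection is a plain top-m cosine search. The function name `select_topics` and the parameter `m` are illustrative; the paper's actual policy may differ.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    num = sum(a * b for a, b in zip(u, v))
    den = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return num / den if den else 0.0

def select_topics(query_emb, bank, m=2):
    """Return the m bank entries closest to the query as (similarity, theta) pairs."""
    scored = [(cosine(query_emb, e["embedding"]), e["theta"]) for e in bank.values()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:m]

# Toy bank (assumed values).
bank = {
    "c1": {"embedding": [1.0, 0.0], "theta": [0.9, 0.1]},
    "c2": {"embedding": [0.0, 1.0], "theta": [0.1, 0.9]},
    "c3": {"embedding": [0.7, 0.7], "theta": [0.5, 0.5]},
}
selected = select_topics([0.9, 0.2], bank, m=2)
```

Here the query borrows topic information from its nearest chunks instead of estimating a distribution from its own few tokens.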

Abstract Denoising

The selected topic distributions may contain bias or noise. An abstraction module aggregates them into a refined query‑topic distribution, compressing it into a compact query‑side topic vector – the “semantic compass” that guides the retriever toward the correct semantic direction within a coarse chunk.
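One simple way to picture the aggregation is a similarity-weighted mean of the selected distributions. This is only a stand-in: the summary does not specify the abstraction module, and the softmax weighting and the `temperature` parameter below are assumptions made for illustration.

```python
import math

def abstract_topics(selected, temperature=0.1):
    """Aggregate (similarity, theta) pairs into one query-side topic vector.

    Assumption: a softmax-weighted mean stands in for the abstraction module.
    Entries with lower similarity get less weight, which damps noisy picks.
    """
    sims = [s for s, _ in selected]
    top = max(sims)
    weights = [math.exp((s - top) / temperature) for s in sims]
    total = sum(weights)
    weights = [w / total for w in weights]
    n_topics = len(selected[0][1])
    return [
        sum(w * theta[i] for w, (_, theta) in zip(weights, selected))
        for i in range(n_topics)
    ]

# Toy input (assumed): three selected distributions with their similarities.
selected = [(0.98, [0.9, 0.1]), (0.84, [0.5, 0.5]), (0.22, [0.1, 0.9])]
compass = abstract_topics(selected)
```

Because the result is a convex combination of probability distributions, it is itself a distribution over the K topics.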

Metadata‑Enriched Representation

The query‑side topic vector is concatenated with the original query embedding, forming a metadata‑enriched query representation. Similarly, each chunk’s embedding is concatenated with its cached topic vector. A lightweight MLP classifier scores these enriched representations and returns the top‑k chunks.
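The concatenation and scoring step might look like the sketch below. Several details are assumptions: the MLP here has one hidden layer, takes the enriched query and enriched chunk concatenated into a single input, and uses random untrained weights, so the ranking it produces is meaningless and only the data flow is illustrated.

```python
import random

def init_params(in_dim, hidden_dim, seed=0):
    """Random, untrained weights for a one-hidden-layer MLP (illustration only)."""
    rng = random.Random(seed)
    return {
        "w1": [[rng.uniform(-0.5, 0.5) for _ in range(in_dim)] for _ in range(hidden_dim)],
        "b1": [0.0] * hidden_dim,
        "w2": [rng.uniform(-0.5, 0.5) for _ in range(hidden_dim)],
        "b2": 0.0,
    }

def mlp_score(x, params):
    """Forward pass: linear -> ReLU -> linear, returning one relevance logit."""
    hidden = [
        max(0.0, sum(w * v for w, v in zip(row, x)) + b)
        for row, b in zip(params["w1"], params["b1"])
    ]
    return sum(w * h for w, h in zip(params["w2"], hidden)) + params["b2"]

def top_k_chunks(query_emb, compass, bank, params, k=2):
    """Score every chunk against the enriched query and return the k best ids."""
    enriched_query = query_emb + compass  # list concatenation
    scores = {}
    for cid, entry in bank.items():
        enriched_chunk = entry["embedding"] + entry["theta"]
        scores[cid] = mlp_score(enriched_query + enriched_chunk, params)
    return sorted(scores, key=scores.get, reverse=True)[:k]

# Toy inputs (assumed values).
bank = {
    "c1": {"embedding": [1.0, 0.0], "theta": [0.9, 0.1]},
    "c2": {"embedding": [0.0, 1.0], "theta": [0.1, 0.9]},
    "c3": {"embedding": [0.7, 0.7], "theta": [0.5, 0.5]},
}
query_emb, compass = [0.9, 0.2], [0.7, 0.3]
params = init_params(in_dim=8, hidden_dim=4)  # 2 + 2 (query side) + 2 + 2 (chunk side)
top = top_k_chunks(query_emb, compass, bank, params, k=2)
```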

LLM Teacher Distillation to a Lightweight Student

Training and inference are strictly separated. During training, an LLM teacher uses query expansion to generate soft relevance labels for each chunk. The student model receives only the original query and must learn to identify relevant chunks from the metadata‑enriched representations. The loss combines binary cross‑entropy with a KL‑divergence knowledge‑distillation term. The student is an extreme multi‑label classifier capable of scoring multiple relevant chunks in a single forward pass, learning to infer the teacher’s judgments from the semantic‑compass signal.
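A sketch of a loss with this shape, under stated assumptions: the binary cross-entropy is taken against hard 0/1 labels, the KL term compares temperature-softened softmax distributions over chunks from teacher and student, and the weight `lam` and `temperature` are hypothetical hyperparameters. The paper's exact formulation may differ.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def softmax(scores, temperature=1.0):
    """Temperature-scaled, numerically stable softmax."""
    scaled = [s / temperature for s in scores]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def bce(labels, logits, eps=1e-12):
    """Mean binary cross-entropy between 0/1 labels and per-chunk logits."""
    total = 0.0
    for y, z in zip(labels, logits):
        p = sigmoid(z)
        total -= y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps)
    return total / len(labels)

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def distillation_loss(student_logits, teacher_scores, labels, lam=0.5, temperature=2.0):
    """BCE on labels plus lam * KL(teacher || student) over chunks (assumed form)."""
    p_teacher = softmax(teacher_scores, temperature)
    p_student = softmax(student_logits, temperature)
    return bce(labels, student_logits) + lam * kl_divergence(p_teacher, p_student)

# Toy example (assumed values): three chunks, the first one relevant.
teacher_scores, labels = [4.0, -4.0, -4.0], [1, 0, 0]
agree = distillation_loss([4.0, -4.0, -4.0], teacher_scores, labels)
disagree = distillation_loss([-4.0, 4.0, -4.0], teacher_scores, labels)
```

A student whose logits follow the teacher's ranking incurs a lower loss than one that inverts it, which is the pressure that transfers the teacher's judgments.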

Inference Procedure

Encode the query.

Select relevant topic metadata from the metadata bank.

Abstract the selected topics into a query‑side topic vector.

Score enriched query and chunk representations with the student MLP to obtain top‑k results.

No LLM calls, query expansion, or reranking are required at inference time, which explains the large latency advantage.
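The four inference steps can be strung together as in the sketch below. It assumes the query is already encoded (step 1), uses top-m cosine selection and a plain mean as simplified stand-ins for the selection policy and abstraction module, and takes the scorer as a pluggable callable. `toy_student` is a hypothetical dot-product placeholder, not the trained student MLP.

```python
import math

def cosine(u, v):
    num = sum(a * b for a, b in zip(u, v))
    den = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return num / den if den else 0.0

def retrieve(query_emb, bank, student, m=2, k=1):
    """Run steps 2-4 for an already-encoded query; no LLM is involved."""
    # Step 2: select the m bank entries closest to the query.
    ranked = sorted(
        bank.values(),
        key=lambda e: cosine(query_emb, e["embedding"]),
        reverse=True,
    )[:m]
    # Step 3: abstract them into one query-side topic vector (plain mean, assumed).
    n_topics = len(ranked[0]["theta"])
    compass = [sum(e["theta"][i] for e in ranked) / len(ranked) for i in range(n_topics)]
    # Step 4: score enriched query against every enriched chunk, keep the top k.
    enriched_query = query_emb + compass
    scores = {
        cid: student(enriched_query, e["embedding"] + e["theta"])
        for cid, e in bank.items()
    }
    return sorted(scores, key=scores.get, reverse=True)[:k]

def toy_student(enriched_query, enriched_chunk):
    """Hypothetical placeholder scorer: dot product of the enriched vectors."""
    return sum(a * b for a, b in zip(enriched_query, enriched_chunk))

# Toy bank (assumed values).
bank = {
    "c1": {"embedding": [1.0, 0.0], "theta": [0.9, 0.1]},
    "c2": {"embedding": [0.0, 1.0], "theta": [0.1, 0.9]},
    "c3": {"embedding": [0.7, 0.7], "theta": [0.5, 0.5]},
}
results = retrieve([0.9, 0.2], bank, toy_student, m=2, k=2)
```

Every operation here is a vector lookup or a small arithmetic pass, which is consistent with the latency argument made above.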

Experimental Results

Six Benchmarks Show +8.24% Average IE and >5× Latency Reduction

Figure: Benchmark results

The evaluation covers six complex retrieval benchmarks, including LegalBench‑RAG, Dragonball Finance, and DRBench. Compared with the strongest non‑LLM baseline, MCompassRAG improves average information efficiency (IE) by 8.24% and cuts latency by more than a factor of five. The student model matches the teacher’s performance, indicating that the semantic‑compass mechanism is robust without fine‑tuning. Ablation studies show that the framework is insensitive to the choice of embedding backbone or topic model.

Qualitative Validation – The Compass Really Guides

On LegalBench‑RAG, for a query asking for the definition of “Superior Proposal”, MCompassRAG retrieves the chunk that contains that definition, while the baseline is misled by unrelated semantics.

Figure: Qualitative retrieval comparison

t‑SNE visualizations show chunk embeddings clustering by topic, with the query’s topic vector pointing directly to the correct cluster, confirming that the compass aligns retrieval direction with the intended semantic region.

Figure: t‑SNE visualization

Author’s Takeaways

Skip the granularity debate – instead of finer chunks or hierarchical retrieval, enrich coarse chunks with topic metadata.

Separate training and inference – the LLM teacher supervises during training, while the student operates independently at inference, eliminating LLM calls.

Compass ≠ topic model – MCompassRAG is agnostic to the underlying topic model; any vector in the retriever’s embedding space works.

Reuse the metadata bank – offline‑cached chunk topic distributions are read‑only at query time, requiring only selection and abstraction.

In short, MCompassRAG addresses RAG’s chunk‑granularity bottleneck by giving the retriever direction rather than slicing documents more finely.

Paper title: MCompassRAG: Topic Metadata as a Semantic Compass for Paragraph-Level Retrieval
Paper link: https://arxiv.org/abs/2606.18508
GitHub: https://github.com/AmirAbaskohi/MCompassRAG