Why Traditional RAG Breaks the Chain and How SentGraph Fixes It
This article explains why traditional retrieval‑augmented generation fails on multi‑hop questions when its retrieval units (whole chunks) are too large, introduces SentGraph's sentence‑level graph, which trims the retrieval unit and encodes logical relations explicitly, walks through offline graph construction and online inference, and closes with experimental gains and remaining limitations.
Why Traditional RAG Breaks the Reasoning Chain
In single‑hop retrieval, documents are split into ~200‑word chunks and a dense vector search returns a small set of chunks from which the LLM can answer directly. This works because each chunk is short and mostly relevant. In multi‑hop QA, the answer must be assembled from 2‑4 documents. Returning whole paragraphs introduces a large amount of irrelevant text (≈60% noise), which drowns out the key facts, collapses the reasoning chain, and leads to hallucinations.
Key insight: the failure is not inaccurate retrieval but overly‑large retrieval units and tangled logical relations.
Traditional chunk graph:
  [Paragraph1] —similar→ [Paragraph2]
  ↓ contains many noisy sentences
  Context explosion

SentGraph sentence graph:
  [S1] —cause→ [S2] —contrast→ [S3]
  ↓ sentence‑by‑sentence relevance
  Clean reasoning chain

SentGraph's Slimming Strategy
SentGraph reduces the retrieval granularity from paragraph to single sentence and organizes all sentences into a three‑layer hierarchical graph that makes logical relations explicit.
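Such a sentence graph can be sketched as a small data structure. This is a hypothetical illustration, not the paper's implementation: the node and edge names are assumptions, and only the layer tags and relation labels come from the article.

```python
from dataclasses import dataclass, field

@dataclass
class SentenceNode:
    sent_id: str
    text: str
    layer: str  # "topic", "core", or "support"

@dataclass
class SentGraph:
    # sent_id -> SentenceNode
    nodes: dict = field(default_factory=dict)
    # sent_id -> list of (neighbor_id, rhetorical relation)
    edges: dict = field(default_factory=dict)

    def add_node(self, node: SentenceNode) -> None:
        self.nodes[node.sent_id] = node
        self.edges.setdefault(node.sent_id, [])

    def add_edge(self, src: str, dst: str, relation: str) -> None:
        # Edges carry an explicit rhetorical relation such as "cause".
        self.edges[src].append((dst, relation))

g = SentGraph()
g.add_node(SentenceNode("s1", "Heavy rain flooded the valley.", "core"))
g.add_node(SentenceNode("s2", "The harvest failed.", "core"))
g.add_edge("s1", "s2", "cause")
```

Keeping the relation label on the edge, rather than inside the text, is what lets the online stage follow "cause" or "contrast" links selectively.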
Topic layer — cross‑document bridge
↑
Core sentence layer — core facts
↑
Support sentence layer — background, cause, example

Offline Graph Construction (Three Steps)
Sentence Splitting – Documents are segmented into individual sentences using a standard natural‑language‑inference (NLI) model. This eliminates redundant chunk overlap.
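As a minimal stand‑in for the learned segmenter described above, a rule‑based splitter illustrates the idea (this regex heuristic is an assumption for demonstration, not the paper's method):

```python
import re

def split_sentences(doc: str) -> list[str]:
    # Split on sentence-final punctuation followed by whitespace;
    # a learned segmenter would handle abbreviations and edge cases.
    parts = re.split(r"(?<=[.!?])\s+", doc.strip())
    return [p for p in parts if p]

doc = "Chunks overlap heavily. Sentences do not! Does this reduce noise?"
print(split_sentences(doc))
# → ['Chunks overlap heavily.', 'Sentences do not!', 'Does this reduce noise?']
```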
Relation Identification – For each sentence pair, a compact Rhetorical Structure Theory (RST) classifier predicts one of 12 rhetorical relations (e.g., cause, contrast, example). This makes discourse cues such as “because”, “but”, “for example” explicit.
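A toy rule‑based classifier shows what the RST component's interface looks like. The cue table and the "elaboration" fallback are assumptions for illustration; the real classifier is a trained model over all 12 relations, of which only cause, contrast, and example are named here.

```python
# Map explicit discourse cues in the second sentence to a relation.
CUES = {
    "because": "cause", "therefore": "cause",
    "but": "contrast", "however": "contrast",
    "for example": "example", "such as": "example",
}

def classify_relation(sent_a: str, sent_b: str) -> str:
    lowered = sent_b.lower()
    for cue, relation in CUES.items():
        if cue in lowered:
            return relation
    return "elaboration"  # fallback when no explicit cue is present

print(classify_relation("The dam broke.", "Therefore the valley flooded."))
# → cause
```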
Cross‑Document Bridging – An LLM generates entity‑relation‑entity triples for entities that appear in different documents. The triples are inserted as edges in the Topic layer, thereby linking evidence across documents.
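The bridging step can be sketched as follows. In the paper an LLM emits the entity‑relation‑entity triples; this simplified sketch only shows how entities shared between documents become Topic‑layer bridge edges (function and variable names are hypothetical):

```python
from collections import defaultdict

def bridge_documents(doc_entities: dict[str, set[str]]) -> list[tuple[str, str, str]]:
    """Return (doc_a, shared_entity, doc_b) edges for entities in more than one doc."""
    entity_docs = defaultdict(list)
    for doc_id, entities in doc_entities.items():
        for entity in entities:
            entity_docs[entity].append(doc_id)
    bridges = []
    for entity, docs in entity_docs.items():
        # Every pair of documents mentioning the entity gets a bridge edge.
        for i in range(len(docs)):
            for j in range(i + 1, len(docs)):
                bridges.append((docs[i], entity, docs[j]))
    return bridges

docs = {"d1": {"Marie Curie", "Warsaw"}, "d2": {"Marie Curie", "radium"}}
print(bridge_documents(docs))  # → [('d1', 'Marie Curie', 'd2')]
```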
Online Inference (Three Steps)
Anchor Selection – A dense retriever computes the cosine similarity between the question vector and every sentence vector, returning the top‑K sentences as candidate anchors.
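The anchor‑selection step can be sketched with plain cosine similarity over toy vectors (a real system would use a dense retriever's embeddings; the 2‑dimensional vectors here are illustrative only):

```python
import math

def cosine(u: list[float], v: list[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def select_anchors(q_vec, sent_vecs: dict[str, list[float]], k: int = 2) -> list[str]:
    # Rank every sentence vector by similarity to the question vector,
    # return the top-K sentence ids as candidate anchors.
    ranked = sorted(sent_vecs, key=lambda s: cosine(q_vec, sent_vecs[s]), reverse=True)
    return ranked[:k]

vecs = {"s1": [1.0, 0.0], "s2": [0.9, 0.1], "s3": [0.0, 1.0]}
print(select_anchors([1.0, 0.0], vecs, k=2))  # → ['s1', 's2']
```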
Anchor Refinement – The LLM acts as a judge: it discards anchors that are irrelevant, and decides whether the retained anchors already provide sufficient evidence. If they do, the system proceeds to a direct answer; otherwise it triggers path expansion.
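The judge‑then‑branch control flow can be sketched as below. The `llm_judge` callable stands in for the real LLM call, and `toy_judge` is a deliberately crude word‑overlap heuristic used only to make the sketch runnable:

```python
def refine_anchors(question, anchors, llm_judge):
    # llm_judge returns (kept_anchors, evidence_is_sufficient).
    kept, sufficient = llm_judge(question, anchors)
    if sufficient:
        return kept, "answer"   # evidence complete: answer directly
    return kept, "expand"       # otherwise trigger path expansion

def toy_judge(question, anchors):
    # Keep anchors sharing a word with the question; two kept = "enough".
    q_words = set(question.lower().split())
    kept = [a for a in anchors if q_words & set(a.lower().split())]
    return kept, len(kept) >= 2

kept, action = refine_anchors(
    "Who discovered radium?",
    ["Curie discovered radium.", "Paris is large."],
    toy_judge,
)
print(kept, action)
```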
Path Expansion – Starting from the refined anchors, a breadth‑first search follows sentence‑to‑sentence (N‑N) and sentence‑to‑support (N‑S) edges in the graph, pulling in “cause”, “contrast”, and “background” sentences until a complete evidence chain is formed.
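The expansion step above is essentially a relation‑filtered breadth‑first search. A minimal sketch, assuming an adjacency‑list graph whose edges carry relation labels (the graph shape mirrors the earlier construction steps; the `FOLLOW` set lists only the relations named in the article):

```python
from collections import deque

FOLLOW = {"cause", "contrast", "background"}

def expand_paths(graph: dict, anchors: list[str]) -> list[str]:
    """graph: node -> [(neighbor, relation)]; returns the collected evidence chain."""
    chain = list(anchors)
    seen = set(anchors)
    queue = deque(anchors)
    while queue:
        node = queue.popleft()
        for neighbor, relation in graph.get(node, []):
            # Only follow edges whose relation type carries evidence.
            if relation in FOLLOW and neighbor not in seen:
                seen.add(neighbor)
                chain.append(neighbor)
                queue.append(neighbor)
    return chain

g = {"s1": [("s2", "cause"), ("s4", "similar")], "s2": [("s3", "contrast")]}
print(expand_paths(g, ["s1"]))  # → ['s1', 's2', 's3']
```

Note that the "similar" edge to s4 is deliberately not followed: filtering by relation type is what keeps the chain free of merely topical neighbors.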
Experimental Results
SentGraph was evaluated on four multi‑hop QA benchmarks. All scores are Exact Match (EM) unless otherwise noted.
HotpotQA: baseline 44.0 EM → SentGraph 48.8 EM (↑ 4.8)
2Wiki: baseline 36.8 EM → SentGraph 42.0 EM (↑ 5.2)
MuSiQue: baseline 21.2 EM → SentGraph 26.8 EM (↑ 5.6)
MultiHopRAG: baseline 63.4 Acc → SentGraph 65.6 Acc (↑ 2.2)
Additional observations:
Token efficiency: Compared with the strongest chunk‑level graph baseline (KGP), SentGraph reduces input tokens by ~30 % and output tokens by ~60 %.
Model size: A 7 B Qwen model equipped with SentGraph outperforms a 14 B model that uses traditional chunking, demonstrating that the approach benefits smaller LLMs.
Limitations and Future Work
Graph construction depends on LLM‑generated relations; larger LLMs produce more accurate edges, while smaller models may introduce noise.
The current set of 12 rhetorical relations is tuned for multi‑hop QA. Extending the method to other tasks will require redesigning the relation taxonomy.
Offline graph building is computationally intensive. Incremental or on‑the‑fly graph updates are a promising direction.
By shrinking the retrieval unit to the sentence level and explicitly encoding rhetorical arrows, RAG can achieve “less noise, more evidence, uninterrupted chains” in multi‑hop question answering. SentGraph reaches state‑of‑the‑art performance with only 30 % of the token budget, illustrating the potential of a “graph + sentence” paradigm.
SentGraph: Hierarchical Sentence Graph for Multi‑hop Retrieval‑Augmented Question Answering
https://arxiv.org/pdf/2601.03014