Making LLM Answers Trustworthy: Citation Attribution and Hallucination Detection
This article explains why simple prompt‑based citation is insufficient for Retrieval‑Augmented Generation, introduces a sentence‑level attribution pipeline, combines semantic similarity with NLI verification, and presents practical hallucination detection and structured JSON output to ensure answer reliability.
1. Why prompt-based citation is insufficient
Asking the LLM to insert source tags in the prompt is easy but unreliable: about 15% of sentences miss a citation, 8% receive the wrong document number, and the model can fabricate statements while still attaching a reference, leading to undetectable hallucinations.
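Omission is easy to measure after the fact. As a rough sketch (assuming the prompt asked for `[n]`-style tags; the regex and sentence splitter here are illustrative, not from the article):

```python
import re

def find_untagged_sentences(answer):
    """Return sentences that carry no [n]-style citation tag.

    Assumes the prompt asked the model to end each sentence with a tag
    like [1]; a different tag convention needs a different regex.
    """
    sentences = [s.strip() for s in re.split(r'(?<=[.!?])\s+', answer) if s.strip()]
    return [s for s in sentences if not re.search(r'\[\d+\]', s)]

answer = ("Nuclear radiation is excluded [1]. The waiting period is 90 days. "
          "Claims require a diagnosis report [2].")
print(find_untagged_sentences(answer))  # → ['The waiting period is 90 days.']
```

Counting these untagged sentences across a sample of answers is how you arrive at omission rates like the 15% figure above, but it cannot catch the worse failure: a fabricated sentence with a plausible-looking tag.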
2. Post‑processing attribution: sentence‑level source finding
The answer is first split into individual sentences. For each sentence we compute semantic similarity with every retrieved document and select the document with the highest score. If the similarity exceeds a threshold (e.g., 0.75) we record the document ID, section path, and confidence; otherwise we mark the sentence as unverified.
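The helpers `split_sentences` and `compute_similarity` are used but not defined in this article. A self-contained stand-in is sketched below; note the cosine here runs over bag-of-words counts purely for illustration, whereas a production system would embed both texts with a sentence-encoder model and use a proper language-aware splitter:

```python
import math
import re
from collections import Counter

def split_sentences(text):
    """Naive splitter on Chinese and English sentence stops;
    swap in spaCy or similar for real use."""
    parts = re.split(r'(?<=[。！？.!?])\s*', text)
    return [p.strip() for p in parts if p.strip()]

def compute_similarity(a, b):
    """Cosine similarity over word counts -- a stand-in for
    embedding similarity, kept dependency-free for illustration."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[w] * vb[w] for w in va)
    na = math.sqrt(sum(c * c for c in va.values()))
    nb = math.sqrt(sum(c * c for c in vb.values()))
    return dot / (na * nb) if na and nb else 0.0
```

With these two helpers in place, the attribution loop below runs as written.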
def attribute_answer(answer, retrieved_docs):
    """Attribute each sentence of *answer* to its best-matching retrieved document."""
    sentences = split_sentences(answer)
    attributions = []
    for sent in sentences:
        best_doc = None
        best_score = 0.0
        for doc in retrieved_docs:
            score = compute_similarity(sent, doc.content)
            if score > best_score:
                best_score = score
                best_doc = doc
        if best_score > 0.75:
            attributions.append({
                'sentence': sent,
                'source_doc_id': best_doc.id,
                'source_section': best_doc.metadata['section_path'],
                'confidence': best_score
            })
        else:
            attributions.append({
                'sentence': sent,
                'source_doc_id': None,
                'confidence': 0,
                'warning': 'unverified_claim'
            })
    return attributions

3. NLI verification: beyond similarity
An NLI (Natural Language Inference) model determines whether a document actually entails a sentence. The model returns probabilities for entailment, contradiction, and neutral. If entailment > 0.7 we accept the claim; if contradiction > 0.5 we flag it as a hallucination; otherwise we treat it as not found.
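The `nli_model` object used in this section is left abstract. In practice you would load a trained cross-encoder NLI model (for example via HuggingFace transformers); for a self-contained illustration, here is a keyword-overlap stub that is merely interface-compatible with the `predict()` calls below and must not be mistaken for a real NLI model:

```python
class StubNLIModel:
    """Interface-compatible stand-in for a real NLI model.

    predict() must return probabilities for entailment, contradiction,
    and neutral; here they are faked from word overlap, purely so the
    surrounding pipeline is runnable. Replace with a trained model.
    """

    def predict(self, nli_input):
        premise = set(nli_input['premise'].lower().split())
        hypothesis = set(nli_input['hypothesis'].lower().split())
        overlap = len(premise & hypothesis) / max(len(hypothesis), 1)
        return {
            'entailment': overlap,
            'contradiction': 0.0,
            'neutral': 1.0 - overlap,
        }

nli_model = StubNLIModel()
```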
def verify_entailment(sentence, document):
    """Use an NLI model to check whether *document* entails *sentence*.

    Returns 'supported', 'contradicted', or 'not_found'.
    """
    nli_input = {'premise': document, 'hypothesis': sentence}
    result = nli_model.predict(nli_input)
    if result['entailment'] > 0.7:
        return 'supported'
    elif result['contradiction'] > 0.5:
        return 'contradicted'
    else:
        return 'not_found'

4. Combined similarity + NLI pipeline
We first filter candidate documents by similarity (>0.6), then run NLI verification on each candidate. The best supporting document is kept; if none support the sentence we mark it as unverified.
def attribute_with_nli(answer, retrieved_docs):
    sentences = split_sentences(answer)
    attributions = []
    for sent in sentences:
        candidates = []
        for doc in retrieved_docs:
            sim_score = compute_similarity(sent, doc.content)
            # Cheap similarity filter first, then the expensive NLI check
            if sim_score > 0.6:
                entailment = verify_entailment(sent, doc.content)
                if entailment == 'supported':
                    candidates.append({'doc': doc, 'score': sim_score})
        if candidates:
            best = max(candidates, key=lambda x: x['score'])
            attributions.append({
                'sentence': sent,
                'source': best['doc'].metadata,
                'verified': True
            })
        else:
            attributions.append({
                'sentence': sent,
                'source': None,
                'verified': False,
                'warning': 'unverified_claim'
            })
    return attributions

5. Hallucination detection and handling
After attribution we identify "unverified claims". Sentences containing factual cues (numbers, dates, strong assertions) are considered high‑risk and removed. Low‑risk inferential sentences are kept but flagged for user awareness.
import re

def contains_factual_claim(sentence):
    """Detect factual statements by looking for numbers, clause references,
    obligation verbs, and units (patterns target Chinese policy text)."""
    patterns = [
        r'\d+',                  # any number
        r'第\d+条',              # clause references ("Article N")
        r'必须|应当|不得|禁止',   # obligation verbs ("must / shall / may not / forbidden")
        r'万元|%|天|年',          # units ("10k yuan / percent / days / years")
    ]
    return any(re.search(p, sentence) for p in patterns)

def handle_hallucination(answer, attributions):
    for attr in attributions:
        if attr['verified']:
            continue
        sent = attr['sentence']
        if contains_factual_claim(sent):
            # High-risk factual claim with no source: remove it entirely
            answer = answer.replace(sent, '')
        else:
            # Low-risk inference: keep it, but flag it for the user
            answer = answer.replace(sent, f"{sent} ⚠️[no direct support found in the documents]")
    return answer

6. Structured citation output
The final response sent to the front‑end is a JSON object containing the answer text, a list of citations with document title, section path, page number, original text, and confidence scores, plus an array of any unverified statements.
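The article shows the payload but not the code that assembles it; one plausible builder, assuming the attribution dicts carry `sentence`, `verified`, `source`, and `confidence` fields (these names mirror the earlier snippets but are not a fixed schema):

```python
import json

def build_response(answer, attributions):
    """Assemble the front-end JSON payload from attribution results."""
    citations = []
    unverified = []
    for attr in attributions:
        if attr.get('verified'):
            citations.append({
                'sentence': attr['sentence'],
                'source': attr['source'],
                'confidence': round(attr.get('confidence', 0.0), 2),
            })
        else:
            unverified.append(attr['sentence'])
    return json.dumps(
        {'answer': answer, 'citations': citations, 'unverified': unverified},
        ensure_ascii=False,  # keep non-ASCII source text readable
    )
```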
{
  "answer": "Nuclear radiation is not covered. Per the policy terms, the exclusions include nuclear radiation.",
  "citations": [
    {
      "sentence": "Nuclear radiation is not covered.",
      "source": {
        "doc_title": "XX Accident Insurance Policy",
        "section_path": "Article 3 Exclusions > (2)",
        "page_num": 5,
        "original_text": "Exclusions: (2) nuclear radiation, nuclear explosion…"
      },
      "confidence": 0.92
    }
  ],
  "unverified": []
}

7. How to answer interview questions on citation tracing
Start with a 15‑second rationale: RAG answers must be traceable for compliance and user trust. Then outline three layers (40 s): prompt‑level tags (high omission/mislabel rates), post‑processing similarity attribution, and NLI verification (overall accuracy 94%). Finally, describe hallucination handling (20 s) and show the quantitative improvement (15 s).
In high‑risk domains such as insurance, finance, law, or healthcare, every answer without a verifiable source is a potential compliance risk; the combined pipeline ensures both accuracy and transparency.
Wu Shixiong's Large Model Academy
We continuously share large-model know-how, helping you master core skills (LLM, RAG, fine-tuning, deployment) from zero to job offer, tailored for career-switchers, autumn recruiters, and those seeking stable large-model positions.
