Making LLM Answers Trustworthy: Citation Attribution and Hallucination Detection

This article explains why prompt‑based citation alone is unreliable in Retrieval‑Augmented Generation (RAG), introduces a sentence‑level attribution pipeline that combines semantic similarity with NLI verification, and presents practical hallucination handling and structured JSON citation output to make answers trustworthy.

Wu Shixiong's Large Model Academy

1. Why is prompt‑based citation insufficient?

Asking the LLM to insert source tags in the prompt is easy but unreliable: about 15% of sentences miss a citation, 8% receive the wrong document number, and the model can fabricate statements while still attaching a reference, leading to undetectable hallucinations.
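
For concreteness, the prompt‑level approach usually looks like the sketch below; the wording and the [doc_N] tag format are illustrative, not a specific production prompt. The model is asked to append the tags itself, which is exactly what it omits or mislabels in practice.

def build_citation_prompt(question, docs):
    # docs: list of retrieved document strings; the model is told to cite them as [doc_N].
    numbered = "\n".join(f"[doc_{i}] {d}" for i, d in enumerate(docs, 1))
    return (
        "Answer the question using ONLY the documents below.\n"
        "After every sentence, append the tag of the supporting document, e.g. [doc_2].\n\n"
        f"{numbered}\n\nQuestion: {question}"
    )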

2. Post‑processing attribution: sentence‑level source finding

The answer is first split into individual sentences. For each sentence we compute semantic similarity with every retrieved document and select the document with the highest score. If the similarity exceeds a threshold (e.g., 0.75) we record the document ID, section path, and confidence; otherwise we mark the sentence as unverified.

def attribute_answer(answer, retrieved_docs):
    """Attribute each sentence of *answer* to the most similar retrieved document."""
    sentences = split_sentences(answer)
    attributions = []
    for sent in sentences:
        # Find the retrieved document with the highest semantic similarity.
        best_doc = None
        best_score = 0
        for doc in retrieved_docs:
            score = compute_similarity(sent, doc.content)
            if score > best_score:
                best_score = score
                best_doc = doc
        if best_score > 0.75:
            # Confident match: record the source document, section, and score.
            attributions.append({
                'sentence': sent,
                'source_doc_id': best_doc.id,
                'source_section': best_doc.metadata['section_path'],
                'confidence': best_score
            })
        else:
            # No document is similar enough: flag the sentence as unverified.
            attributions.append({
                'sentence': sent,
                'source_doc_id': None,
                'confidence': 0,
                'warning': 'unverified_claim'
            })
    return attributions
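
The code above relies on two helpers, split_sentences and compute_similarity, that are left undefined. A minimal sketch of both, assuming the sentence-transformers library for embeddings (the model name is only an example):

import re
from sentence_transformers import SentenceTransformer, util

_embedder = SentenceTransformer("paraphrase-multilingual-MiniLM-L12-v2")  # illustrative choice

def split_sentences(text):
    # Naive split on Chinese/English sentence-ending punctuation;
    # a production system would use a proper sentence segmenter.
    parts = re.split(r'(?<=[。!?.!?])\s*', text)
    return [p.strip() for p in parts if p.strip()]

def compute_similarity(sentence, doc_text):
    # Cosine similarity between the sentence embedding and the document embedding.
    emb = _embedder.encode([sentence, doc_text], convert_to_tensor=True)
    return util.cos_sim(emb[0], emb[1]).item()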

3. NLI verification: beyond similarity

An NLI (Natural Language Inference) model determines whether a document actually entails a sentence. The model returns probabilities for entailment, contradiction, and neutral. If entailment > 0.7 we accept the claim; if contradiction > 0.5 we flag it as a hallucination; otherwise we treat it as not found.

def verify_entailment(sentence, document):
    """Use an NLI model to check whether *document* can entail *sentence*.
    Returns: 'supported', 'contradicted', or 'not_found'."""
    nli_input = {'premise': document, 'hypothesis': sentence}
    result = nli_model.predict(nli_input)
    if result['entailment'] > 0.7:
        return 'supported'
    elif result['contradiction'] > 0.5:
        return 'contradicted'
    else:
        return 'not_found'
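
The nli_model object is left abstract above. One way to realize it, assuming the Hugging Face transformers library and the public roberta-large-mnli checkpoint (both assumptions, not the article's stated setup):

import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

_tokenizer = AutoTokenizer.from_pretrained("roberta-large-mnli")
_nli = AutoModelForSequenceClassification.from_pretrained("roberta-large-mnli")
_nli.eval()

class NLIModel:
    def predict(self, nli_input):
        # Encode the premise/hypothesis pair and return label probabilities.
        inputs = _tokenizer(nli_input['premise'], nli_input['hypothesis'],
                            return_tensors="pt", truncation=True, max_length=512)
        with torch.no_grad():
            logits = _nli(**inputs).logits
        probs = torch.softmax(logits, dim=-1)[0]
        # roberta-large-mnli label order: 0 = contradiction, 1 = neutral, 2 = entailment.
        return {'contradiction': probs[0].item(),
                'neutral': probs[1].item(),
                'entailment': probs[2].item()}

nli_model = NLIModel()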

4. Combined similarity + NLI pipeline

We first filter candidate documents by similarity (>0.6), then run NLI verification on each candidate. The best supporting document is kept; if none support the sentence we mark it as unverified.

def attribute_with_nli(answer, retrieved_docs):
    """Two-stage attribution: similarity pre-filter, then NLI entailment check."""
    sentences = split_sentences(answer)
    attributions = []
    for sent in sentences:
        # Stage 1: keep only documents that are semantically close to the sentence.
        candidates = []
        for doc in retrieved_docs:
            sim_score = compute_similarity(sent, doc.content)
            if sim_score > 0.6:
                # Stage 2: keep the document only if the NLI model says it entails the sentence.
                entailment = verify_entailment(sent, doc.content)
                if entailment == 'supported':
                    candidates.append({'doc': doc, 'score': sim_score})
        if candidates:
            best = max(candidates, key=lambda x: x['score'])
            attributions.append({
                'sentence': sent,
                'source': best['doc'].metadata,
                'verified': True
            })
        else:
            attributions.append({
                'sentence': sent,
                'source': None,
                'verified': False,
                'warning': 'unverified_claim'
            })
    return attributions
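
The pipeline assumes each retrieved document exposes .id, .content, and .metadata. A minimal usage sketch with a hypothetical RetrievedDoc container and illustrative data matching the insurance example later in this article:

from dataclasses import dataclass, field

@dataclass
class RetrievedDoc:
    id: str
    content: str
    metadata: dict = field(default_factory=dict)

docs = [
    RetrievedDoc(
        id="doc_001",
        content="责任免除:(2)核辐射、核爆炸造成的损失,保险人不承担赔偿责任。",
        metadata={"doc_title": "XX意外险条款",
                  "section_path": "第3条 责任免除 > (2)",
                  "page_num": 5},
    ),
]

result = attribute_with_nli("核辐射不在保障范围内。", docs)
# Each entry holds the sentence, its source metadata (or None), and a verified flag.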

5. Hallucination detection and handling

After attribution we identify "unverified claims". Sentences containing factual cues (numbers, dates, strong assertions) are considered high‑risk and removed. Low‑risk inferential sentences are kept but flagged for user awareness.

import re

def contains_factual_claim(sentence):
    """Detect factual statements by looking for numbers, clause references, strong modal verbs, and units."""
    patterns = [
        r'\d+',                  # any number
        r'第\d+条',              # clause references, e.g. "Article N"
        r'必须|应当|不得|禁止',    # strong modal verbs: must / shall / must not / prohibited
        r'万元|%|天|年',          # units: 10k yuan / percent / days / years
    ]
    return any(re.search(p, sentence) for p in patterns)


def handle_hallucination(answer, attributions):
    """Remove high-risk unverified factual claims; flag low-risk unverified sentences."""
    for attr in attributions:
        if attr['verified']:
            continue
        sent = attr['sentence']
        if contains_factual_claim(sent):
            # High-risk factual claim with no supporting document: remove it.
            answer = answer.replace(sent, '')
        else:
            # Low-risk inferential sentence: keep it but append a visible warning
            # ("no direct supporting evidence found in the documents").
            answer = answer.replace(sent, f"{sent} ⚠️[未在文档中找到直接依据]")
    return answer
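
A small illustration of both branches, with hypothetical attribution records. The first sentence contains factual cues ("10" and "天"), so it is removed; the second contains no numbers, clause references, or strong modal verbs, so it is kept and flagged.

answer = "被保险人需在事故发生后10天内报案。我们推测冲浪也可能属于高风险运动。"
attributions = [
    {'sentence': "被保险人需在事故发生后10天内报案。", 'verified': False},
    {'sentence': "我们推测冲浪也可能属于高风险运动。", 'verified': False},
]
print(handle_hallucination(answer, attributions))
# Output keeps only the second sentence, with the warning suffix appended.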

6. Structured citation output

The final response sent to the front‑end is a JSON object containing the answer text, a list of citations with document title, section path, page number, original text, and confidence scores, plus an array of any unverified statements.

{
    "answer": "核辐射不在保障范围内。根据条款,责任免除包括核辐射等。",
    "citations": [
        {
            "sentence": "核辐射不在保障范围内。",
            "source": {
                "doc_title": "XX意外险条款",
                "section_path": "第3条 责任免除 > (2)",
                "page_num": 5,
                "original_text": "责任免除:(2)核辐射、核爆炸……"
            },
            "confidence": 0.92
        }
    ],
    "unverified": []
}
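
Assembling this structure from the attribution records is straightforward. A minimal sketch, assuming each verified record carries the source metadata and confidence shown in the example above (the function name is illustrative):

def build_response(answer, attributions):
    # Split attribution records into verified citations and unverified sentences.
    citations, unverified = [], []
    for attr in attributions:
        if attr.get('verified'):
            citations.append({
                'sentence': attr['sentence'],
                'source': attr['source'],
                'confidence': attr.get('confidence'),
            })
        else:
            unverified.append(attr['sentence'])
    return {'answer': answer, 'citations': citations, 'unverified': unverified}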

7. How to answer interview questions on citation tracing

Start with a 15‑second rationale: RAG answers must be traceable for compliance and user trust. Then outline three layers (40 s): prompt‑level tags (high omission/mislabel rates), post‑processing similarity attribution, and NLI verification (overall accuracy 94%). Finally, describe hallucination handling (20 s) and show the quantitative improvement (15 s).

In high‑risk domains such as insurance, finance, law, or healthcare, every answer without a verifiable source is a potential compliance risk; the combined pipeline ensures both accuracy and transparency.

Tags: prompt engineering, RAG, hallucination detection, LLM reliability, citation attribution, NLI