How Long‑Tail Knowledge Boosts Retrieval‑Augmented Large Language Models
The paper introduces a method that classifies user queries as ordinary or long‑tail and applies retrieval‑augmented generation only to the long‑tail ones. A dedicated long‑tail detection metric (GECE) and an extended RAG pipeline improve both the efficiency and the accuracy of large language models.
The Alibaba Cloud AI platform PAI team, together with Alibaba Group Security's content‑security algorithm team and Prof. He Xiaofeng's group at East China Normal University, presented a paper at ACL 2024 titled "On the Role of Long‑tail Knowledge in Retrieval Augmented Large Language Models". The work proposes classifying each query as either ordinary (answerable by the LLM alone) or long‑tail, and augmenting only the latter with retrieved documents, which speeds up LLM responses.
Background
Although large language models (LLMs) achieve impressive performance, they still struggle with hallucinations, outdated information, and opaque reasoning. Retrieval‑augmented generation (RAG) mitigates these issues by integrating external knowledge bases, improving accuracy and timeliness. Existing RAG approaches treat all retrieved information uniformly, overlooking the specific knowledge type required for each query.
Algorithm Overview
Long‑tail Detection Metric (ECE)
Traditional methods rely on text frequency to label instances as long‑tail, which is impractical for unseen queries. Expected Calibration Error (ECE) measures the mismatch between predicted probabilities and observed frequencies, providing a new perspective for long‑tail detection. ECE is computed by binning confidence scores and comparing each bin’s accuracy and confidence.
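The binning step can be illustrated with a minimal sketch of the standard ECE for a classification‑style setting. The function name, the equal‑width binning, and the default of 10 bins are assumptions made for this example and are not taken from the paper.

```python
from typing import Sequence


def expected_calibration_error(
    confidences: Sequence[float],
    correct: Sequence[bool],
    n_bins: int = 10,
) -> float:
    """ECE = sum over bins of (|B| / n) * |accuracy(B) - mean confidence(B)|."""
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    n = len(confidences)
    if n == 0:
        return 0.0

    # Equal-width bins over [0, 1]; a confidence of exactly 1.0 goes to the last bin.
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, ok))

    ece = 0.0
    for members in bins:
        if not members:
            continue  # empty bins contribute nothing
        avg_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(1 for _, ok in members if ok) / len(members)
        ece += (len(members) / n) * abs(accuracy - avg_conf)
    return ece
```

A model that reports 0.9 confidence on four answers but gets only three right has a gap of about 0.15 in that bin, and that gap is what ECE accumulates.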
Metric‑Based Long‑tail Detection
Accuracy is measured by METEOR, which evaluates the similarity between generated text and reference answers.
Confidence is derived from the LLM’s average token probability.
Additional factors improve detection:
Average word frequency, a basic indicator of long‑tail text.
Dot product between the gradient of a specific sample and the average gradient of the whole dataset, reflecting the divergence of long‑tail instances.
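The per‑instance quantities listed above can be sketched as small helper functions. This is an illustrative sketch only: the function names, the use of per‑token log‑probabilities as input, the whitespace tokenization, and the flat‑list representation of gradients are all assumptions, not details from the paper.

```python
import math
from typing import Mapping, Sequence


def average_token_probability(token_logprobs: Sequence[float]) -> float:
    """Confidence: mean per-token probability, given per-token log-probabilities."""
    if not token_logprobs:
        raise ValueError("need at least one token")
    return sum(math.exp(lp) for lp in token_logprobs) / len(token_logprobs)


def average_word_frequency(text: str, corpus_counts: Mapping[str, int]) -> float:
    """Mean corpus count of the words in `text`; unseen words count as 0."""
    words = text.lower().split()  # naive whitespace tokenization (assumption)
    if not words:
        raise ValueError("text has no words")
    return sum(corpus_counts.get(w, 0) for w in words) / len(words)


def gradient_alignment(
    instance_grad: Sequence[float], mean_grad: Sequence[float]
) -> float:
    """Dot product between one instance's gradient and the dataset-mean gradient."""
    if len(instance_grad) != len(mean_grad):
        raise ValueError("gradients must have the same dimension")
    return sum(g * m for g, m in zip(instance_grad, mean_grad))
```

In practice the gradients would be flattened parameter gradients from the model, and the word counts would come from a large reference corpus. The toy inputs here only show the shape of the computation.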
GECE Metric
Extending ECE, the Generalized ECE (GECE) replaces accuracy with the METEOR score and confidence with the average token probability, and combines the resulting gap with the average word‑frequency and gradient factors described above.
In GECE, long‑tail samples have a smaller average word‑frequency factor (α), so its reciprocal is larger and the GECE value is higher. A higher GECE therefore marks an instance as more likely to be long‑tail. Example values: 34.6 for a common Natural Questions (NQ) query and 112.7 for a specialized query about “who played Raoul in The Phantom of the Opera”.
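The formula itself is not reproduced in this text. Based on the components listed in this section, it is assumed here to take the following form, where M(pred, ref) is the METEOR score, p(t_i) is the probability of the i‑th of n generated tokens, α is the average word frequency, ∇_ins is the gradient of the current instance, and E(∇_ins) is the mean gradient over the dataset. This reconstruction should be checked against the paper before use.

```latex
\mathrm{GECE} =
  \frac{\left| M(\mathrm{pred}, \mathrm{ref}) - \frac{1}{n} \sum_{i=1}^{n} p(t_i) \right|}
       {\alpha \cdot \left[ \mathbb{E}(\nabla_{\mathrm{ins}}) \cdot \nabla_{\mathrm{ins}} \right]}
```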
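The effect of α can be sketched numerically. The function below assumes that GECE is the absolute gap between the METEOR score and the average token probability, divided by the product of α and the gradient dot product. That form is an assumption inferred from the description in this section, and the input numbers are made up for illustration.

```python
def gece(
    meteor: float,
    avg_token_prob: float,
    avg_word_freq: float,
    grad_dot: float,
    eps: float = 1e-12,
) -> float:
    """Assumed form: |METEOR - avg token prob| / (alpha * gradient dot product)."""
    denom = avg_word_freq * grad_dot
    if abs(denom) < eps:
        raise ValueError("denominator is too close to zero")
    return abs(meteor - avg_token_prob) / denom


# Same calibration gap and gradient term; only the word-frequency factor differs.
common = gece(meteor=0.2, avg_token_prob=0.8, avg_word_freq=0.05, grad_dot=0.5)
rare = gece(meteor=0.2, avg_token_prob=0.8, avg_word_freq=0.01, grad_dot=0.5)
print(common < rare)  # the rarer wording receives the higher score
```

With everything else held fixed, dividing by a smaller α produces a larger score, which is the behavior the paragraph above describes.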
RAG Pipeline Extension
The proposed extension retrieves documents only for long‑tail queries, using a dense retriever over an external corpus (e.g., Wikipedia). For long‑tail instances, the retrieved documents are concatenated with the query before being fed to the LLM; for ordinary queries, the LLM receives the query alone.
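The routing logic can be sketched as follows. The threshold‑based decision, the `retrieve` and `generate` callables, and the choice to place documents before the query are assumptions made for this example; the text above does not specify how the GECE score is turned into a long‑tail decision or how the prompt is laid out.

```python
from typing import Callable, List


def answer_query(
    query: str,
    gece_score: float,
    threshold: float,
    retrieve: Callable[[str], List[str]],
    generate: Callable[[str], str],
) -> str:
    """Call the retriever only when the query is judged long-tail."""
    if gece_score > threshold:
        # Long-tail: prepend retrieved documents to the query (ordering is an assumption).
        docs = retrieve(query)
        prompt = "\n\n".join(docs + [query])
    else:
        # Ordinary: the LLM sees the query alone, so no retrieval cost is paid.
        prompt = query
    return generate(prompt)
```

Because ordinary queries never reach the retriever, the savings in inference time grow with the share of queries that fall below the threshold.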
Algorithm Evaluation
Multiple knowledge‑intensive public datasets were used to assess the method. Results show that adding the GECE module improves performance across datasets and reduces inference time by filtering out generic queries.
References
Akari Asai et al., “Self‑RAG: Learning to Retrieve, Generate, and Critique through Self‑Reflection,” 2023.
Satanjeev Banerjee and Alon Lavie, “METEOR: an automatic metric for MT evaluation,” ACL, 2005.
Zhangyin Feng et al., “Retrieval‑generation synergy augmented large language models,” 2023.
Gautier Izacard et al., “Atlas: Few‑shot learning with retrieval‑augmented language models,” JMLR, 2023.
Nikhil Kandpal et al., “Large language models struggle to learn long‑tail knowledge,” ICML, 2023.