How Long‑Tail Knowledge Boosts Retrieval‑Augmented Large Language Models

The paper introduces a method that classifies user queries as ordinary or long‑tail and applies retrieval‑augmented generation only to the long‑tail ones. A dedicated long‑tail detection metric and an extended RAG pipeline improve both the efficiency and the accuracy of large language models.

Alibaba Cloud Big Data AI Platform

Alibaba Cloud's AI platform PAI, together with Alibaba Group Security's content‑security algorithm team and Prof. He Xiaofeng's group at East China Normal University, presented a paper at ACL 2024 titled "On the Role of Long‑tail Knowledge in Retrieval Augmented Large Language Models". The work proposes classifying queries as either ordinary (answerable by the LLM on its own) or long‑tail, and augmenting only the long‑tail queries with retrieved documents, which improves the response efficiency of LLMs.

Background

Although large language models (LLMs) achieve impressive performance, they still struggle with hallucinations, outdated information, and opaque reasoning. Retrieval‑augmented generation (RAG) mitigates these issues by integrating external knowledge bases, improving accuracy and timeliness. Existing RAG approaches treat all retrieved information uniformly, overlooking the specific knowledge type required for each query.

Long‑tail knowledge impact diagram

Algorithm Overview

Long‑tail Detection Metric (ECE)

Traditional methods label instances as long‑tail by text frequency, which is impractical for unseen queries. Expected Calibration Error (ECE), which measures the mismatch between a model's predicted probabilities (its confidence) and its observed accuracy, offers a new perspective on long‑tail detection. ECE is computed by grouping samples into bins by confidence score and comparing each bin's average accuracy with its average confidence.

ECE formula
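For reference, the standard binned ECE splits n samples into M confidence bins B_m and computes ECE = Σ_m (|B_m| / n) · |acc(B_m) − conf(B_m)|. The minimal sketch below assumes equal‑width bins over [0, 1], one confidence score and one 0/1 correctness label per sample; the function name `expected_calibration_error` and the default of 10 bins are illustrative choices, not taken from the paper.

```python
from typing import Sequence


def expected_calibration_error(
    confidences: Sequence[float],
    correct: Sequence[float],
    n_bins: int = 10,
) -> float:
    """Standard binned ECE with equal-width confidence bins over [0, 1]."""
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    n = len(confidences)
    if n == 0:
        return 0.0
    # Per-bin running sums of confidence and accuracy, plus sample counts.
    conf_sum = [0.0] * n_bins
    acc_sum = [0.0] * n_bins
    count = [0] * n_bins
    for c, a in zip(confidences, correct):
        # Map a confidence in [0, 1] to a bin index; c == 1.0 lands in the last bin.
        idx = min(int(c * n_bins), n_bins - 1)
        conf_sum[idx] += c
        acc_sum[idx] += a
        count[idx] += 1
    ece = 0.0
    for m in range(n_bins):
        if count[m] == 0:
            continue  # empty bins contribute nothing
        avg_conf = conf_sum[m] / count[m]
        avg_acc = acc_sum[m] / count[m]
        # Weight each bin's |accuracy - confidence| gap by its share of samples.
        ece += (count[m] / n) * abs(avg_acc - avg_conf)
    return ece
```

For instance, two samples at confidence 0.9 with one of them correct, plus two samples at confidence 0.1 with neither correct, give 0.5 · |0.5 − 0.9| + 0.5 · |0 − 0.1| = 0.25.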

Metric‑Based Long‑tail Detection

Accuracy is measured with METEOR (Banerjee and Lavie, 2005), which scores the similarity between the generated text and the reference answer.

Confidence is derived from the LLM’s average token probability.
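To make these two ingredients concrete, the sketch below turns per‑token log‑probabilities into an average‑token‑probability confidence and compares it with a METEOR score. It assumes the log‑probabilities come from whatever LLM interface is in use, that METEOR has already been computed by an existing implementation, and that "average" means the arithmetic mean of token probabilities; the helper names are hypothetical.

```python
import math
from typing import Sequence


def average_token_probability(token_logprobs: Sequence[float]) -> float:
    """Hypothetical helper: confidence as the mean per-token probability."""
    if not token_logprobs:
        raise ValueError("need at least one generated token")
    # Convert each natural-log probability back to a probability, then average.
    return sum(math.exp(lp) for lp in token_logprobs) / len(token_logprobs)


def calibration_gap(meteor: float, token_logprobs: Sequence[float]) -> float:
    """Hypothetical helper: |accuracy (METEOR) - confidence| for one sample."""
    return abs(meteor - average_token_probability(token_logprobs))
```

A generation whose two tokens have probabilities 0.5 and 1.0 gets confidence 0.75; with a METEOR score of 0.25, its gap is 0.5.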

Additional factors improve detection:

Average word frequency, a basic indicator of long‑tail text.

The dot product between the gradient of a specific sample and the average gradient over the whole dataset, which reflects how far long‑tail instances diverge from typical ones.
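Both additional factors can be sketched in plain Python. This is an illustration under simplifying assumptions: lowercase whitespace tokenization, corpus word counts supplied by the caller, and per‑sample gradients already flattened into float vectors (which model parameters the gradients are taken with respect to is not specified here). All function names are hypothetical.

```python
from collections import Counter
from typing import List, Mapping, Sequence


def average_word_frequency(text: str, corpus_counts: Mapping[str, int]) -> float:
    """Mean corpus frequency of the words in `text`; rare words pull it down."""
    words = text.lower().split()
    if not words:
        raise ValueError("text has no words")
    # Words never seen in the corpus count as frequency 0.
    return sum(corpus_counts.get(w, 0) for w in words) / len(words)


def mean_gradient(gradients: Sequence[Sequence[float]]) -> List[float]:
    """Element-wise average of per-sample gradient vectors."""
    if not gradients:
        raise ValueError("need at least one gradient")
    n = len(gradients)
    dim = len(gradients[0])
    return [sum(g[i] for g in gradients) / n for i in range(dim)]


def gradient_alignment(sample_grad: Sequence[float], avg_grad: Sequence[float]) -> float:
    """Dot product of one sample's gradient with the dataset-average gradient."""
    return sum(a * b for a, b in zip(sample_grad, avg_grad))


if __name__ == "__main__":
    counts = Counter("the cat sat on the mat the end".split())
    print(average_word_frequency("the cat", counts))  # common words: higher value
    print(average_word_frequency("the phantom", counts))  # unseen word: lower value
```

A smaller dot product means the sample's gradient points away from the dataset's dominant direction, which is the divergence signal the text associates with long‑tail instances.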

GECE Metric

Extending ECE, the Generalized ECE (GECE) incorporates METEOR scores and token probabilities:

GECE formula

In GECE, long‑tail samples have a smaller average word‑frequency factor α, so its reciprocal is larger and the GECE value rises; a higher GECE therefore marks a sample as more likely to be long‑tail. For example, a common Natural Questions (NQ) query scores 34.6, while a specialized query about "who played Raoul in The Phantom of the Opera" scores 112.7.
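The exact GECE formula is not reproduced in this article, so the following is only a hypothetical GECE‑style score that combines the listed ingredients in line with the behaviour described above: the gap between METEOR and average token probability is divided by the average word frequency α (rarer wording raises the score) and by the gradient dot product (weaker alignment with the dataset raises it further). The function name and the exact combination are assumptions; consult the paper for the real definition.

```python
from typing import Sequence


def gece_like_score(
    meteor: float,
    token_probs: Sequence[float],
    avg_word_freq: float,
    grad_alignment: float,
) -> float:
    """Hypothetical GECE-style score; NOT the paper's verified formula."""
    if not token_probs:
        raise ValueError("need at least one token probability")
    if avg_word_freq <= 0 or grad_alignment <= 0:
        # This sketch does not guess how non-positive denominators are handled.
        raise ValueError("avg_word_freq and grad_alignment must be positive")
    confidence = sum(token_probs) / len(token_probs)
    gap = abs(meteor - confidence)
    # Smaller alpha or weaker gradient alignment -> larger score (more long-tail).
    return gap / (avg_word_freq * grad_alignment)
```

Halving α while keeping everything else fixed doubles the score, which mirrors the reciprocal relationship described for α.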

RAG Pipeline Extension

The proposed extension retrieves documents only for long‑tail queries, using a dense retriever over an external corpus such as Wikipedia. For long‑tail instances, the retrieved documents are concatenated with the query before it is fed to the LLM; ordinary queries are passed to the LLM on their own.
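The routing step can be sketched as a small function. Assumptions, all hypothetical: each query already has a long‑tail score, a fixed threshold separates ordinary from long‑tail queries, `retrieve` stands in for any dense retriever, and the prompt template simply places the retrieved passages before the question.

```python
from typing import Callable, List


def build_prompt(
    query: str,
    long_tail_score: float,
    retrieve: Callable[[str, int], List[str]],
    threshold: float,
    top_k: int = 3,
) -> str:
    """Route a query: call the retriever only when it is judged long-tail."""
    if long_tail_score < threshold:
        # Ordinary query: the LLM sees the query alone and no retrieval cost is paid.
        return query
    # Long-tail query: concatenate the retrieved passages with the query.
    docs = retrieve(query, top_k)
    context = "\n\n".join(docs)
    return f"{context}\n\nQuestion: {query}"
```

With a hypothetical threshold of 50, the two example scores given earlier (34.6 and 112.7) would be routed as ordinary and long‑tail respectively, so only the second query triggers a retrieval call.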

Algorithm Evaluation

Multiple knowledge‑intensive public datasets were used to evaluate the method. Adding the GECE module improves performance across the datasets and reduces inference time, because ordinary queries skip retrieval.

Evaluation results chart 1
Evaluation results chart 2

References

Akari Asai et al., “Self‑RAG: Learning to Retrieve, Generate, and Critique through Self‑Reflection,” 2023.

Satanjeev Banerjee and Alon Lavie, “METEOR: an automatic metric for MT evaluation,” ACL, 2005.

Zhangyin Feng et al., “Retrieval‑generation synergy augmented large language models,” 2023.

Gautier Izacard et al., “Atlas: Few‑shot learning with retrieval‑augmented language models,” JMLR, 2023.

Nikhil Kandpal et al., “Large language models struggle to learn long‑tail knowledge,” ICML, 2023.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.

Written by

Alibaba Cloud Big Data AI Platform

The Alibaba Cloud Big Data AI Platform builds on Alibaba’s leading cloud infrastructure, big‑data and AI engineering capabilities, scenario algorithms, and extensive industry experience to offer enterprises and developers a one‑stop, cloud‑native big‑data and AI capability suite. It boosts AI development efficiency, enables large‑scale AI deployment across industries, and drives business value.
