
Label Words are Anchors: An Information Flow Perspective for Understanding In-Context Learning – Best Long Paper at EMNLP 2023

At EMNLP 2023, the joint WeChat AI and Peking University paper 'Label Words are Anchors: An Information Flow Perspective for Understanding In-Context Learning' won the Best Long Paper award, revealing that label tokens act as anchors driving information aggregation in shallow layers and prediction flow in deep layers, and proposing methods to improve and diagnose in‑context learning.

DataFunTalk

The paper "Label Words are Anchors: An Information Flow Perspective for Understanding In-Context Learning" was selected as the Best Long Paper at EMNLP 2023, the first time a team from China has received this award. The work, a collaboration between WeChat AI and Peking University, investigates the mechanisms behind in-context learning (ICL) in large language models.

Background: ICL is a popular few‑shot learning paradigm that guides LLMs to perform tasks by providing demonstration examples without updating model parameters. Prior studies have examined example formatting, connections to k‑nearest neighbors, and gradient descent, but the role of label tokens remained unclear.

Key Insight: From an information‑flow perspective, the authors hypothesize that label words serve as "anchors". Experiments reveal two distinct flows: an information aggregation flow in shallow layers that gathers example information at label positions, and a label prediction flow in deep layers that uses the aggregated information to make predictions.

Experiments:

Experiment 1 – Saliency Score for Information Flow: Measured saliency of attention weights between token pairs, showing dominance of the aggregation flow in shallow layers and the prediction flow in deep layers.
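The saliency measure can be sketched roughly as follows, assuming the simple gradient-times-attention form |A ⊙ ∂L/∂A| summed over heads (the paper's exact aggregation and the toy indices below are illustrative assumptions, not taken from the paper):

```python
import numpy as np

def saliency(attention, grad):
    """Element-wise saliency of attention entries: |A * dL/dA|.

    attention, grad: (heads, seq, seq) arrays for one layer.
    Returns a (seq, seq) saliency matrix summed over heads.
    """
    return np.abs(attention * grad).sum(axis=0)

def flow_strength(S, src_idx, dst_idx):
    """Mean saliency over the (dst attends to src) pairs defining a flow."""
    vals = [S[d, s] for d in dst_idx for s in src_idx if d >= s]  # causal order
    return float(np.mean(vals)) if vals else 0.0

# Toy setup: 2 heads, 6 tokens; positions 2 and 4 stand in for label words,
# position 5 for the target. Random values, purely for illustration.
rng = np.random.default_rng(0)
A = rng.random((2, 6, 6))
G = rng.random((2, 6, 6))
S = saliency(A, G)

text_to_label = flow_strength(S, src_idx=[0, 1, 3], dst_idx=[2, 4])
label_to_target = flow_strength(S, src_idx=[2, 4], dst_idx=[5])
```

Comparing `text_to_label` against `label_to_target` layer by layer is what surfaces the shallow-aggregation versus deep-prediction pattern.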

Experiment 2 – Shallow Layers: Information Aggregation: Blocked attention to label tokens in shallow layers, which caused the largest performance drop, confirming the importance of the aggregation flow.
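The blocking intervention amounts to zeroing the attention paid to label positions and renormalizing, as in this minimal sketch (a simplification of the paper's layer-wise isolation, applied here to a single attention matrix):

```python
import numpy as np

def block_attention_to(A, blocked_cols):
    """Zero the attention paid *to* the given positions and renormalize rows,
    so no information can be read out of those positions."""
    A = A.copy()
    A[..., blocked_cols] = 0.0
    row_sums = A.sum(axis=-1, keepdims=True)
    return A / np.clip(row_sums, 1e-9, None)

rng = np.random.default_rng(1)
A = rng.random((4, 4))
A = A / A.sum(axis=-1, keepdims=True)   # a valid attention matrix
A_blocked = block_attention_to(A, [2])  # isolate position 2 (a label word)
```

In the paper, applying this to label positions in shallow layers hurts accuracy far more than the same intervention in deep layers or on non-label tokens.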

Experiment 3 – Deep Layers: Information Extraction: Correlated attention from label tokens to target positions with model predictions using AUC‑ROC, demonstrating strong alignment in deep layers.
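The AUC-ROC used here is the standard ranking form: given a score per label (e.g. attention mass on that label word) and a binary outcome (whether the model predicted that label), it measures how often a positive outranks a negative. A plain implementation:

```python
def auc_roc(scores, labels):
    """Plain AUC-ROC: probability a positive example outranks a negative one
    (ties count as 0.5). scores: floats; labels: 0/1 ints."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        return float("nan")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

An AUC near 1.0 in deep layers means the attention from the target position onto a label word almost perfectly ranks which label the model will output.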

Applications derived from the analysis:

Anchor Re‑weighting: Adjusted attention weights from label tokens to targets, yielding significant performance gains.
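A simplified sketch of the re-weighting idea: scale the target row's attention to each label position by a learnable factor and renormalize (the paper learns these parameters on a small validation set; the exponential parameterization and toy values below are assumptions for illustration):

```python
import numpy as np

def reweight_anchor_attention(A, target_pos, label_pos, betas):
    """Multiply the target position's attention to each label position by
    exp(beta_i), then renormalize that row. Positive beta boosts a label."""
    A = A.copy()
    row = A[target_pos].astype(float)
    for p, b in zip(label_pos, betas):
        row[p] *= np.exp(b)
    A[target_pos] = row / row.sum()
    return A

A = np.full((5, 5), 0.2)  # uniform attention over 5 tokens
A2 = reweight_anchor_attention(A, target_pos=4, label_pos=[1, 3],
                               betas=[1.0, -1.0])
```

Because the scaling is confined to target-to-anchor entries, it adds only a handful of parameters per class rather than fine-tuning the model.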

Anchor‑Only Context Compression: Replaced full example tokens with the hidden state of label tokens, drastically reducing token count while preserving accuracy.
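The compression can be sketched at the position level: only the label positions from the demonstration context are kept, and in the paper the cached hidden states at those positions stand in for the full examples at inference time (the token-level framing below is a hypothetical simplification):

```python
def compress_demonstrations(token_ids, label_positions):
    """Keep only the label positions from the demonstration context.
    In the actual method, it is the *hidden states* at these positions
    that are cached and reused, not the raw tokens."""
    return [token_ids[p] for p in label_positions]

# Toy context: 3 demonstrations of 10 tokens each, label at positions 9, 19, 29.
context = list(range(30))
compressed = compress_demonstrations(context, [9, 19, 29])
ratio = len(compressed) / len(context)  # 3/30 = 0.1
```

Shrinking the effective context to a few anchor positions is what yields the large speedups while keeping accuracy close to the full-context baseline.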

Anchor Distances for Error Diagnosis: Used distances between label key vectors to predict class confusion, achieving good alignment with actual error patterns.
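The diagnosis can be sketched as a pairwise distance matrix over the label words' key vectors, where a smaller distance is read as higher likely confusion between the two classes (the keys below are made-up illustrative values):

```python
import numpy as np

def anchor_confusion(label_keys):
    """Pairwise Euclidean distances between label-word key vectors.
    Smaller distance -> the target's query struggles to separate the two
    labels, predicting confusion between those classes."""
    K = np.asarray(label_keys, dtype=float)
    diff = K[:, None, :] - K[None, :, :]
    return np.linalg.norm(diff, axis=-1)

# Hypothetical keys: classes 0 and 1 nearly collide, class 2 sits far away.
keys = [[1.0, 0.0], [1.1, 0.0], [0.0, 5.0]]
D = anchor_confusion(keys)
```

Here `D[0, 1]` is much smaller than `D[0, 2]`, so the sketch would flag classes 0 and 1 as the likely confusion pair, matching the kind of error pattern the paper observes.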

Conclusion: The study provides a comprehensive view of how LLMs perform ICL, confirming the anchor role of label tokens and offering practical methods to enhance accuracy, speed, and error analysis. Limitations include focus on classification tasks and the need to explore generative and chain‑of‑thought scenarios.

Additional Context: The award highlights the strength of the WeChat AI team, which also contributed eight papers to EMNLP 2023, covering topics such as few‑shot relation extraction, long‑turn dialogue pre‑training, and multimodal summarization.

Tags: large language models, AI research, information flow, in-context learning, anchor tokens
Written by

DataFunTalk

Dedicated to sharing and discussing big data and AI technology applications, aiming to empower a million data scientists. Regularly hosts live tech talks and curates articles on big data, recommendation/search algorithms, advertising algorithms, NLP, intelligent risk control, autonomous driving, and machine learning/deep learning.
