How Pos2Distill Eliminates Positional Bias in Large Language Models
This article introduces Pos2Distill, a novel knowledge‑distillation framework that transfers capabilities from advantageous to disadvantaged positions in large language models, effectively mitigating positional bias and improving performance on long‑text retrieval and in‑context reasoning tasks.
EMNLP 2025
EMNLP is a top international conference on computational linguistics and natural language processing. EMNLP 2025 received 8,174 valid submissions, with a main-conference acceptance rate of 22.16%; two papers from the Gaode (Amap) team were accepted.
Introduction
Language models (LMs) excel at dialogue, reasoning, and pattern induction, yet they suffer from a "positional bias"—a tendency to over‑focus on specific input positions—hindering complex reasoning, long‑text understanding, and fair evaluation.
Project Overview
To address this, the Gaode Machine Learning team proposes Pos2Distill, a "position-to-position" knowledge-distillation framework that transfers the strong abilities learned at advantageous positions to weaker ones, thereby reducing the performance gap caused by positional bias (a rough code sketch of the core idea follows Figure 1).
Two specialized variants are designed:
Pos2Distill‑R1 for retrieval tasks, employing activation of trivial positions and anchoring of advantageous positions.
Pos2Distill‑R2 for in‑context reasoning, distilling high‑quality chain‑of‑thought responses from advantageous positions to correct reasoning trajectories at disadvantaged positions.
Figure 1: Pos2Distill overall process
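The article does not reproduce the training recipe, but the core "position-to-position" idea can be sketched as self-distillation across gold-document placements: the model's behaviour when the gold document sits at an advantageous position is used as the teacher signal for the same query with the gold document at a disadvantaged position. The snippet below is a minimal illustration assuming a Hugging Face-style causal LM; `build_prompt`, the convention that `docs[0]` is the gold document, and the next-token KL loss are illustrative assumptions, not the paper's exact formulation (R1 additionally uses trivial-position activation and advantageous-position anchoring, which are not shown here).

```python
# Minimal sketch of position-to-position self-distillation (illustrative only).
import torch
import torch.nn.functional as F

def pos2pos_distill_loss(model, tokenizer, query, docs, gold_idx_easy, gold_idx_hard):
    """Distill the model's behaviour when the gold document sits at an
    advantageous slot (teacher view) into the same model when the gold
    document sits at a disadvantaged slot (student view)."""

    def build_prompt(gold_idx):
        # Hypothetical helper: assume docs[0] is the gold document and move it
        # to the target slot, keeping the distractors in their original order.
        ordered = docs[:]
        ordered.insert(gold_idx, ordered.pop(0))
        context = "\n\n".join(f"Document {i+1}: {d}" for i, d in enumerate(ordered))
        return f"{context}\n\nQuestion: {query}\nAnswer:"

    easy_ids = tokenizer(build_prompt(gold_idx_easy), return_tensors="pt").input_ids
    hard_ids = tokenizer(build_prompt(gold_idx_hard), return_tensors="pt").input_ids

    with torch.no_grad():  # teacher view: gold document at the advantageous position
        teacher_logits = model(easy_ids).logits[:, -1, :]

    # student view: same model, gold document at the disadvantaged position
    student_logits = model(hard_ids).logits[:, -1, :]

    # KL(teacher || student) on the next-token distribution
    return F.kl_div(
        F.log_softmax(student_logits, dim=-1),
        F.softmax(teacher_logits, dim=-1),
        reduction="batchmean",
    )
```

Because the teacher and student are the same model under different gold-document placements, no external teacher or human labels are required; only the placement changes.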
Research Background
In information‑rich scenarios such as retrieval‑augmented generation, long‑context reasoning, and using LLMs as judges, positional bias becomes a major obstacle because critical information may be scattered across the input, leading to missed or mis‑integrated content.
Previous work either modifies model architectures or relies on costly, data-intensive training, but both approaches leave a substantial gap between advantageous and disadvantaged positions.
Key Contributions
Identifies "golden signals" within positional bias that can be leveraged for mitigation.
Introduces a novel position‑to‑position knowledge‑distillation framework (Pos2Distill) that corrects responses from disadvantaged positions.
Designs two task‑specific systems (R1 and R2) that achieve strong cross‑task generalization.
Experimental Results
Long‑Text Retrieval (Pos2Distill‑R1)
Pos2Distill‑R1 consistently reduces performance variance across document positions. On the oWebQ dataset, Llama‑3‑8B achieves a 56.7% average accuracy across 20 positions, comparable to the 57.9% when the gold document is at the optimal sink position.
Figure 2: Retrieval performance of Pos2Distill‑R1
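For readers who want to reproduce this kind of per-position curve, the evaluation loop is simple: place the gold document at each slot in turn and measure accuracy. The sketch below is illustrative; the `answer(question, docs)` callable, the example schema, and the substring-match correctness check are assumptions, not the paper's evaluation harness.

```python
# Per-position accuracy sweep: the spread across slots quantifies positional bias.
from statistics import mean, pstdev

def per_position_accuracy(examples, answer, num_positions=20):
    """For each slot, place the gold document there and measure accuracy."""
    accs = []
    for pos in range(num_positions):
        correct = 0
        for ex in examples:
            docs = list(ex["distractors"])
            docs.insert(pos, ex["gold_doc"])  # gold document at slot `pos`
            pred = answer(ex["question"], docs)
            correct += int(ex["answer"].lower() in pred.lower())
        accs.append(correct / len(examples))
    return accs

# Example report: the gap between the best and worst slot is the positional bias.
# accs = per_position_accuracy(dev_set, answer)
# print(f"mean={mean(accs):.3f}  std={pstdev(accs):.3f}  gap={max(accs)-min(accs):.3f}")
```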
Mechanistic analysis shows that Pos2Distill‑R1 dynamically shifts attention to maintain alignment with relevant documents as the gold document moves, enhancing contextual fidelity.
Figure 3: Internal attention dynamics after applying Pos2Distill‑R1
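A simple way to probe this behaviour is to measure how much of the model's attention mass lands on the gold document's token span as that document moves through the context. The sketch below assumes a Hugging Face-style model that can return attention maps; the span bookkeeping and the layer/head averaging are illustrative assumptions rather than the paper's exact analysis.

```python
# Attention probe: share of last-token attention falling on the gold document span.
import torch

@torch.no_grad()
def attention_on_gold(model, input_ids, gold_span):
    """Fraction of the final token's attention mass on the gold document,
    averaged over layers and heads."""
    out = model(input_ids, output_attentions=True)
    start, end = gold_span  # token indices of the gold document in the prompt
    shares = []
    for layer_attn in out.attentions:       # each: (batch, heads, seq, seq)
        last_tok = layer_attn[0, :, -1, :]  # attention from the final token
        shares.append(last_tok[:, start:end].sum(-1).mean().item())
    return sum(shares) / len(shares)
```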
Long‑Context Reasoning (Pos2Distill‑R2)
Pos2Distill‑R2 outperforms existing self‑training methods on both in‑domain and out‑of‑domain benchmarks. On MuSiQue, it achieves an exact‑match (EM) score of 42.8, surpassing all baselines. On HotpotQA, it reaches 58.3 EM versus 50.9 for the strongest baseline, demonstrating strong cross‑domain generalization.
Figure 4: Reasoning performance of Pos2Distill‑R2
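For reference, exact-match scores of this kind are typically computed with SQuAD-style answer normalization before string comparison. The sketch below follows that common convention; the paper's normalizer may differ in detail.

```python
# Exact-match (EM) metric with SQuAD-style normalization (common convention).
import re
import string

def normalize(text: str) -> str:
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)  # drop English articles
    return " ".join(text.split())

def exact_match(prediction: str, gold_answers: list[str]) -> int:
    return int(any(normalize(prediction) == normalize(g) for g in gold_answers))

# Dataset-level EM is the mean of per-example scores, usually reported as a percentage.
```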
Cross‑Task Generalization
Both systems exhibit notable generalization: Pos2Distill‑R1 improves reasoning performance by 3.3% on MuSiQue, while Pos2Distill‑R2 enhances retrieval ability. However, each system excels primarily in its target task, confirming that positional bias manifests differently in retrieval (token‑shifting) and reasoning (thought‑shifting) scenarios.
Figure 5: Generalization comparison of the two systems