How Ranking Improves In-Context Example Retrieval: Insights from NeurIPS ’25

This article explains the limitations of current pointwise in‑context learning methods, introduces a novel ranking‑based approach called SeDPO that learns preference orders among examples, and demonstrates its superior performance across multiple NLP tasks through extensive experiments and ablation studies.

Amap Tech

1. What is ICL and Why It Matters

Large Language Models (LLMs) can solve many tasks via In‑Context Learning (ICL), which adapts to new tasks by providing a few demonstration examples in the prompt without updating model parameters. ICL enables fast few‑shot adaptation, reducing the need for large labeled datasets and lowering deployment barriers.
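To make the idea concrete, here is a minimal sketch of how a few-shot ICL prompt is assembled: the task is conveyed entirely through demonstrations placed before the query, with no parameter updates. The demonstrations and template below are illustrative, not taken from the paper.

```python
def build_icl_prompt(demos, query):
    """Concatenate (input, output) demonstrations followed by the query."""
    lines = [f"Input: {x}\nOutput: {y}" for x, y in demos]
    lines.append(f"Input: {query}\nOutput:")
    return "\n\n".join(lines)

# Two demonstrations "teach" the task; the LLM completes the last line.
demos = [("2 + 3", "5"), ("7 + 1", "8")]
prompt = build_icl_prompt(demos, "4 + 6")
print(prompt)
```

Which examples are chosen as `demos`, and in what order, is exactly what the retriever discussed below is trained to decide.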

2. Current ICL Practices and Their Issues

Most retrieval methods train in a pointwise fashion, splitting candidate In‑Context Examples (ICEs) into a "top‑1" positive and "others" based on LLM scores. At inference time, however, examples are ranked by score, so the training objective and the inference behavior are mismatched. For instance, a math problem may have a designated "top‑1" example, but when no closely similar ICE exists in the corpus, a pointwise‑trained retriever has learned nothing about how to order the remaining candidates.

Figure 1

3. Solving the Problem by Learning a Ranking

3.1 Why Use Ranking

Instead of classifying examples, learning a preference order among ICEs better matches the retrieval goal. By evaluating the probability that the LLM generates the correct answer conditioned on each ICE, the algorithm constructs a ranking loss (SeDPO) to train the retriever.
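The scoring step can be sketched as follows. Conditioned on each candidate ICE, the LLM's log-likelihood of the gold answer is computed, and the scores are softmax-normalized into a preference distribution. The hook `log_likelihood_fn` and the prompt concatenation scheme are assumptions for illustration, not the paper's exact interface.

```python
import math

def rank_ices(log_likelihood_fn, ices, x, y):
    """Order candidate in-context examples by the LLM's log-likelihood of
    producing the gold answer y for input x when conditioned on each ICE.
    `log_likelihood_fn(prompt, answer)` is a hypothetical LLM hook that
    returns log p(answer | prompt)."""
    scores = {e: log_likelihood_fn(e + "\n" + x, y) for e in ices}
    # Softmax-normalize the scores into a preference distribution over ICEs.
    m = max(scores.values())
    z = sum(math.exp(s - m) for s in scores.values())
    probs = {e: math.exp(s - m) / z for e, s in scores.items()}
    ranking = sorted(ices, key=lambda e: probs[e], reverse=True)
    return ranking, probs
```

The resulting ranking, rather than a single top-1 label, is what supervises the retriever.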

Figure 2

3.2 How to Learn the Ranking

Given a test input (x, y) and two retrieved examples e_w[1:k] and e_l[1:k], the model scores each example by the likelihood of producing the correct answer. The scores are normalized to obtain a ranking for the ICL examples. The DPO framework aligns the retriever (treated as a policy model) with these pairwise preferences, yielding a loss that directly reflects the ordering of examples.
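A minimal sketch of the DPO-style pairwise term, under the assumption that the retriever assigns log-probability scores to example sequences and is regularized against a frozen reference retriever (standard DPO form; the exact loss in the paper may differ in detail):

```python
import math

def dpo_pair_loss(s_w, s_l, s_w_ref, s_l_ref, beta=0.1):
    """DPO-style loss for one preference pair: push the policy retriever to
    score the preferred sequence e_w above the dispreferred e_l, relative
    to a frozen reference retriever. All scores are log-probabilities;
    the argument names are illustrative."""
    margin = beta * ((s_w - s_w_ref) - (s_l - s_l_ref))
    return -math.log(1.0 / (1.0 + math.exp(-margin)))  # -log sigmoid(margin)
```

When the policy agrees with the reference (zero margin) the loss is log 2; it shrinks as the policy widens the gap in favor of the preferred example.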

Figure 3

3.3 Improving Ranking Learning

The paper introduces a sequential relaxation that selects examples iteratively, reducing the computational burden of encoding the entire corpus at each step. This sequential retrieval (shown in Figure 2) first retrieves the most relevant example, then concatenates it with the query to retrieve the next one, and so on, producing an ordered set e[1:K]. The resulting loss eliminates the expensive term γ_j and enables efficient training.
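The sequential selection loop can be sketched as a greedy procedure: score all remaining candidates against the current context, take the best one, append it, and repeat. The `score_fn` hook and the string-concatenation scheme are assumptions for illustration.

```python
def sequential_retrieve(score_fn, corpus, query, K):
    """Greedy sequential retrieval sketch: pick the highest-scoring example
    for the current context, append it, and repeat, yielding an ordered
    list e[1:K]. `score_fn(context, example)` is a hypothetical retriever
    scoring hook."""
    context = query
    chosen = []
    remaining = list(corpus)
    for _ in range(min(K, len(remaining))):
        best = max(remaining, key=lambda e: score_fn(context, e))
        chosen.append(best)
        remaining.remove(best)
        context = context + "\n" + best  # condition the next pick on choices so far
    return chosen
```

Because each step conditions only on the query plus the examples already picked, the corpus is scored once per step against a single growing context rather than re-encoded for every candidate ordering.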

Figure 4

4. Experiments

4.1 Main Result

The proposed SeDPO method achieves the best average performance across nine NLP benchmarks, outperforming strong baseline retrievers built on BERT‑base and RoBERTa encoders.

4.2 Ablation and Supplementary Studies

Ablation Study

Removing the diverse partial order data (using only “top‑1 chosen”) drops accuracy from 77.9 % to 70.8 %, highlighting the importance of diverse preference data. Switching the backbone from BERT‑base to RoBERTa improves results to 85.7 %.

Diversity of Retrieved Examples

Random retrieval yields higher diversity but unstable performance, whereas SeDPO maintains both diversity and effectiveness, confirming that good ICEs and their ordering are both crucial.

Transferability

SeDPO consistently attains the highest average scores across different shot settings and various LLM sizes, demonstrating strong generalization.

Conclusion

The paper proposes an algorithm that trains a retriever by learning preference rankings among in‑context examples, using a SeDPO loss that combines Direct Preference Optimization with sequential example relaxation. SeDPO achieves state‑of‑the‑art results on multiple tasks and remains robust under various experimental conditions.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.

Tags: large language models, ranking, NeurIPS, retrieval, in‑context learning, SeDPO
Written by

Amap Tech

Official Amap technology account showcasing all of Amap's technical innovations.
