How Ranking Improves In-Context Example Retrieval: Insights from NeurIPS ’25
This article summarizes a NeurIPS ’25 paper that analyzes the limitations of current pointwise in‑context learning retrieval methods, introduces a novel ranking‑based approach called SeDPO that learns preference orders among examples, and demonstrates its superior performance across multiple NLP tasks through extensive experiments and ablation studies.
1. What is ICL and Why It Matters
Large Language Models (LLMs) can solve many tasks via In‑Context Learning (ICL), which adapts to new tasks by providing a few demonstration examples in the prompt without updating model parameters. ICL enables fast few‑shot adaptation, reducing the need for large labeled datasets and lowering deployment barriers.
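To make this concrete, here is a minimal Python sketch of how a few‑shot ICL prompt is assembled: the demonstrations are simply prepended to the test query, and no model parameters are updated. The "Input:/Output:" format and the translation examples are purely illustrative, not taken from the paper.

```python
# Minimal sketch of few-shot in-context learning (ICL) prompt construction.
# The prompt format and the demonstrations below are hypothetical.

def build_icl_prompt(demonstrations, query):
    """Concatenate (input, output) demonstrations with the test query."""
    parts = [f"Input: {x}\nOutput: {y}" for x, y in demonstrations]
    parts.append(f"Input: {query}\nOutput:")  # the LLM completes this line
    return "\n\n".join(parts)

demos = [
    ("Translate to French: cat", "chat"),
    ("Translate to French: dog", "chien"),
]
print(build_icl_prompt(demos, "Translate to French: bird"))
```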
2. Current ICL Practices and Their Issues
Most retrieval methods train in a pointwise fashion, separating In‑Context Examples (ICEs) into “top‑1” and “others” based on LLM scores. At inference time, however, examples are ranked by retriever score, creating a mismatch between the training objective and inference behavior. For instance, a math problem is still assigned a “top‑1” example during training, yet if no genuinely similar ICE exists in the corpus, that forced positive cannot serve as a reasonable demonstration.
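As a rough illustration of the pointwise setup being criticized (a sketch, not the paper's exact recipe), the labeling step collapses the LLM's scores into a single positive and many undifferentiated negatives; `llm_score` below is a hypothetical function returning the LLM's likelihood of the gold answer given a candidate ICE.

```python
# Sketch of pointwise labeling: each candidate ICE is scored independently,
# the best-scoring candidate becomes the lone positive ("top-1"), and every
# other candidate becomes a negative. `llm_score` is a hypothetical scorer.

def pointwise_labels(candidates, x, y, llm_score):
    scores = [llm_score(ice, x, y) for ice in candidates]
    top = max(range(len(candidates)), key=lambda i: scores[i])
    # Binary labels discard the relative ordering among the "others",
    # even though inference ranks every candidate by retriever score.
    return [1 if i == top else 0 for i in range(len(candidates))]
```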
3. Solving the Problem by Learning a Ranking
3.1 Why Use Ranking
Instead of classifying examples, learning a preference order among ICEs better matches the retrieval goal. By evaluating the probability that the LLM generates the correct answer conditioned on each ICE, the algorithm constructs a ranking loss (SeDPO) to train the retriever.
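As a hedged sketch of how such per‑ICE preference signals could be derived (assuming a hypothetical `answer_logprob` wrapper that returns the LLM's log‑likelihood of the gold answer), the candidates can be sorted by how much each one helps the LLM reach the correct answer:

```python
# Sketch: score each candidate ICE by the log-likelihood the LLM assigns to
# the gold answer y when conditioned on that ICE and the query x, then sort.
# `answer_logprob` is a hypothetical wrapper around the scoring LLM.

def rank_ices(candidates, x, y, answer_logprob):
    scored = [(ice, answer_logprob(ice, x, y)) for ice in candidates]
    # A higher likelihood of the correct answer means the ICE is preferred.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored  # an ordering over ICEs, usable as ranking supervision
```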
3.2 How to Learn the Ranking
Given an input–output pair (x, y) and two retrieved example sequences e_w[1:k] and e_l[1:k], the LLM scores each sequence by the likelihood of producing the correct answer y. These scores are normalized to obtain a preference ranking over the ICL examples. The DPO framework then aligns the retriever (treated as the policy model) with these pairwise preferences, yielding a loss that directly reflects the ordering of examples.
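As a sketch of the idea (not the paper's exact objective), a DPO‑style pairwise loss over a preferred sequence e_w and a dispreferred sequence e_l could look like the following; the retriever log‑scores, the frozen reference retriever, and the β value are assumed stand‑ins.

```python
import torch
import torch.nn.functional as F

# Sketch of a DPO-style pairwise preference loss applied to a retriever.
# policy_logp_* / ref_logp_* are hypothetical log-scores that the trained
# retriever (policy) and a frozen reference retriever assign to an example
# sequence given the query x.

def dpo_retriever_loss(policy_logp_w, policy_logp_l,
                       ref_logp_w, ref_logp_l, beta=0.1):
    # Implicit reward = beta * (log pi_theta - log pi_ref); the loss pushes
    # the preferred sequence e_w above the dispreferred sequence e_l.
    logits = beta * ((policy_logp_w - ref_logp_w)
                     - (policy_logp_l - ref_logp_l))
    return -F.logsigmoid(logits).mean()

# Dummy tensors stand in for retriever log-scores.
loss = dpo_retriever_loss(torch.tensor([-1.2]), torch.tensor([-2.5]),
                          torch.tensor([-1.4]), torch.tensor([-2.3]))
print(loss.item())
```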
3.3 Improving Ranking Learning
The paper introduces a sequential relaxation that selects examples iteratively, reducing the computational burden of encoding the entire corpus at each step. This sequential retrieval (shown in Figure 2) first retrieves the most relevant example, then concatenates it with the query to retrieve the next one, and so on, producing an ordered set e[1:K]. The resulting loss eliminates the expensive term γ_j and enables efficient training.
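A minimal sketch of this greedy sequential loop is shown below; the `encode` function, the pre‑computed corpus embeddings, and the dummy usage at the end are hypothetical stand‑ins for the trained retriever and the ICE corpus.

```python
import numpy as np

# Sketch of sequential retrieval: pick the most relevant example, append it
# to the query context, re-encode, and repeat until K examples are chosen.

def sequential_retrieve(query, corpus, corpus_embeddings, encode, K):
    """Greedily build an ordered set of K in-context examples."""
    context, chosen = query, []
    available = set(range(len(corpus)))
    for _ in range(K):
        q_vec = encode(context)                       # re-encode query + chosen ICEs
        sims = corpus_embeddings @ q_vec              # dot-product relevance to every ICE
        best = max(available, key=lambda i: sims[i])  # next most relevant remaining ICE
        chosen.append(corpus[best])
        available.remove(best)
        context = context + "\n" + corpus[best]       # condition the next step on the pick
    return chosen  # ordered set e[1:K]

# Dummy usage: random embeddings and a seeded random "encoder" stand in for a retriever.
corpus = ["example A", "example B", "example C"]
embeddings = np.random.randn(3, 8)
encode = lambda text: np.random.RandomState(abs(hash(text)) % (2**32)).randn(8)
print(sequential_retrieve("test query", corpus, embeddings, encode, K=2))
```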
4. Experiments
4.1 Main Result
The proposed SeDPO method achieves the best average performance across nine NLP benchmarks, outperforming strong baseline retrievers such as those built on BERT‑base and RoBERTa.
4.2 Ablation and Supplementary Studies
Ablation Study
Removing the diverse partial order data (using only “top‑1 chosen”) drops accuracy from 77.9 % to 70.8 %, highlighting the importance of diverse preference data. Switching the backbone from BERT‑base to RoBERTa improves results to 85.7 %.
Diversity of Retrieved Examples
Random retrieval yields higher diversity but unstable performance, whereas SeDPO maintains both diversity and effectiveness, confirming that good ICEs and their ordering are both crucial.
Transferability
SeDPO consistently attains the highest average scores across different shot settings and various LLM sizes, demonstrating strong generalization.
5. Conclusion
The paper proposes an algorithm that trains a retriever by learning preference rankings among in‑context examples, using a SeDPO loss that combines Direct Preference Optimization with sequential example relaxation. SeDPO achieves state‑of‑the‑art results on multiple tasks and remains robust under various experimental conditions.
