Which Retrieval Embedding Loss Works Best? Comparing Pairwise Cosine, Triplet Margin, and InfoNCE
Even as Agentic RAG evolves, the quality of the underlying retrieval embedding model remains crucial, and this article compares three training losses—pairwise cosine embedding, triplet margin, and InfoNCE—detailing their inputs, formulas, and practical trade‑offs.
Despite rapid advances in Agentic Retrieval‑Augmented Generation (RAG), the retrieval component still relies on high‑quality embedding models; a more accurate model reduces the number of iterative calls, saving time and cost. This article focuses on the learning methods for retrieval embedding models.
Pairwise Cosine Embedding Loss
The input consists of a text pair and a label indicating whether the pair is a positive or negative match, similar to entailment vs. contradiction in the MNLI dataset. The loss function uses cosine similarity between the two embedding vectors x and y.
Positive example pairs and negative example pairs are illustrated in the accompanying figures.
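Since the original formula image is not reproduced here, the following is a minimal sketch of this setup using PyTorch's built-in `nn.CosineEmbeddingLoss`; the toy vectors and margin value are illustrative, not from the article.

```python
import torch
import torch.nn as nn

# Pairwise cosine embedding loss. For a pair (x, y) with label t in {1, -1}:
#   loss = 1 - cos(x, y)               if t = 1  (positive pair)
#   loss = max(0, cos(x, y) - margin)  if t = -1 (negative pair)
loss_fn = nn.CosineEmbeddingLoss(margin=0.0)

# Toy batch: two pairs of 4-dimensional embeddings.
x = torch.tensor([[1.0, 0.0, 0.0, 0.0],
                  [1.0, 0.0, 0.0, 0.0]])
y = torch.tensor([[1.0, 0.0, 0.0, 0.0],   # identical to x[0]  -> cos = 1
                  [0.0, 1.0, 0.0, 0.0]])  # orthogonal to x[1] -> cos = 0
labels = torch.tensor([1.0, -1.0])        # positive pair, negative pair

loss = loss_fn(x, y, labels)
print(loss.item())  # mean of [1 - 1, max(0, 0 - 0)] = 0.0
```

In training, `x` and `y` would come from the same encoder applied to the two texts, and the labels from annotated pairs such as MNLI entailment/contradiction.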
Triplet Margin Loss
The input expands to three texts: an anchor, a positive match, and a negative match. The loss is the standard Triplet Margin Loss, where a is the anchor embedding, p the positive embedding, and n the negative embedding.
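A minimal sketch of this loss with PyTorch's `nn.TripletMarginLoss`; the margin value and toy embeddings are illustrative assumptions.

```python
import torch
import torch.nn as nn

# Triplet margin loss:
#   loss = max(0, ||a - p|| - ||a - n|| + margin)
# where a = anchor, p = positive, n = negative embedding.
loss_fn = nn.TripletMarginLoss(margin=1.0, p=2)

anchor   = torch.tensor([[1.0, 0.0]])
positive = torch.tensor([[1.0, 0.0]])   # distance to anchor = 0
negative = torch.tensor([[0.0, 1.0]])   # distance to anchor = sqrt(2)

loss = loss_fn(anchor, positive, negative)
print(loss.item())  # max(0, 0 - sqrt(2) + 1) = 0.0
```

The margin forces the negative to sit at least `margin` farther from the anchor than the positive; once that holds (as in this toy triplet), the loss is zero and the triplet stops contributing gradients.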
InfoNCE Loss
The input includes a query, one positive match, and a list of negative samples. The loss follows the InfoNCE formulation, inspired by the M3‑Embedding paper (arXiv:2402.03216). In the formula, p* is the positive embedding, P' the set of negative embeddings, q the query embedding, and s(.) a similarity function such as cosine similarity.
Comparison
Which method performs best depends on the scenario, the data volume, and the available compute. In the author's experiments, InfoNCE produced the strongest results across the widest range of settings, but with sufficient tuning, pairwise cosine loss can reach comparable quality. Triplet margin loss sits between the two in both complexity and performance.
Data Party THU
Official platform of Tsinghua Big Data Research Center, sharing the team's latest research, teaching updates, and big data news.
