Which Retrieval Embedding Loss Works Best? Comparing Pairwise Cosine, Triplet Margin, and InfoNCE

Even as Agentic RAG evolves, the quality of the underlying retrieval embedding model remains crucial, and this article compares three training losses—pairwise cosine embedding, triplet margin, and InfoNCE—detailing their inputs, formulas, and practical trade‑offs.

Data Party THU

Despite rapid advances in Agentic Retrieval‑Augmented Generation (RAG), the retrieval component still relies on high‑quality embedding models; a more accurate model reduces the number of iterative calls, saving time and cost. This article focuses on the learning methods for retrieval embedding models.

Pairwise Cosine Embedding Loss

The input consists of a text pair and a label indicating whether the pair is a positive or negative match, similar to entailment vs. contradiction in the MNLI dataset. The loss function uses cosine similarity between the two embedding vectors x and y.

The loss takes the standard cosine embedding form:

$$
L(x, y) = \begin{cases} 1 - \cos(x, y), & \text{positive pair} \\ \max\bigl(0,\ \cos(x, y) - m\bigr), & \text{negative pair} \end{cases}
$$

where $m \ge 0$ is a margin; the two branches correspond to positive and negative example pairs.
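As a concrete sketch, the two cases (positive and negative pairs) can be written in plain Python. The function names here are my own; in practice a framework implementation such as PyTorch's `CosineEmbeddingLoss`, which follows the same definition, would be used:

```python
import math

def cosine(x, y):
    # cosine similarity between two embedding vectors
    dot = sum(a * b for a, b in zip(x, y))
    nx = math.sqrt(sum(a * a for a in x))
    ny = math.sqrt(sum(b * b for b in y))
    return dot / (nx * ny)

def pairwise_cosine_loss(x, y, label, margin=0.0):
    """label = 1 for a positive pair, -1 for a negative pair."""
    c = cosine(x, y)
    if label == 1:
        return 1.0 - c               # pull positive pairs together
    return max(0.0, c - margin)      # push negative pairs below the margin
```

An identical positive pair gives zero loss, while an identical pair labeled negative is penalized until its similarity falls below the margin.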

Triplet Margin Loss

The input expands to three texts: an anchor, a positive match, and a negative match. The loss is the standard Triplet Margin Loss, where a is the anchor embedding, p the positive embedding, and n the negative embedding.

$$
L(a, p, n) = \max\bigl(0,\ d(a, p) - d(a, n) + m\bigr)
$$

where $d(\cdot, \cdot)$ is a distance function (e.g. Euclidean distance or $1 - \cos$) and $m > 0$ is the margin.
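A minimal sketch of the triplet margin loss, using Euclidean distance (the default in PyTorch's `TripletMarginLoss`; the function names here are my own):

```python
import math

def euclidean(u, v):
    # Euclidean distance between two embedding vectors
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def triplet_margin_loss(a, p, n, margin=1.0):
    """a: anchor, p: positive, n: negative embedding.

    Hinge on the gap between the anchor-positive and anchor-negative
    distances: the loss is zero once the negative is at least `margin`
    farther from the anchor than the positive is.
    """
    return max(0.0, euclidean(a, p) - euclidean(a, n) + margin)
```

When the negative is already well separated the loss vanishes, so training focuses on triplets that violate the margin.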

InfoNCE Loss

The input includes a query, one positive match, and a list of negative samples. The loss follows the InfoNCE formulation, inspired by the M3‑Embedding paper (arXiv:2402.03216). In the formula, p* is the positive embedding, P' the set of negative embeddings, q the query embedding, and s(.) a similarity function such as cosine similarity.

$$
L = -\log \frac{\exp\bigl(s(q, p^{*}) / \tau\bigr)}{\exp\bigl(s(q, p^{*}) / \tau\bigr) + \sum_{p' \in P'} \exp\bigl(s(q, p') / \tau\bigr)}
$$

where $\tau$ is a temperature hyperparameter.
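The formulation above is a softmax cross-entropy over the positive plus the negatives. A self-contained sketch with cosine similarity as $s(\cdot)$ (the function names and the default temperature of 0.05 are my own choices, not from the article):

```python
import math

def cosine(u, v):
    # cosine similarity s(., .) between two vectors
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def info_nce(q, pos, negs, tau=0.05):
    """q: query embedding, pos: positive embedding, negs: list of
    negative embeddings, tau: temperature.

    Returns -log softmax of the positive score over all candidates.
    """
    scores = [cosine(q, pos) / tau] + [cosine(q, n) / tau for n in negs]
    m = max(scores)  # subtract the max for numerical stability
    log_denom = m + math.log(sum(math.exp(s - m) for s in scores))
    return log_denom - scores[0]
```

The loss shrinks as the query-positive score dominates the query-negative scores, which is why more (and harder) negatives generally sharpen the embedding space.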

Comparison

Which method performs best depends on the specific scenario, data volume, and compute resources. In the author's experiments, InfoNCE was the most broadly applicable, but with sufficient tuning, pairwise cosine loss can achieve comparable results; triplet margin loss sits between the two.

Tags: AI, embedding, loss functions, InfoNCE, pairwise cosine, triplet margin
Written by

Data Party THU

Official platform of Tsinghua Big Data Research Center, sharing the team's latest research, teaching updates, and big data news.
