Tagged articles
6 articles
James' Growth Diary
May 11, 2026 · Artificial Intelligence

Mastering RAG Evaluation: Recall@K, MRR, NDCG, and RAGAS Explained

This article breaks RAG evaluation into a two-layer framework, explains Recall@K, MRR, NDCG, and the four RAGAS scores, shows how to implement them with LangChain.js, highlights common pitfalls, and offers scenario-specific metric combinations for reliable performance monitoring.

LangChain · MRR · NDCG
0 likes · 20 min read
AI Engineer Programming
Apr 20, 2026 · Artificial Intelligence

Evaluating Retriever Quality in RAG: Essential Metrics for Production Reliability

The article explains why retrieval quality dominates RAG performance and outlines a rigorous evaluation framework (prompt, ranked results, and ground-truth annotations) with detailed metrics such as Precision, Recall, MAP@K, NDCG@K, MRR, and F-scores. It also discusses chunking strategies, embedding choices, hybrid retrieval, and CI/CD-driven monitoring to ensure production reliability.

LLM · MAP · NDCG
0 likes · 12 min read
Meituan Technology Team
Sep 24, 2020 · Artificial Intelligence

Multimodal Recall Solution for KDD Cup 2020: ImageBERT and LXMERT Based Approach

The second‑place team tackled KDD Cup 2020’s Multimodal Recall challenge by fine‑tuning ImageBERT and LXMERT on query‑image pairs, generating negatives, applying AMSoftmax and multi‑similarity losses, ensembling weighted predictions, and using score‑based post‑processing, boosting NDCG@5 to 0.8352 and powering Meituan’s multimodal search pipeline.

ImageBERT · KDD Cup 2020 · LXMERT
0 likes · 23 min read
DataFunTalk
Jun 21, 2019 · Artificial Intelligence

Applying Deep Learning to Airbnb Search: Model Evolution, Feature Engineering, and System Insights

This article reviews the Airbnb search ranking paper, detailing offline and online performance gains, the progression from SimpleNN to LambdaRankNN, GBDT/FM NN, and Deep NN models, failed embedding attempts, extensive feature engineering practices, and the production system architecture that enabled large‑scale deep learning deployment.

Airbnb · NDCG · model evolution
0 likes · 10 min read
vivo Internet Technology
Jan 22, 2018 · Artificial Intelligence

Learning to Rank: From Regression to Search Ranking and Evaluation Methods

Learning to rank reframes search as a machine-learning problem that optimizes document ordering rather than numeric prediction, using relevance metrics such as NDCG and feature-based scoring functions. The article compares point-wise, pair-wise (RankSVM), and list-wise (ListNet) approaches and stresses that proper error definition and feature selection matter more than the specific algorithm.

Learning-to-Rank · NDCG · Pairwise
0 likes · 16 min read