Why Coarse Ranking Matters: Goals, Metrics, and Model Design in Search Systems

The article explains the purpose of coarse ranking in industrial search pipelines, outlines key evaluation metrics, discusses sample construction and model architecture choices, and highlights trade‑offs between consistency with downstream ranking and overall system performance.

Goal of Coarse Ranking

Coarse ranking (粗排) is the stage in industrial search pipelines that sits between recall and precise ranking: under tight latency and resource budgets, it must select a high-quality subset from the full recall candidate pool to feed the downstream precise ranker.

Evaluation Considerations

How to Evaluate Coarse Ranking

Before 2019, many practitioners measured coarse ranking with the same metrics as precise ranking (e.g., PV/click AUC). This creates a persistent gap: the coarse model is asked to learn the same target as the precise ranker with far less capacity, so it systematically underperforms on that target. Since 2020, the mainstream approach has been to use consistency metrics that compare the coarse ranker's output with that of the precise ranker.

Commonly used metrics include the following; a code sketch of each appears after the list.

Precise-ranker top-K AUC: label the precise ranker's top-K candidates as positive and the rest as negative, then compute the AUC of the coarse-ranker scores against those labels.

Recall or hit-rate @ N: after truncating the coarse-ranked list to N candidates, measure the fraction of the precise ranker's top-K that survives the truncation.

Consistency AUC with precise ranking: measure the fraction of candidate pairs that the coarse and precise rankings order the same way; note that this metric may correlate only weakly with business impact.

Variants of NDCG: Discounted Cumulative Gain (DCG) divided by the ideal DCG (IDCG), where DCG@N = Σ_{i=1}^{N} gain_i / log2(i + 1); the logarithmic discount gives more weight to items ranked near the top.
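
The following is a minimal sketch of these four metrics, assuming NumPy arrays of coarse and precise scores over the same candidate set; the function names and the choice of gain are illustrative, not taken from the cited systems.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def topk_auc(coarse_scores, precise_scores, k):
    """AUC of coarse scores against labels derived from the precise ranker:
    the precise ranker's top-k candidates are positives, the rest negatives."""
    order = np.argsort(-precise_scores)
    labels = np.zeros(len(precise_scores), dtype=int)
    labels[order[:k]] = 1
    return roc_auc_score(labels, coarse_scores)

def hit_rate_at_n(coarse_scores, precise_scores, k, n):
    """Fraction of the precise ranker's top-k that survives coarse truncation to n."""
    precise_topk = set(np.argsort(-precise_scores)[:k])
    coarse_topn = set(np.argsort(-coarse_scores)[:n])
    return len(precise_topk & coarse_topn) / k

def pairwise_consistency(coarse_scores, precise_scores):
    """Fraction of candidate pairs ordered the same way by both rankers."""
    agree, total = 0, 0
    m = len(coarse_scores)
    for i in range(m):
        for j in range(i + 1, m):
            dc = coarse_scores[i] - coarse_scores[j]
            dp = precise_scores[i] - precise_scores[j]
            if dc * dp != 0:          # skip ties
                total += 1
                agree += dc * dp > 0  # both differences share a sign
    return agree / total if total else 1.0

def ndcg_at_n(coarse_scores, gains, n):
    """NDCG@n of the coarse ordering, using `gains` (e.g., precise scores) as relevance."""
    discounts = 1.0 / np.log2(np.arange(2, n + 2))   # 1 / log2(pos + 1)
    coarse_order = np.argsort(-coarse_scores)[:n]
    dcg = np.sum(gains[coarse_order] * discounts[:len(coarse_order)])
    ideal_order = np.argsort(-gains)[:n]
    idcg = np.sum(gains[ideal_order] * discounts[:len(ideal_order)])
    return dcg / idcg if idcg > 0 else 0.0
```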

In practice, the team also uses a custom metric based on expected DCG: compute DCG@N over the coarse-ranked list after truncation, and use the online precise ranker's expected revenue (RPS or RPM) as the IDCG. The ratio of the simulated DCG to the online value reflects the loss introduced by coarse truncation.
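
A sketch of this truncation-loss ratio, assuming each candidate has a logged per-item expected revenue (e.g., bid × pCTR) and that the online RPS of the precise ranker is available as a scalar; all names and numbers here are placeholders.

```python
import numpy as np

def dcg_at_n(scores, gains, n):
    """DCG@n of the list ordered by `scores`, with `gains` as item values."""
    order = np.argsort(-scores)[:n]
    discounts = 1.0 / np.log2(np.arange(2, len(order) + 2))
    return float(np.sum(gains[order] * discounts))

# Toy example: 1000 recall candidates, coarse truncation to N = 100.
rng = np.random.default_rng(0)
coarse_scores = rng.random(1000)
expected_values = rng.random(1000)        # per-item expected revenue, e.g. bid * pCTR
rps_online = 25.0                         # placeholder: online RPS of the precise ranker

simulated = dcg_at_n(coarse_scores, expected_values, n=100)
truncation_ratio = simulated / rps_online  # the closer to 1, the less revenue lost to truncation
```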

Is Higher Consistency Always Better?

Higher consistency does not guarantee better overall performance because the definition of consistency and the candidate set used for evaluation are often vague. Even if consistency metrics improve, other factors such as exposure distribution, G‑AUC, OE, and score calibration can affect the final outcome.

Alibaba's 2022 paper splits the evaluation into two parts: the loss from recall to coarse ranking, and the loss from coarse to precise ranking. This addresses both sample selection bias (SSB) and alignment with the final ranking.

Potential Pitfalls of Focusing Solely on Consistency

Improving consistency usually smooths the funnel, especially in ad-driven scenarios where bidding efficiency matters. However, over-emphasizing consistency can hide other optimization opportunities. In extreme cases, offline consistency rises while online performance drops, which points to issues such as:

The precise ranker's scores on the coarse output increase: the coarse ranker is surfacing candidates the precise model rates highly, yet they underperform online, which suggests the precise model is overestimating this region of the candidate space.

The precise ranker's scores stay flat or drop, revealing severe SSB and degraded candidate quality.

These situations require targeted adjustments rather than wholesale changes.

Sample Design

Constructing training samples must account for sample selection bias (SSB). Samples should be drawn from the exposure, ranking, and recall spaces; training only on exposed items creates a mismatch between the training distribution and the inference distribution, which covers the full recall pool.

A common approach is to treat precise‑ranker top‑K as positive samples and perform negative sampling from recall candidates. Weighting top candidates higher often improves results, and multi‑objective modeling (e.g., bagging multiple tower scores) can further boost performance.
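
As a concrete illustration of this scheme, here is a minimal sketch assuming per-request logs that contain the precise ranker's scores plus a separate pool of recall-only candidates; the weighting and sampling ratio are illustrative choices, not values from the cited papers.

```python
import random

def build_coarse_training_samples(ranked, recall_only, k=10, neg_ratio=4):
    """Build one request's training samples for the coarse ranker.

    ranked:      list of (item, precise_score) pairs the precise ranker scored
    recall_only: items from the recall stage that never reached precise ranking
    k:           precise-ranker top-k treated as positives
    neg_ratio:   negatives sampled per positive, drawn from the recall space
    """
    ranked = sorted(ranked, key=lambda x: -x[1])
    samples = []
    # Positives: precise-ranker top-k, weighted higher near the top.
    for rank, (item, _) in enumerate(ranked[:k]):
        weight = 1.0 / (rank + 1)   # illustrative top-weighted scheme
        samples.append((item, 1, weight))
    # Negatives: sampled from recall candidates to fight SSB, weight 1.
    negatives = random.sample(recall_only, min(len(recall_only), k * neg_ratio))
    samples.extend((item, 0, 1.0) for item in negatives)
    return samples
```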

Model Architecture and Loss Choices

Fully‑connected single‑tower models are preferred; relying on dual‑tower architectures for coarse ranking can increase the gap with the increasingly complex precise ranker.
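
To make the architectural contrast concrete, here is a minimal single-tower sketch in PyTorch: user/query and item features are concatenated and interact from the first layer, whereas a dual tower keeps the two sides separate until a final dot product. All dimensions are illustrative.

```python
import torch
import torch.nn as nn

class SingleTowerCoarseRanker(nn.Module):
    """Fully connected single tower: user/query and item features interact
    from the first layer, unlike dual-tower models that only interact via
    a late dot product."""
    def __init__(self, user_dim=64, item_dim=64, hidden=(256, 128)):
        super().__init__()
        layers, in_dim = [], user_dim + item_dim
        for h in hidden:
            layers += [nn.Linear(in_dim, h), nn.ReLU()]
            in_dim = h
        layers.append(nn.Linear(in_dim, 1))
        self.mlp = nn.Sequential(*layers)

    def forward(self, user_feats, item_feats):
        x = torch.cat([user_feats, item_feats], dim=-1)  # early feature interaction
        return self.mlp(x).squeeze(-1)                   # one score per candidate

# Usage: score a batch of 512 candidates for one request.
model = SingleTowerCoarseRanker()
scores = model(torch.randn(512, 64), torch.randn(512, 64))
```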

While computational resources are growing, the trend is toward more sophisticated precise models, which puts pressure on coarse ranking to keep up. Engineering techniques such as feature selection, quantization, distillation, and network pruning are useful for scaling single‑tower coarse models.
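
Of these techniques, distillation ties most directly to consistency with the precise ranker. Below is a minimal sketch, assuming the precise ranker's logits are logged and used as soft targets; the blending weight and temperature are illustrative.

```python
import torch
import torch.nn.functional as F

def distillation_loss(coarse_logits, precise_logits, labels, alpha=0.5, tau=2.0):
    """Blend a hard-label loss with a soft loss that pulls the coarse model's
    scores toward the (logged) precise-ranker scores. `labels` is a float
    tensor of 0/1 click labels; `tau` softens both logit distributions."""
    hard = F.binary_cross_entropy_with_logits(coarse_logits, labels)
    soft = F.binary_cross_entropy_with_logits(coarse_logits / tau,
                                              torch.sigmoid(precise_logits / tau))
    return alpha * hard + (1 - alpha) * soft
```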

Loss functions have been explored extensively, but in practice the differences between them are often marginal.

References

Meituan Tech Team, “Exploration and Practice of Coarse Ranking Optimization in Meituan Search”, https://zhuanlan.zhihu.com/p/553953132

Alibaba, “Deep Unified Coarse Ranking in Taobao Main Search”, https://zhuanlan.zhihu.com/p/630985673
