Coarse Ranking in Recommenders: Key Strategies, Metrics & Optimizations

This article reviews the coarse‑ranking stage of recommendation systems: how it differs from recall and fine‑ranking, how it is evaluated, and how its training samples are designed. It then presents two technical routes and explores optimization directions such as dual‑tower models, knowledge distillation, lightweight fully‑connected layers, and multi‑objective and multi‑scenario modeling, followed by practical case studies and results.

DeWu Technology

Dec 20, 2023

Background

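Coarse‑ranking sits between recall and fine‑ranking. It filters the fused recall candidates more accurately than recall alone, and because fine‑ranking only sees the items that coarse‑ranking passes on, it sets the upper bound for fine‑ranking performance.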

Positioning

Differences from fine‑ranking : coarse‑ranking scores far more candidates (thousands to roughly 10k vs. hundreds), runs under a stricter latency budget, and focuses on separating liked from disliked items.

Differences from recall : the candidate set comes from the fused recall results, and coarse‑ranking must also order the items rather than only retrieve them. Both stages suffer from sample‑selection bias, with recall affected more strongly.

Evaluation Metrics

A global Hitrate framework defines the following groups of metrics.

Coarse→Fine loss : scene‑internal Hitrate@TopK, NDCG between coarse scores and fine‑ranking efficiency scores, AUC.

Recall→Coarse loss : scene‑external Hitrate@TopK.

Dislike discrimination : scene‑internal Hitrate@TopK on exposure‑without‑click (lower is better).

Samples are collected per request: fused layer outputs, exposed samples, click samples, non‑click samples, global exposure/click samples, and globally corrected exposure‑click samples.

Key offline/online metrics: scene‑click Hitrate@TopK, scene‑non‑click Hitrate@TopK, global click Hitrate@TopK, adjusted global click Hitrate@TopK, NDCG, AUC.
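The article does not spell out a formula for Hitrate@TopK. The sketch below assumes a common definition: for one request, the share of a target item set that lands in the top K items by coarse score. Under that assumption, the scene‑internal, global, and non‑click variants differ only in which target set is passed in. The function name and the sample data are hypothetical.

```python
def hitrate_at_k(coarse_scores, target_items, k):
    """Share of target items that appear in the top-k items by coarse score.

    coarse_scores: dict mapping item id -> coarse-ranking score for one request.
    target_items:  set of item ids counted as hits (for example, clicked items).
    """
    if not target_items:
        return 0.0
    top_k = sorted(coarse_scores, key=coarse_scores.get, reverse=True)[:k]
    hits = len(target_items.intersection(top_k))
    return hits / len(target_items)


# One hypothetical request with five candidates.
scores = {"a": 0.9, "b": 0.7, "c": 0.4, "d": 0.2, "e": 0.1}
clicked = {"a", "c"}            # positives: higher hitrate is better
exposed_no_click = {"b", "e"}   # dislikes: lower hitrate is better

click_hitrate = hitrate_at_k(scores, clicked, k=2)
dislike_hitrate = hitrate_at_k(scores, exposed_no_click, k=2)
```

In practice the per-request values would be averaged over many requests before comparing models.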

Sample Design

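Coarse‑ranking faces stronger sample‑selection bias than fine‑ranking, because it scores many candidates that were never exposed. To mitigate this, samples are drawn from the following pools with the following methods.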

Negative pool : non‑clicked exposures, all non‑conversion samples, low‑rank fine‑ranking items, recall samples excluding exposure.

Positive pool : clicked exposures, global click samples, delayed click samples (e.g., next‑day clicks).

Sampling methods : random sampling and hot‑item down‑sampling.

Three concrete composition schemes:

Positive: exposure‑click; Negative: exposure‑non‑click.

Positive: exposure‑click + global‑click‑corrected; Negative: exposure‑non‑click + randomly sampled non‑exposure recall samples.

Positive: exposure‑click + high‑rank fine‑ranking items; Negative: exposure‑non‑click + low‑rank fine‑ranking items.
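As an illustration, the second scheme could be assembled roughly as follows. This is a minimal sketch under stated assumptions: the function `build_samples`, its parameters, and the item ids are hypothetical, and the article does not say how many non‑exposed recall items are sampled or how label conflicts are resolved. Here a globally clicked item always counts as a positive.

```python
import random


def build_samples(exposed_clicked, exposed_not_clicked, global_clicked,
                  recalled, n_random_neg, seed=0):
    """Compose (item, label) pairs: label 1 for positives, 0 for negatives."""
    rng = random.Random(seed)
    positives = set(exposed_clicked) | set(global_clicked)
    exposed = set(exposed_clicked) | set(exposed_not_clicked)
    # Recall candidates that were never exposed and are not known positives.
    unexposed = sorted(set(recalled) - exposed - positives)
    random_negs = rng.sample(unexposed, min(n_random_neg, len(unexposed)))
    # Exposed-but-not-clicked items, unless a global click marks them positive.
    negatives = (set(exposed_not_clicked) - positives) | set(random_negs)
    samples = [(item, 1) for item in sorted(positives)]
    samples += [(item, 0) for item in sorted(negatives)]
    return samples


samples = build_samples(
    exposed_clicked=["i1"],
    exposed_not_clicked=["i2", "i3"],
    global_clicked=["i9"],
    recalled=["i1", "i2", "i3", "i4", "i5", "i6"],
    n_random_neg=2,
)
```

Hot‑item down‑sampling, mentioned above as a sampling method, is not shown here; it would replace the uniform `rng.sample` call with a popularity‑aware draw.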

Technical Routes

Two modeling paradigms:

Listwise (set‑based) : models the target set directly; it is coupled to fine‑ranking and less stable.

Pointwise (value‑based) : predicts a conversion probability per item; it is easier to control and can be iterated independently.

Pointwise is preferred because it aligns directly with the final objective.

Development roadmap:

Quality‑score models (e.g., LR, XGBoost).

Deep vector‑inner‑product models such as dual‑tower or triple‑tower structures – fast online, low engineering overhead, limited cross‑feature handling.

Deep cross‑layer models (e.g., COLD framework) – richer cross features at the cost of latency and complexity.
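To make the vector‑inner‑product route concrete, here is a minimal scoring sketch. It assumes the two towers have already produced embeddings and that the score is a sigmoid over their dot product; the embeddings, item ids, and helper names are hypothetical. It also shows why cross‑feature handling is limited: user and item only interact through a single dot product.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def score_candidates(user_vec, item_vecs):
    """Score every candidate with sigmoid(<user, item>).

    item_vecs: dict item id -> item-tower embedding (assumed precomputed).
    """
    return {item: sigmoid(dot(user_vec, vec)) for item, vec in item_vecs.items()}


def top_k(scores, k):
    """Return the k item ids with the highest scores, best first."""
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Hypothetical embeddings for one request.
user_vec = [0.5, -1.0, 0.25]
item_vecs = {
    "a": [1.0, 0.0, 0.0],
    "b": [0.0, 1.0, 0.0],
    "c": [1.0, -1.0, 1.0],
}
scores = score_candidates(user_vec, item_vecs)
shortlist = top_k(scores, k=2)
```

Because the item side does not depend on the user, item embeddings can typically be computed ahead of time, which is one reason this route is fast online.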

Optimization Directions

Dual‑Tower Enhancements

Insert SENet modules into both user and item towers to dynamically re‑weight important features, improving robustness to noise.
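A squeeze‑and‑excitation block over feature fields can be sketched as below. The assumptions are: each field is summarized by the mean of its embedding, the excitation is a two‑layer bottleneck with ReLU followed by a sigmoid gate, and the weight matrices `w_reduce` and `w_expand` are hypothetical stand‑ins for learned parameters. The article does not specify these details.

```python
import math


def matvec(matrix, vec):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]


def se_reweight(field_embeddings, w_reduce, w_expand):
    """Re-weight per-field embeddings with a squeeze-and-excitation block.

    field_embeddings: list of f embeddings (each a list of floats).
    w_reduce: r x f matrix, w_expand: f x r matrix (hypothetical weights).
    """
    # Squeeze: summarize each field by the mean of its embedding.
    squeezed = [sum(e) / len(e) for e in field_embeddings]
    # Excitation: bottleneck layer with ReLU, then a sigmoid gate per field.
    hidden = [max(0.0, h) for h in matvec(w_reduce, squeezed)]
    weights = [1.0 / (1.0 + math.exp(-z)) for z in matvec(w_expand, hidden)]
    # Re-weight: scale every embedding by its field weight.
    reweighted = [[w * x for x in e] for w, e in zip(weights, field_embeddings)]
    return weights, reweighted


fields = [[1.0, 1.0], [2.0, 0.0], [0.0, 0.0]]
w_reduce = [[1.0, 1.0, 1.0]]          # 3 fields -> 1 hidden unit
w_expand = [[1.0], [0.0], [-1.0]]     # 1 hidden unit -> 3 field weights
weights, reweighted = se_reweight(fields, w_reduce, w_expand)
```

Fields with a low gate value are scaled toward zero, which is how the block can damp noisy features before they enter the tower.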

Sequence Feature Learning

Upgrade the user tower with multi‑granularity behavior sequences and query semantics; use LSTM + multi‑head attention for real‑time sequences and pooling for long‑term sequences.

Parallel‑Tower (Inner‑Product Expressiveness)

Parallelize multiple sub‑models (MLP, DCN, FM, CIN) and concatenate their outputs before a final LR layer, enriching representation capacity.

Dual‑Tower Cross‑Enhancement

Add side‑tower information vectors (a_u) to each tower and train with a mimic loss that updates these vectors only for positive labels, injecting cross‑tower signals.
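The article gives no formula for the mimic loss. One plausible reading, sketched here for the user side only, is a squared distance between the side vector a_u and the item tower's output, counted only for positive samples and averaged over the batch. The function name and the toy batch are hypothetical, and the item outputs are treated as constants, meaning no gradient would flow into them.

```python
def mimic_loss(side_user_vecs, item_outputs, labels):
    """Mean squared distance between a_u and the item-tower output.

    Only samples with label 1 contribute, so a_u is pulled toward the
    outputs of items the user interacted with positively.
    """
    total = 0.0
    for a_u, p_v, y in zip(side_user_vecs, item_outputs, labels):
        if y == 1:
            total += sum((a - p) ** 2 for a, p in zip(a_u, p_v))
    return total / len(labels)


# Toy batch of two samples: one positive, one negative.
side_user_vecs = [[0.0, 0.0], [1.0, 1.0]]
item_outputs = [[1.0, 1.0], [5.0, 5.0]]
labels = [1, 0]
loss = mimic_loss(side_user_vecs, item_outputs, labels)
```

The negative sample contributes nothing to the loss, which matches the statement that the side vectors are updated only for positive labels.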

Knowledge Distillation

Use teacher‑student training where the teacher (fine‑ranking) is a more powerful model with privileged features, and the student (coarse‑ranking) is a dual‑tower model. Strategies include privileged‑feature distillation and model distillation.
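A common way to write the student objective for model distillation is a hard‑label loss plus a weighted soft‑label loss against the teacher's predictions. The sketch below assumes binary cross‑entropy for both terms and a hypothetical weight `alpha`; the article does not state which loss or weighting is used.

```python
import math


def bce(p, target, eps=1e-7):
    """Binary cross-entropy for one prediction; target may be a soft label."""
    p = min(max(p, eps), 1.0 - eps)
    return -(target * math.log(p) + (1.0 - target) * math.log(1.0 - p))


def distill_loss(student_probs, teacher_probs, labels, alpha=0.5):
    """Hard-label loss plus alpha times the soft-label (teacher) loss."""
    n = len(labels)
    hard = sum(bce(s, y) for s, y in zip(student_probs, labels)) / n
    soft = sum(bce(s, t) for s, t in zip(student_probs, teacher_probs)) / n
    return hard + alpha * soft


# Hypothetical batch: coarse-ranking (student) and fine-ranking (teacher) outputs.
student = [0.6, 0.2]
teacher = [0.9, 0.1]
clicks = [1, 0]
loss = distill_loss(student, teacher, clicks, alpha=0.5)
```

With `alpha=0` this reduces to ordinary supervised training; raising `alpha` pushes the student to imitate the teacher more closely.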

Auto Feature & Structure Selection (AutoFAS)

Jointly select optimal coarse‑ranking features and architecture under latency constraints using feature masks and MixOp modules, guided by a combined loss of distillation, latency, and coarse‑ranking objectives.

Lightweight Fully‑Connected Layers

Adopt the COLD framework with SEBlock modules to compute feature importance, retain critical features, and accelerate inference via parallelism, quantization, and column‑wise computation.

Multi‑Objective Modeling

Three architectures are explored:

Shared‑parameter multi‑tower (separate user/item towers per objective with shared lower layers).

MMoE‑based dual‑tower where each tower uses a Multi‑Gate Mixture‑of‑Experts to differentiate objectives.

Unified user embedding shared across objectives, enabling joint estimation of conversion‑related goals.

Online fusion formulas (linear addition, exponential multiplication, weighted variants) yield significant offline AUC gains and modest online DPV/UV improvements.
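The fusion formulas are named but not written out in the article. The sketch below assumes the usual forms: a weighted sum for linear addition and a product of powers for (weighted) exponential multiplication. The weights and exponents are hypothetical tuning parameters.

```python
def linear_fusion(ctr, cvr, w_ctr=1.0, w_cvr=1.0):
    """Linear addition: weighted sum of the per-objective scores."""
    return w_ctr * ctr + w_cvr * cvr


def exp_mult_fusion(ctr, cvr, a=1.0, b=1.0):
    """Exponential multiplication: product of scores raised to tunable powers."""
    return (ctr ** a) * (cvr ** b)


# Hypothetical predictions for one item.
ctr, cvr = 0.1, 0.04
linear_score = linear_fusion(ctr, cvr)             # 0.1 + 0.04
product_score = exp_mult_fusion(ctr, cvr, b=0.5)   # 0.1 * sqrt(0.04)
```

The multiplicative form penalizes items that are weak on either objective, while the additive form lets a strong objective compensate for a weak one.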

Multi‑Scenario Modeling

Challenges include scenario bias in user/item distributions. Solutions involve:

Scenario statistical features (CTR, CVR per scenario).

Cross‑features between users/items and scenarios.

Embedding scenario features as bias inputs or via dedicated sub‑networks.

Dynamic weighting: reshape scenario features to match each hidden layer’s dimension and multiply with intermediate activations.
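The dynamic‑weighting idea can be sketched as an element‑wise gate. This version assumes the "reshape" step is a linear projection from the scenario embedding to the hidden layer's width; the projection matrix `w_proj` and all values are hypothetical, and the article does not say how the reshaping is done.

```python
def matvec(matrix, vec):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]


def scenario_gate(hidden, scenario_emb, w_proj):
    """Scale hidden activations element-wise by a projected scenario vector.

    w_proj has one row per hidden unit, so the projected gate matches the
    hidden layer's dimension.
    """
    gate = matvec(w_proj, scenario_emb)
    if len(gate) != len(hidden):
        raise ValueError("projected scenario vector must match hidden size")
    return [h * g for h, g in zip(hidden, gate)]


hidden = [1.0, 2.0, 3.0]             # intermediate activations of one layer
scenario_emb = [1.0, 0.0]            # embedding of the current scenario
w_proj = [[1.0, 0.0], [0.5, 0.0], [0.0, 1.0]]
gated = scenario_gate(hidden, scenario_emb, w_proj)
```

Each hidden layer would get its own projection, so the same scenario embedding can modulate layers of different widths.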

Meta‑learning approaches (M2M) use a Meta Unit to capture inter‑scenario relationships and a meta‑attention module for task correlation, enabling rapid adaptation to new scenarios. Two‑stage training (scenario‑supervised contrastive pre‑training followed by fine‑tuning) further refines scenario‑aware representations.

Practical Cases

Single‑objective dual‑tower coarse‑ranking (CTR) using exposure‑click vs. exposure‑non‑click samples: +6 % DPV, +3 % UV.

Triple‑tower multi‑objective coarse‑ranking (CTR + CVR) with weighted exponential multiplication: +5.6 % CTR AUC, +45 % CVR AUC offline, modest online gains.

Multi‑objective + scenario‑feature enhancements: +2 % AUC offline, +2.6 % DPV, +4.9 % UV online.

Conclusion

Coarse‑ranking is a critical lever for recommendation efficiency. It offers many optimization avenues, ranging from model architecture and knowledge distillation to multi‑objective and multi‑scenario strategies. Future work will refine these directions to further improve system performance.


