EMER: End-to-End Multi-Objective Ranking That Transforms Short-Video Recommendations
EMER, Kuaishou’s end‑to‑end multi‑objective ensemble ranking framework, replaces handcrafted scoring formulas with a Transformer‑based model that learns comparative preferences between candidates. It integrates normalized rank features, optimizes relative satisfaction alongside multi‑dimensional proxy metrics, and dynamically balances objectives via a self‑evolving advantage evaluator, delivering significant online gains.
Background and Challenges
Short‑video apps decide what users see next through ranking logic. Historically, recommendation ranking relied on manually designed formulas that weighted signals such as likes and watch time, but these formulas struggle with personalization, non‑linear relationships, and multi‑objective trade‑offs (e.g., retention vs. play count).
EMER Framework Overview
The Kuaishou Strategy Algorithm team introduced EMER (End‑to‑End Multi‑objective Ensemble Ranking) to replace traditional formula‑based ranking with a model‑driven approach. EMER treats ranking as a comparison problem and uses a Transformer‑based architecture to jointly optimize multiple objectives.
Key Contributions
Redefine ranking from independent scoring to pairwise comparison.
Introduce Relative Advantage Satisfaction and multi‑dimensional proxy metrics to better capture user satisfaction.
Design a self‑evolving optimization scheme that dynamically balances loss weights across objectives.
Model Design
Sample Organization
All candidate items in a request are packed into a single training sample, providing rich comparison pairs and mitigating exposure bias.
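As a minimal sketch of this sample organization (the `Candidate` class, field names, and the three‑level label encoding are illustrative assumptions, not the paper's actual schema), one request's candidates can be grouped into a single training example like this:

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Candidate:
    item_id: int
    features: List[float]
    label: int  # assumed encoding: 0 = no feedback, 1 = single positive, 2 = multiple positive

def build_sample(request_candidates: List[Candidate]) -> dict:
    """Pack every candidate from one request into a single sample, so the
    model can form comparison pairs within the request rather than scoring
    exposed items in isolation (which mitigates exposure bias)."""
    return {
        "features": [c.features for c in request_candidates],
        "labels": [c.label for c in request_candidates],
        "n": len(request_candidates),
    }
```

Training then iterates over requests rather than over individual impressions, which is what supplies the rich comparison pairs the article mentions.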
Feature Engineering
In addition to item‑level features, EMER adds Normalized Ranks (original rank / total candidates) to convey each item's relative position within the candidate set.
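The normalized‑rank feature follows directly from the definition above (original rank divided by total candidates); a one‑line sketch:

```python
def normalized_ranks(num_candidates: int) -> list:
    """Normalized rank for each 1-based position: rank / total candidates,
    so every item carries its relative position in the candidate set
    regardless of how many candidates the request has."""
    return [i / num_candidates for i in range(1, num_candidates + 1)]
```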
Architecture
A Transformer network processes the sequence of candidate items, capturing complex inter‑item relationships and producing scores that reflect both intrinsic quality and contextual value.
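The key property of this architecture is that each candidate's score depends on the whole candidate set, not just its own features. A single self‑attention layer already exhibits this; the sketch below (weight matrices and the scalar output head are illustrative, not EMER's actual configuration) shows how attention mixes information across candidates before scoring:

```python
import numpy as np

def attention_scores(X, Wq, Wk, Wv, w_out):
    """Single-head self-attention over a candidate sequence X (n x d):
    each candidate attends to all others, so the final score reflects
    both intrinsic quality and contextual value within the set."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    logits = Q @ K.T / np.sqrt(K.shape[1])
    # row-wise softmax over attention logits
    attn = np.exp(logits - logits.max(axis=1, keepdims=True))
    attn /= attn.sum(axis=1, keepdims=True)
    ctx = attn @ V          # each row now mixes in the other candidates
    return ctx @ w_out      # one scalar score per candidate
```

A full Transformer stacks several such layers with feed‑forward blocks and normalization, but the inter‑item dependence comes from exactly this attention step.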
Loss Functions
EMER employs a pairwise logistic loss to learn hierarchical satisfaction relations (multiple positive feedback > single positive feedback > no feedback). Multi‑dimensional proxy metrics are treated as separate supervised targets; each is optimized via a differentiable surrogate of AUC.
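The pairwise logistic loss over satisfaction levels can be sketched as follows (a minimal illustration of the idea, not EMER's production loss): for every pair where one item sits at a higher satisfaction level than another, the loss penalizes the model for scoring the lower‑level item higher. Averaged over pairs with a binary target, the same construction is a standard differentiable surrogate for AUC.

```python
import math

def pairwise_logistic_loss(scores, levels):
    """Pairwise logistic loss: for each pair with levels[i] > levels[j]
    (e.g. multiple positive > single positive > no feedback), add
    log(1 + exp(scores[j] - scores[i])), which pushes scores[i] above
    scores[j]. Averaged over pairs, this is a smooth AUC surrogate."""
    total, pairs = 0.0, 0
    for i in range(len(scores)):
        for j in range(len(scores)):
            if levels[i] > levels[j]:
                total += math.log(1.0 + math.exp(scores[j] - scores[i]))
                pairs += 1
    return total / max(pairs, 1)
```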
The overall loss is the sum of the relative‑satisfaction loss and the multi‑dimensional proxy loss, weighted by a dynamic Advantage Evaluator (AE) that adapts based on online performance.
Self‑Evolving Optimization
The AE computes adaptive weights for each objective by comparing the current model’s performance to the previous version. When an objective degrades, its weight increases, prompting the model to focus on it; when it improves, the weight decreases.
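A hedged sketch of this weight‑update idea (the multiplicative rule, learning rate, and renormalization are assumptions for illustration; the article does not specify the AE's exact update formula): an objective whose metric degraded versus the previous model version gets a larger loss weight, and an improved objective gets a smaller one.

```python
def update_weights(weights, prev_metrics, curr_metrics, lr=0.1):
    """Adaptive per-objective loss weights: compare each metric to the
    previous model version; raise the weight when the metric degrades,
    lower it when it improves, then renormalize so the total stays fixed."""
    new = {}
    for k, w in weights.items():
        delta = curr_metrics[k] - prev_metrics[k]  # > 0 means improvement
        new[k] = max(w * (1.0 - lr * delta), 1e-6)
    total = sum(weights.values())
    s = sum(new.values())
    return {k: v / s * total for k, v in new.items()}
```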
Experimental Evaluation
Offline and Online Results
EMER was deployed in Kuaishou’s main app and fast‑version app. A/B tests showed significant lifts in 7‑day retention (+0.30% for fast version, +0.23% for main app), app stay time (+1.39% and +1.20% respectively), and short‑video view counts (+1.04% and +3.00%).
Ablation Studies
Removing comparison modeling (EMER‑NoComp) degrades both offline GAUC and online metrics.
Omitting posterior satisfaction signals (EMER‑NoPost) or prior proxy signals (EMER‑NoPrior) reduces performance, with prior signals having a larger impact.
Disabling the self‑evolution mechanism (EMER‑NoEvolve) leads to imbalanced improvements (e.g., higher watch time but lower play count and interactions).
Excluding the IPUT metric (EMER‑NoIPUT) breaks alignment between offline training and online user satisfaction.
Metric Analysis
DCG@K was found to correlate best with overall GAUC across objectives and was adopted as the default evaluation metric for the Advantage Evaluator.
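DCG@K itself is the standard discounted cumulative gain over the top K positions, shown below for reference (this is the textbook definition, not any EMER‑specific variant):

```python
import math

def dcg_at_k(relevances, k):
    """Discounted cumulative gain at K: sum of rel_i / log2(i + 1) over
    the top-k positions, with i the 1-based rank, so relevant items placed
    earlier contribute more."""
    return sum(rel / math.log2(i + 1)
               for i, rel in enumerate(relevances[:k], start=1))
```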
Conclusion
EMER demonstrates a practical, end‑to‑end solution for personalized multi‑objective ranking in large‑scale recommendation systems, addressing the three core challenges of undefined satisfaction, comparison modeling, and objective conflict. Its deployment yields measurable improvements in user retention, engagement, and overall business metrics.