EMER: End-to-End Multi-Objective Ranking That Transforms Short-Video Recommendations
EMER, Kuaishou’s end‑to‑end multi‑objective ensemble ranking framework, replaces handcrafted scoring formulas with a Transformer‑based model that learns comparative preferences between candidates. It integrates normalized rank features, optimizes relative satisfaction alongside multi‑dimensional proxy metrics, and dynamically balances objectives via a self‑evolving advantage evaluator, delivering significant online gains.
Background and Challenges
Short‑video apps decide what users see next through ranking logic. Historically, recommendation ranking relied on manually designed formulas that weighted signals such as likes and watch time, but these formulas struggle with personalization, non‑linear relationships, and multi‑objective trade‑offs (e.g., retention vs. play count).
EMER Framework Overview
The Kuaishou Strategy Algorithm team introduced EMER (End‑to‑End Multi‑objective Ensemble Ranking) to replace traditional formula‑based ranking with a model‑driven approach. EMER treats ranking as a comparison problem and uses a Transformer‑based architecture to jointly optimize multiple objectives.
Key Contributions
Redefine ranking from independent scoring to pairwise comparison.
Introduce Relative Advantage Satisfaction and multi‑dimensional proxy metrics to better capture user satisfaction.
Design a self‑evolving optimization scheme that dynamically balances loss weights across objectives.
Model Design
Sample Organization
All candidate items in a request are packed into a single training sample, providing rich comparison pairs and mitigating exposure bias.
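As a minimal sketch of this sample organization (the `Candidate` class, field names, and the three‑level label encoding are illustrative assumptions, not the paper's actual schema), one request's candidates can be grouped into a single training example like this:

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Candidate:
    item_id: int
    features: List[float]
    label: int  # assumed encoding: 0 = no feedback, 1 = single positive, 2 = multiple positive

def build_sample(request_candidates: List[Candidate]) -> dict:
    """Pack every candidate from one request into a single sample, so the
    model can form comparison pairs within the request rather than scoring
    exposed items in isolation (which mitigates exposure bias)."""
    return {
        "features": [c.features for c in request_candidates],
        "labels": [c.label for c in request_candidates],
        "n": len(request_candidates),
    }
```

Training then iterates over requests rather than over individual impressions, which is what supplies the rich comparison pairs the article mentions.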
Feature Engineering
In addition to item‑level features, EMER adds Normalized Ranks (original rank / total candidates) to convey each item's relative position within the candidate set.
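The normalized‑rank feature follows directly from the definition above (original rank divided by total candidates); a one‑line sketch:

```python
def normalized_ranks(num_candidates: int) -> list:
    """Normalized rank for each 1-based position: rank / total candidates,
    so every item carries its relative position in the candidate set
    regardless of how many candidates the request has."""
    return [i / num_candidates for i in range(1, num_candidates + 1)]
```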
Architecture
A Transformer network processes the sequence of candidate items, capturing complex inter‑item relationships and producing scores that reflect both intrinsic quality and contextual value.
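The key property of this architecture is that each candidate's score depends on the whole candidate set, not just its own features. A single self‑attention layer already exhibits this; the sketch below (weight matrices and the scalar output head are illustrative, not EMER's actual configuration) shows how attention mixes information across candidates before scoring:

```python
import numpy as np

def attention_scores(X, Wq, Wk, Wv, w_out):
    """Single-head self-attention over a candidate sequence X (n x d):
    each candidate attends to all others, so the final score reflects
    both intrinsic quality and contextual value within the set."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    logits = Q @ K.T / np.sqrt(K.shape[1])
    # row-wise softmax over attention logits
    attn = np.exp(logits - logits.max(axis=1, keepdims=True))
    attn /= attn.sum(axis=1, keepdims=True)
    ctx = attn @ V          # each row now mixes in the other candidates
    return ctx @ w_out      # one scalar score per candidate
```

A full Transformer stacks several such layers with feed‑forward blocks and normalization, but the inter‑item dependence comes from exactly this attention step.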
Loss Functions
EMER employs a pairwise logistic loss to learn hierarchical satisfaction relations (multiple positive feedback > single positive feedback > no feedback). Multi‑dimensional proxy metrics are treated as separate supervised targets; each is optimized via a differentiable surrogate of AUC.
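The pairwise logistic loss over satisfaction levels can be sketched as follows (a minimal illustration of the idea, not EMER's production loss): for every pair where one item sits at a higher satisfaction level than another, the loss penalizes the model for scoring the lower‑level item higher. Averaged over pairs with a binary target, the same construction is a standard differentiable surrogate for AUC.

```python
import math

def pairwise_logistic_loss(scores, levels):
    """Pairwise logistic loss: for each pair with levels[i] > levels[j]
    (e.g. multiple positive > single positive > no feedback), add
    log(1 + exp(scores[j] - scores[i])), which pushes scores[i] above
    scores[j]. Averaged over pairs, this is a smooth AUC surrogate."""
    total, pairs = 0.0, 0
    for i in range(len(scores)):
        for j in range(len(scores)):
            if levels[i] > levels[j]:
                total += math.log(1.0 + math.exp(scores[j] - scores[i]))
                pairs += 1
    return total / max(pairs, 1)
```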
The overall loss is the sum of the relative‑satisfaction loss and the multi‑dimensional proxy loss, weighted by a dynamic Advantage Evaluator (AE) that adapts based on online performance.
Self‑Evolving Optimization
The AE computes adaptive weights for each objective by comparing the current model’s performance to the previous version. When an objective degrades, its weight increases, prompting the model to focus on it; when it improves, the weight decreases.
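A hedged sketch of this weight‑update idea (the multiplicative rule, learning rate, and renormalization are assumptions for illustration; the article does not specify the AE's exact update formula): an objective whose metric degraded versus the previous model version gets a larger loss weight, and an improved objective gets a smaller one.

```python
def update_weights(weights, prev_metrics, curr_metrics, lr=0.1):
    """Adaptive per-objective loss weights: compare each metric to the
    previous model version; raise the weight when the metric degrades,
    lower it when it improves, then renormalize so the total stays fixed."""
    new = {}
    for k, w in weights.items():
        delta = curr_metrics[k] - prev_metrics[k]  # > 0 means improvement
        new[k] = max(w * (1.0 - lr * delta), 1e-6)
    total = sum(weights.values())
    s = sum(new.values())
    return {k: v / s * total for k, v in new.items()}
```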
Experimental Evaluation
Offline and Online Results
EMER was deployed in Kuaishou’s main app and fast‑version app. A/B tests showed significant lifts in 7‑day retention (+0.30% for fast version, +0.23% for main app), app stay time (+1.39% and +1.20% respectively), and short‑video view counts (+1.04% and +3.00%).
Ablation Studies
Removing comparison modeling (EMER‑NoComp) degrades both offline GAUC and online metrics.
Omitting posterior satisfaction signals (EMER‑NoPost) or prior proxy signals (EMER‑NoPrior) reduces performance, with prior signals having a larger impact.
Disabling the self‑evolution mechanism (EMER‑NoEvolve) leads to imbalanced improvements (e.g., higher watch time but lower play count and interactions).
Excluding the IPUT metric (EMER‑NoIPUT) breaks alignment between offline training and online user satisfaction.
Metric Analysis
DCG@K was found to correlate best with overall GAUC across objectives and was adopted as the default evaluation metric for the Advantage Evaluator.
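DCG@K itself is the standard discounted cumulative gain over the top K positions, shown below for reference (this is the textbook definition, not any EMER‑specific variant):

```python
import math

def dcg_at_k(relevances, k):
    """Discounted cumulative gain at K: sum of rel_i / log2(i + 1) over
    the top-k positions, with i the 1-based rank, so relevant items placed
    earlier contribute more."""
    return sum(rel / math.log2(i + 1)
               for i, rel in enumerate(relevances[:k], start=1))
```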
Conclusion
EMER demonstrates a practical, end‑to‑end solution for personalized multi‑objective ranking in large‑scale recommendation systems, addressing the three core challenges of undefined satisfaction, comparison modeling, and objective conflict. Its deployment yields measurable improvements in user retention, engagement, and overall business metrics.