Cross-Domain Multi-Objective Estimation and Fusion in Baidu Video Recommendation

This article presents Baidu's technical experience on designing, estimating, and fusing cross-domain multi-objective models for its immersive video recommendation system, covering business background, system architecture, target design, long‑term value modeling, and evolution strategies.

DataFunTalk

This article shares Baidu's experience with cross‑domain multi‑objective estimation and fusion for its video recommendation platform, discussing business motivations, system architecture, target design, modeling, and practical deployment.

1. Baidu Video Background

Baidu App has unified all of its video scenes into a single immersive, vertical-swipe interaction. A unified large model connects data and recommendation experiences across these scenarios, with the goal of a win-win ecosystem and long-term growth.

2. Recommendation System Overview

The recommendation platform serves three players: users (content consumers), creators (content providers), and advertisers (revenue source). It solves two core problems: selecting high‑quality content for distribution (B‑side) and delivering an excellent consumption experience to users (C‑side).

The pipeline includes meta storage, multi‑goal recall, coarse and fine ranking, multi‑goal fusion, diversity awareness, sequence modeling, and traffic allocation to generate the final video list.

3. Multi‑Objective Design and Modeling

In immersive video scenarios, explicit click feedback is scarce; instead, watch time, hide actions, and other implicit signals become crucial. Additional signals such as follows, comments, likes, and search behavior are incorporated, though they are sparse.

The target design balances explicit positive/negative signals with implicit ones to avoid a biased system.

Higher‑order goals include a satisfaction model built from dense signals (e.g., full‑play, duration) and a long‑term value (LTV) model that attributes future consumption to the current video based on temporal proximity and relevance.
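The LTV attribution idea can be illustrated with a minimal sketch. It assumes each later play's watch time is split among the earlier plays in the same session, weighted by an exponential time decay multiplied by a relevance score. The `Play` record, the topic-match `relevance` function, and the 300-second half-life are hypothetical choices for illustration, not details from the talk.

```python
from dataclasses import dataclass


@dataclass
class Play:
    video_id: str         # assumed unique within the session
    timestamp: float      # seconds since session start
    watch_seconds: float  # watch time observed for this play
    topic: str            # hypothetical stand-in for a relevance signal


def relevance(past: Play, future: Play) -> float:
    # Toy relevance: same topic counts fully, anything else weakly (assumed values).
    return 1.0 if past.topic == future.topic else 0.2


def ltv_labels(session, half_life=300.0):
    """Attribute each play's watch time to the plays that preceded it.

    `session` must be sorted by timestamp. Each earlier play receives a share
    proportional to time decay * relevance, so nearer and more relevant videos
    get more credit.
    """
    labels = {p.video_id: 0.0 for p in session}
    for j, future in enumerate(session):
        earlier = session[:j]
        weights = [
            0.5 ** ((future.timestamp - past.timestamp) / half_life)
            * relevance(past, future)
            for past in earlier
        ]
        total = sum(weights)
        if total == 0.0:  # first play in the session has nothing to credit
            continue
        for past, w in zip(earlier, weights):
            labels[past.video_id] += future.watch_seconds * w / total
    return labels


session = [
    Play("a", 0.0, 30.0, "sports"),
    Play("b", 60.0, 40.0, "sports"),
    Play("c", 120.0, 20.0, "news"),
]
print(ltv_labels(session))
```

Normalizing the weights keeps the total attributed value equal to the watch time that actually followed, which makes the labels comparable across sessions of different lengths.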

4. Cross‑Domain Multi‑Objective Modeling

To address domain shift and negative transfer among many heterogeneous goals, Baidu adopts a gated hierarchical architecture: a shared personalized backbone, a CGC (Customized Gate Control) gating network for cross-domain information extraction, and domain-specific multi-objective heads.
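As a rough illustration of this three-part structure, the NumPy sketch below wires a shared backbone into a gated expert layer and per-domain objective heads. It follows the general shape of gated expert layers with shared and domain-specific experts; the layer sizes, random untrained weights, and the domain and objective names (`feed`, `search`, `finish`, `like`) are all assumptions, and Baidu's production model is certainly more elaborate.

```python
import numpy as np

rng = np.random.default_rng(0)


def relu(x):
    return np.maximum(x, 0.0)


def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


class Expert:
    """One-layer ReLU expert with random (untrained) weights."""

    def __init__(self, d_in, d_out):
        self.w = rng.normal(0.0, 0.1, (d_in, d_out))

    def __call__(self, x):
        return relu(x @ self.w)


class GatedExpertLayer:
    """Shared experts plus per-domain experts, mixed by a per-domain softmax gate."""

    def __init__(self, domains, d_in, d_hidden, n_shared=2, n_specific=2):
        self.shared = [Expert(d_in, d_hidden) for _ in range(n_shared)]
        self.specific = {
            d: [Expert(d_in, d_hidden) for _ in range(n_specific)] for d in domains
        }
        self.gates = {
            d: rng.normal(0.0, 0.1, (d_in, n_shared + n_specific)) for d in domains
        }

    def forward(self, x, domain):
        # A domain only sees the shared experts and its own experts.
        experts = self.shared + self.specific[domain]
        outs = np.stack([e(x) for e in experts], axis=1)  # (batch, experts, hidden)
        gate = softmax(x @ self.gates[domain])            # (batch, experts)
        return (gate[:, :, None] * outs).sum(axis=1), gate


class CrossDomainModel:
    def __init__(self, domains, objectives, d_in=16, d_hidden=8):
        self.objectives = list(objectives)
        self.backbone = Expert(d_in, d_in)  # shared personalized backbone
        self.cgc = GatedExpertLayer(domains, d_in, d_hidden)
        self.heads = {
            (d, o): rng.normal(0.0, 0.1, d_hidden)
            for d in domains
            for o in self.objectives
        }

    def predict(self, x, domain):
        h, _ = self.cgc.forward(self.backbone(x), domain)
        return {o: sigmoid(h @ self.heads[(domain, o)]) for o in self.objectives}


model = CrossDomainModel(["feed", "search"], ["finish", "like"])
print(model.predict(np.ones((2, 16)), "search"))
```

Keeping the domain-specific experts out of other domains' gates is what limits negative transfer in this kind of design: gradients from one domain's objectives reach only the shared experts and that domain's own experts.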

This design yields an AUC improvement of 3 to 9 per mille (0.3% to 0.9%) over single-domain models, and the learned embeddings of different domains are clearly separated (e.g., search C vs. two-hop video).

5. Multi‑Objective Fusion

Fusion evolved from manual prior knowledge to Learning‑to‑Rank (LTR) and finally to Evolution Strategy (ES) based models. The reward (North Star metric) combines session depth (duration + steps) and interaction signals, reflecting long‑term user retention and engagement.

The current ES model incorporates scene‑ and user‑level information for higher‑order optimization.
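To make the ES step concrete, here is a minimal sketch of an evolution strategy searching for fusion weights. Several things are assumptions for illustration: the multiplicative fusion formula in `fuse`, the standardized-reward update rule, and above all `simulated_reward`, a toy function with a hidden optimum that stands in for the online North Star metric. In a real system the reward would come from serving traffic with the perturbed weights.

```python
import numpy as np

# Hypothetical "best" weights, known only to the toy reward function.
HIDDEN_OPTIMUM = np.array([1.0, 0.5, 2.0])


def fuse(preds, weights):
    """Multiplicative fusion: score = prod_i preds_i ** weights_i (assumed form)."""
    return np.prod(np.power(preds, weights), axis=-1)


def simulated_reward(weights):
    # Stand-in for the online North Star metric; peaks at HIDDEN_OPTIMUM.
    return -float(np.sum((weights - HIDDEN_OPTIMUM) ** 2))


def evolve(reward_fn, theta, sigma=0.1, lr=0.05, population=32, steps=200, seed=0):
    """Basic evolution strategy: perturb, score, move toward better perturbations."""
    rng = np.random.default_rng(seed)
    theta = np.asarray(theta, dtype=float)
    for _ in range(steps):
        eps = rng.standard_normal((population, theta.size))
        rewards = np.array([reward_fn(theta + sigma * e) for e in eps])
        # Standardize rewards so the step size does not depend on reward scale.
        adv = (rewards - rewards.mean()) / (rewards.std() + 1e-8)
        theta = theta + lr * (eps.T @ adv) / population
    return theta


weights = evolve(simulated_reward, np.ones(3))

# Rank two candidate videos by their fused score (columns = three objectives).
preds = np.array([[0.9, 0.2, 0.1], [0.4, 0.8, 0.3]])
ranking = np.argsort(-fuse(preds, weights))
print(weights, ranking)
```

ES needs only reward values, not gradients, which is why it suits a reward such as session depth that cannot be differentiated with respect to the fusion weights. Conditioning the weights on scene and user features, as the current model is described as doing, would replace the plain vector `theta` with the parameters of a small function of those features.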

6. Future Directions and Recruitment

The talk concludes with an invitation to join the team (HR email: [email protected]) and thanks the audience.
