How Disney+ Designs a Multi‑Task Video Search Ranking Model

This article explains the architecture of a video search ranking system that combines a deep encoding network, multi‑task expert networks, and a bias‑correction module to jointly optimize relevance, click‑through rate, and watch time for streaming platforms.

Hulu Beijing

Model structure determines how efficiently a model extracts information, and the video search domain has been researched extensively. Inspired by industry ranking models and tailored to the characteristics of video search, we designed a video search ranking model composed of three parts: a deep encoding network for information extraction, a multi‑task expert network for multi‑objective optimization, and a network that mitigates position bias.

Query‑side information expression includes the original query term generated by a subterm bag‑of‑words model, expanded queries derived from co‑clicked queries, and head‑clicked videos. Expanded queries are selected by co‑click frequency, and an attention mechanism differentiates their relevance to the original query.
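The attention step over expanded queries can be sketched as follows. This is a minimal illustration, assuming simple dot-product attention over fixed embeddings; the function name and the softmax formulation are illustrative, not the production design.

```python
import numpy as np

def attend_expanded_queries(query_vec, expanded_vecs):
    """Weight expanded-query embeddings by softmax similarity to the
    original query, so more relevant expansions contribute more.

    query_vec: (d,) embedding of the original query
    expanded_vecs: (n, d) embeddings of co-clicked expanded queries
    Returns the attention-pooled (d,) expansion vector.
    """
    scores = expanded_vecs @ query_vec        # dot-product relevance scores
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                  # softmax attention weights
    return weights @ expanded_vecs            # weighted sum of expansions

q = np.array([1.0, 0.0])
expansions = np.array([[1.0, 0.0],           # similar to the query
                       [0.0, 1.0]])          # dissimilar
pooled = attend_expanded_queries(q, expansions)
```

Here the expansion aligned with the original query receives the larger attention weight, which matches the stated goal of differentiating expanded queries by relevance.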

Candidate video information expression incorporates metadata (title, description, category) and statistical data (clicks, views, popularity). Pre‑trained model outputs and recommendation‑system vectors further enrich video representations, followed by an attention layer that captures interactions between them.
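A fusion of heterogeneous video fields through an attention layer could look like the sketch below. Scaled dot-product self-attention with mean pooling is an assumption here; the article does not specify the exact attention variant.

```python
import numpy as np

def video_vector(field_embs):
    """Fuse heterogeneous video fields (metadata, statistics, pre-trained
    and recommendation-system vectors) with a self-attention layer so the
    pooled representation captures cross-field interactions.

    field_embs: list of (d,) embeddings, one per field
    Returns a single (d,) video representation.
    """
    F = np.stack(field_embs)                     # (n_fields, d)
    attn = F @ F.T / np.sqrt(F.shape[1])         # scaled dot-product scores
    attn = np.exp(attn - attn.max(axis=1, keepdims=True))
    attn /= attn.sum(axis=1, keepdims=True)      # row-wise softmax
    return (attn @ F).mean(axis=0)               # pooled video representation

fields = [np.ones(4), np.zeros(4), np.arange(4.0)]
vec = video_vector(fields)
```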

User information expression leverages playback and search histories. Each historical video is encoded via the deep encoding network, weighted by attention scores computed from the current query, video vector, and positional information. The final user vector concatenates weighted playback history, weighted search history, and a user ID embedding.
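The user-side pooling described above can be sketched as below. The exponential position decay standing in for "positional information" is an assumption for illustration, as is the concatenation order; histories are assumed ordered oldest to newest.

```python
import numpy as np

def encode_user(query_vec, play_hist, search_hist, user_id_emb, decay=0.9):
    """Build the user vector: attention-pool each history against the
    current query, then concatenate with a user ID embedding.

    play_hist, search_hist: (n, d) arrays of history-video embeddings,
    ordered oldest -> newest (an assumption); recent items get a larger
    positional weight via exponential decay (also an assumption).
    """
    def pool(hist):
        if len(hist) == 0:
            return np.zeros_like(query_vec)
        pos = decay ** np.arange(len(hist))[::-1]   # newest weighted most
        scores = (hist @ query_vec) * pos           # query x position scores
        w = np.exp(scores - scores.max())
        w /= w.sum()                                # softmax attention
        return w @ hist                             # weighted history vector
    return np.concatenate([pool(play_hist), pool(search_hist), user_id_emb])

user_vec = encode_user(np.ones(3), np.ones((2, 3)),
                       np.zeros((0, 3)), np.zeros(4))
```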

Multi‑task expert network (Customized Gate Control) jointly optimizes relevance, click‑through rate, and play duration. It consists of shared experts, task‑specific experts, a gate network that weights expert outputs for each task, and a tower network that produces task‑specific predictions. The overall loss combines task losses with predefined weights.
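The CGC routing can be sketched as follows. Single linear+ReLU layers stand in for the expert and tower networks, and the expert counts are arbitrary; only the gating structure (each task attends over shared plus its own experts) reflects the description above.

```python
import numpy as np

rng = np.random.default_rng(0)

def expert(w, x):
    """Stand-in expert/tower: one linear layer with ReLU."""
    return np.maximum(w @ x, 0.0)

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

class CGCLayer:
    """Customized Gate Control: each task's gate mixes shared experts
    with that task's own experts, then a tower makes the prediction."""
    def __init__(self, d, n_shared=2, n_task=1,
                 tasks=("relevance", "ctr", "duration")):
        self.tasks = tasks
        self.shared = [rng.normal(size=(d, d)) for _ in range(n_shared)]
        self.spec = {t: [rng.normal(size=(d, d)) for _ in range(n_task)]
                     for t in tasks}
        self.gate = {t: rng.normal(size=(n_shared + n_task, d)) for t in tasks}
        self.tower = {t: rng.normal(size=(1, d)) for t in tasks}

    def forward(self, x):
        out = {}
        for t in self.tasks:
            experts = [expert(w, x) for w in self.shared + self.spec[t]]
            g = softmax(self.gate[t] @ x)          # gate weight per expert
            fused = sum(gi * e for gi, e in zip(g, experts))
            out[t] = (self.tower[t] @ fused).item()
        return out

layer = CGCLayer(d=4)
preds = layer.forward(np.ones(4))
```

Training would combine the three task losses with the predefined weights mentioned above; that weighted sum is omitted here for brevity.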

Bias network models observation probability to correct position bias. Inputs include device type, product type, query length, and search page. During training, dropout is applied to bias features; during online inference, the bias network is removed and only the unbiased click prediction is used.
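The train/serve asymmetry of the bias tower can be illustrated with an additive-logit sketch. Treating the bias network's output as a logit added to the main prediction, and applying dropout to the whole bias path, are assumptions for this example.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def predict_click(main_logit, bias_logit, training, drop_p=0.2, rng=None):
    """Additive bias tower: bias_logit models observation probability
    from features such as device type, product type, query length, and
    search page. The bias path gets dropout in training and is removed
    entirely at online inference.
    """
    if training:
        rng = rng or np.random.default_rng()
        keep = rng.random() >= drop_p          # dropout on bias features
        return sigmoid(main_logit + (bias_logit if keep else 0.0))
    return sigmoid(main_logit)                 # online: unbiased prediction

# Online, a large positional bias logit has no effect on the score:
online_score = predict_click(0.0, 5.0, training=False)
```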

Feature engineering retains many handcrafted features such as hit information, text similarity (TF‑IDF, BM25, EditDistance, Jaccard), semantic similarity from pre‑trained models, click similarity via GNN, video quality attributes, and statistical metrics (query count, click count, CTR). Features are discretized using equal‑frequency binning.
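Equal-frequency binning, the discretization named above, can be sketched as a quantile-based cut; the helper name is illustrative.

```python
import numpy as np

def equal_frequency_bins(values, n_bins):
    """Discretize a continuous feature (e.g. CTR or click count) so each
    bin holds roughly the same number of examples: bin edges are the
    interior quantiles of the empirical distribution."""
    edges = np.quantile(values, np.linspace(0, 1, n_bins + 1)[1:-1])
    return np.digitize(values, edges)

ctr_like = np.arange(100)                 # a toy feature column
bins = equal_frequency_bins(ctr_like, 4)  # four bins of ~equal size
```

Unlike equal-width binning, this keeps bins balanced even for the heavily skewed count features (clicks, views) common in search logs.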

Feature importance analysis is performed offline by masking feature groups and measuring AUC change, and online by swapping feature groups between videos to identify causes of ranking anomalies.
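The offline masking procedure can be sketched as below. Replacing a masked feature group with its column means is one common masking choice and is an assumption here, as is the rank-based AUC helper (it ignores tie handling).

```python
import numpy as np

def auc(scores, labels):
    """Rank-based AUC: probability a random positive outscores a random
    negative (Mann-Whitney form; ties not handled)."""
    order = np.argsort(scores)
    ranks = np.empty(len(scores))
    ranks[order] = np.arange(1, len(scores) + 1)
    pos = labels == 1
    n_pos, n_neg = pos.sum(), (~pos).sum()
    return (ranks[pos].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

def group_importance(model, X, y, group_cols):
    """Offline importance of a feature group = AUC drop when the group
    is masked (replaced by its column means)."""
    base = auc(model(X), y)
    Xm = X.copy()
    Xm[:, group_cols] = X[:, group_cols].mean(axis=0)
    return base - auc(model(Xm), y)

# Toy check: the model leans on column 0, so masking it hurts AUC,
# while masking the near-irrelevant column 1 barely changes it.
X = np.array([[0, 5], [1, 3], [2, 1], [3, 4], [4, 2], [5, 0]], dtype=float)
y = np.array([0, 0, 0, 1, 1, 1])
model = lambda M: M @ np.array([1.0, 0.001])
```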

Overall, the system integrates classic LTR algorithms, modern DNN structures, multi‑objective optimization, and bias mitigation to deliver personalized, relevant video search results for Hulu, Disney+, and Star+.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.

Tags: feature engineering, deep learning, Multi-Task Learning, Bias Correction, video search, ranking model
Written by

Hulu Beijing

Follow Hulu's official WeChat account for the latest company updates and recruitment information.
