How Alibaba Boosted Short‑Video Engagement with Advanced Recommendation Algorithms

This article explains the rapid growth of short video on Taobao, describes the video feature framework, details the RankI2V and RankV2V recall methods, outlines the coarse and fine ranking models, and presents the real‑time interest and business strategies that significantly improved click‑through rates and viewing time.

Alibaba Cloud Developer

1. Introduction

Short videos (typically under 5 minutes) have exploded in popularity thanks to fragmented user time, widespread mobile internet, low production barriers, and rapid smartphone adoption. From their early stage in 2011 to the rise of platforms such as Kuaishou, Meipai, and Douyin in 2015‑2016, short videos have come to capture a large share of user attention and have reshaped information consumption habits.

Currently, short‑video monthly active users in China reach about 400 million, with average daily watch time exceeding 60 minutes, demonstrating strong stickiness and significant competition for user time across social, audio‑visual, gaming, and news apps.

1.2 Taobao Short‑Video Landscape

Taobao now hosts over 260 million videos. Product thumbnail images have largely been replaced by short videos that showcase usage details; buyer‑generated videos provide more reliable references; livestream hosts drive massive traffic; and creator‑produced videos offer diverse, enjoyable browsing experiences. The surge in video volume and user demand raises new challenges for video recommendation algorithms.

This article summarizes recent practical work on video recommendation in the "Wow Video" and "Guess You Like" scenarios.

2. Video Feature System

The video feature system consists of ID features, generalized product features, video‑statistical features, video‑content features, and tag features.

ID features include video ID, author ID, etc.

Generalized product features derive from the product attributes attached to a video, such as product ID, category ID, virtual category ID, shop ID, brand ID, gender, purchasing power, and product tags.

Statistical features capture play‑rate, average watch time, effective play‑rate, etc., across different scenes, categories, and authors.

Content features include key‑frame image embeddings and audio embeddings for fine‑grained video description.

Tag features are generated by a multi‑class classification model forming a graph‑like tag hierarchy, currently covering the fashion domain.

Video feature diagram

3. Video Recall

3.1 RankI2V Recall

Because most Taobao videos are linked to products, the initial recall method extended the item‑to‑item (i2i) product recall to i2i2v, using the user's recent click/collect/purchase/add‑to‑cart items as trigger items, finding similar candidate items, then mapping to candidate videos and prioritizing hot videos.
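The i2i2v chain described above can be sketched in a few lines. This is a minimal illustration, not the production implementation: the function name `i2i2v_recall`, the lookup tables `similar_items` and `item_videos`, and the `video_hotness` score are all hypothetical stand-ins for whatever the i2i index and item-video mapping actually look like.

```python
def i2i2v_recall(trigger_items, similar_items, item_videos, video_hotness, top_k=3):
    """Trigger items -> similar items -> linked videos, hottest videos first.

    All arguments are plain dicts/lists assumed for this sketch:
      similar_items: item_id -> list of similar item_ids (the i2i index)
      item_videos:   item_id -> list of video_ids linked to that item
      video_hotness: video_id -> popularity score
    """
    candidates = set()
    for trigger in trigger_items:
        for item in similar_items.get(trigger, []):
            candidates.update(item_videos.get(item, []))
    # Prioritize hot videos; break ties on the id so the output is deterministic.
    ranked = sorted(candidates, key=lambda v: (-video_hotness.get(v, 0.0), v))
    return ranked[:top_k]


similar_items = {"i1": ["i2", "i3"]}
item_videos = {"i2": ["v1", "v2"], "i3": ["v3"]}
video_hotness = {"v1": 5.0, "v2": 9.0, "v3": 1.0}
print(i2i2v_recall(["i1"], similar_items, item_videos, video_hotness, top_k=2))
```

Note that the ranking key here depends only on global hotness, not on the user or the trigger item, which is the weakness RankI2V sets out to fix.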

While this leverages existing product‑recall infrastructure, it suffers from two major drawbacks: a single product may be linked to many videos, making it hard to assess each video's value for a specific user, and product interest does not always translate to video interest. Moreover, user video‑behavior signals (watch time, likes, follows) are ignored.

RankI2V addresses these issues by constructing samples from full‑screen page playback logs, using watch duration as the label, and training a GBDT model with pairwise loss on trigger item, video‑related product, and video features to directly score item‑to‑video relevance.

Sample construction steps:

Extract trigger item, watched video, and watch duration from logs.

After cleaning, treat very short watches as negative samples and longer watches as positive samples, weighting by actual duration.

Resample to ensure both positive and negative samples exist for the same user‑trigger item pair.
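The three steps above might look roughly like the following sketch. The thresholds `SHORT_WATCH_SEC` and `LONG_WATCH_SEC`, the log tuple layout, and the choice to drop groups that lack one of the two labels are assumptions made for illustration; the article does not give the actual cut-offs or the resampling scheme.

```python
from collections import defaultdict

SHORT_WATCH_SEC = 3.0   # assumed: watches at or below this are negatives
LONG_WATCH_SEC = 10.0   # assumed: watches at or above this are positives


def build_samples(logs):
    """logs: iterable of (user, trigger_item, video, watch_sec) tuples.

    Returns {(user, trigger_item): [sample, ...]} keeping only groups that
    contain both a positive and a negative sample.
    """
    groups = defaultdict(list)
    for user, trigger, video, watch_sec in logs:
        if watch_sec < 0:
            continue  # cleaning: drop malformed records
        if watch_sec <= SHORT_WATCH_SEC:
            label, weight = 0, 1.0
        elif watch_sec >= LONG_WATCH_SEC:
            label, weight = 1, watch_sec  # weight positives by actual duration
        else:
            continue  # ambiguous middle band is skipped in this sketch
        groups[(user, trigger)].append(
            {"video": video, "label": label, "weight": weight}
        )
    # A pairwise loss needs at least one positive and one negative per group.
    return {
        key: rows
        for key, rows in groups.items()
        if {row["label"] for row in rows} == {0, 1}
    }


logs = [
    ("u1", "i1", "v1", 2.0),
    ("u1", "i1", "v2", 30.0),
    ("u1", "i1", "v4", 5.0),
    ("u2", "i1", "v3", 40.0),
]
samples = build_samples(logs)
print(sorted(samples))
```

Grouping by user and trigger item means each pair compares two videos shown under the same context, so the model learns relative preference rather than absolute watch time.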

Feature groups:

Trigger item features: category, price, popularity, dynamic score, exposure/click metrics across time slices, seller and category dimensions.

Video features: exposure, click, effective play‑rate, completion rate, per‑view and per‑user watch time across site and Wow Video scenes, segmented by category and author.

Item‑video similarity: same category, same seller, similarity scores.

The GBDT model with pairwise loss yields significant improvements over the original i2i2v recall in both Wow Video waterfall and full‑screen pages.
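To make "pairwise loss" concrete, the sketch below computes a RankNet-style logistic loss over all (positive, negative) pairs within one user-trigger group. The article does not state which pairwise objective was used, so the logistic form and the use of the positive sample's watch-time weight are assumptions; a GBDT would fit trees to the gradient of a loss of this kind.

```python
import math


def pairwise_logistic_loss(scores, labels, weights=None):
    """Weighted mean of log(1 + exp(-(s_pos - s_neg))) over all
    (positive, negative) pairs in one group."""
    n = len(scores)
    if weights is None:
        weights = [1.0] * n
    total, norm = 0.0, 0.0
    for i in range(n):
        for j in range(n):
            if labels[i] > labels[j]:
                w = weights[i]  # assumed: weight the pair by the positive sample
                total += w * math.log1p(math.exp(-(scores[i] - scores[j])))
                norm += w
    return total / norm if norm else 0.0


# The loss is small when the positive is scored above the negative,
# and large when the order is reversed.
good = pairwise_logistic_loss([2.0, 0.0], [1, 0])
bad = pairwise_logistic_loss([0.0, 2.0], [1, 0])
print(good < bad)
```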

3.2 RankV2V Recall

Although product‑based recall benefits from rich user behavior, a more natural approach is video‑to‑video recall. RankV2V builds on a collaborative‑filtering (CF) V2V model trained on full‑network playback logs with effective watch time as the target.
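One simple way to build a CF-style V2V index from playback logs is cosine similarity over watch-time-weighted user vectors, as sketched below. This is an illustrative stand-in, not the model the authors used: the function name `v2v_similarity`, the `min_effective_sec` cut-off for an "effective" watch, and the cosine formulation are all assumptions.

```python
import math
from collections import defaultdict
from itertools import combinations


def v2v_similarity(play_logs, min_effective_sec=5.0):
    """play_logs: iterable of (user, video, watch_sec).

    Returns {video: {other_video: cosine_similarity}} where each video is
    represented by the effective watch time each user spent on it.
    """
    user_videos = defaultdict(dict)
    for user, video, sec in play_logs:
        if sec >= min_effective_sec:  # ignore plays too short to be effective
            user_videos[user][video] = user_videos[user].get(video, 0.0) + sec

    dot = defaultdict(float)
    norm = defaultdict(float)
    for watched in user_videos.values():
        for video, w in watched.items():
            norm[video] += w * w
        # sorted() keeps the (a, b) key order consistent across users
        for (a, wa), (b, wb) in combinations(sorted(watched.items()), 2):
            dot[(a, b)] += wa * wb

    sims = defaultdict(dict)
    for (a, b), d in dot.items():
        s = d / math.sqrt(norm[a] * norm[b])
        sims[a][b] = s
        sims[b][a] = s
    return sims


logs = [
    ("u1", "a", 10.0), ("u1", "b", 10.0),
    ("u2", "a", 10.0), ("u2", "c", 10.0),
    ("u3", "b", 10.0), ("u3", "c", 1.0),  # the 1-second play is filtered out
]
sims = v2v_similarity(logs)
print(sorted(sims["a"], key=sims["a"].get, reverse=True))
```

The neighbours of each trigger video from an index like this form the candidate set that the RankV2V model then rescores.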

Samples are constructed similarly to RankI2V, using trigger video, recalled video, and watch duration as label, followed by cleaning and resampling.

Feature groups mirror those of RankI2V but focus on video‑specific attributes for both trigger and recalled videos, plus their similarity.

GBDT with pairwise loss is used for training. Online experiments show that RankV2V, despite recalling fewer candidates, outperforms RankI2V on both CTR and watch‑time metrics. The gain is largest in the "Guess You Like" scenario, where overall CTR increased by ~15% and the CTR of the RankV2V recall channel on its own rose by ~50%.

3.3 Real‑time Interest

Real‑time signals are crucial for video recommendation. Four real‑time triggers are employed on the full‑screen page:

Video clicked from the waterfall page (strong intent).

Product attached to the clicked video (likely of interest).

User actions (play, like, comment, follow) on the previous full‑screen video.

Real‑time logs of user video plays across the site (delay of seconds to minutes).

All four triggers are given equal priority and outrank other recall methods, leading to over 10% lift in per‑view watch time and up to 15% increase in per‑user watch time on the full‑screen page.
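The priority rule above (real-time triggers treated equally among themselves, all ranked ahead of other recall sources) could be implemented as a two-tier round-robin merge. The sketch below is an assumption about the mechanics: the article states the priority ordering but not how candidates are interleaved, and `merge_recall` is a hypothetical name.

```python
from itertools import zip_longest


def merge_recall(realtime_lists, other_lists, limit=10):
    """Merge recall results in two tiers.

    Tier 1: candidates from the real-time triggers, interleaved round-robin
            so that no single trigger dominates (equal priority).
    Tier 2: candidates from all other recall methods, also round-robin.
    Duplicates keep their first (highest-priority) position.
    """
    merged, seen = [], set()
    for tier in (realtime_lists, other_lists):
        for round_ in zip_longest(*tier):  # pads exhausted lists with None
            for video in round_:
                if video is not None and video not in seen:
                    seen.add(video)
                    merged.append(video)
    return merged[:limit]


realtime = [["v1", "v2"], ["v3"], ["v1", "v4"], []]  # one list per trigger
others = [["v9", "v2"], ["v8"]]                      # e.g. RankI2V, RankV2V
print(merge_recall(realtime, others))
```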

4. Video Ranking

4.1 Coarse Ranking

After multi‑recall, a coarse ranking model (GBDT) reduces the candidate set. Two objectives are used:

CTR‑oriented pointwise GBDT, with features such as context, user, video attributes, and feedback statistics.

Watch‑time‑oriented pairwise GBDT, adding more watch‑time related statistical features.

Experiments show CTR improvements of more than 10% for "Guess You Like" and an increase of up to 30% in playback duration on immersive pages.

4.2 Fine Ranking

Fine ranking combines multiple models:

XFTRL model targeting CTR, using context, user, product, and video ID features plus cross features.

CTR‑oriented GBDT with enriched scene‑specific statistics.

Watch‑time‑oriented GBDT with additional recent watch‑time statistics per author/category.

The combined score yields noticeable CTR and watch‑time gains across both waterfall and immersive pages.
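The article does not say how the three model scores are combined. One common approach, shown here purely as an assumption, is to normalize each model's scores within the candidate list (since a CTR probability and a watch-time GBDT score live on different scales) and take a weighted sum. The model names and weights below are hypothetical.

```python
def min_max(scores):
    """Rescale one model's scores to [0, 1] within the candidate list."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {vid: 0.5 for vid in scores}
    return {vid: (s - lo) / (hi - lo) for vid, s in scores.items()}


def fuse_scores(model_scores, weights):
    """model_scores: model_name -> {video_id: raw_score}.
    Returns video ids ordered by the weighted sum of normalized scores."""
    fused = {}
    for model, scores in model_scores.items():
        for vid, s in min_max(scores).items():
            fused[vid] = fused.get(vid, 0.0) + weights[model] * s
    return sorted(fused, key=lambda vid: (-fused[vid], vid))


model_scores = {
    "xftrl_ctr": {"v1": 0.10, "v2": 0.30, "v3": 0.20},
    "gbdt_ctr": {"v1": 1.0, "v2": 3.0, "v3": 2.0},
    "gbdt_watch": {"v1": 50.0, "v2": 10.0, "v3": 30.0},
}
weights = {"xftrl_ctr": 0.4, "gbdt_ctr": 0.3, "gbdt_watch": 0.3}  # hypothetical
print(fuse_scores(model_scores, weights))
```

Shifting weight toward the watch-time model trades clicks for viewing time, which is why the weights would normally be tuned per page type through online experiments.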

5. Business Strategies

5.1 Experience Optimization

Three dimensions are monitored: concentration, similarity, and discovery. Page‑internal rematch keeps the virtual categories, leaf categories, and tags within a page diverse, which improves click‑through at the cost of a slight drop in PV. Page‑level suppression filters out categories and tags that the user rarely clicks, which enhances discovery. Real‑time blacklists filter low‑quality videos, and purchase‑based filters remove videos tied to items the user has already bought, further diversifying the feed.
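A rough sketch of how such rules might be applied to a ranked list follows. It covers the blacklist filter, the purchase-based filter, and a simple per-category cap as one possible reading of in-page diversity. The function name, the metadata layout, and the `max_per_category` parameter are assumptions, not details from the article.

```python
def apply_business_rules(ranked, video_meta, blacklist, purchased_items,
                         page_size=4, max_per_category=2):
    """Walk the ranked list and build one page of results.

    ranked:          video ids, best first
    video_meta:      video_id -> {"category": ..., "item_id": ...}
    blacklist:       set of low-quality video ids to drop
    purchased_items: set of item ids the user has already bought
    """
    page, per_category = [], {}
    for vid in ranked:
        meta = video_meta[vid]
        if vid in blacklist:
            continue  # real-time blacklist
        if meta["item_id"] in purchased_items:
            continue  # purchase-based filter
        category = meta["category"]
        if per_category.get(category, 0) >= max_per_category:
            continue  # cap keeps a single category from filling the page
        per_category[category] = per_category.get(category, 0) + 1
        page.append(vid)
        if len(page) == page_size:
            break
    return page


video_meta = {
    "v1": {"category": "A", "item_id": "i1"},
    "v2": {"category": "A", "item_id": "i2"},
    "v3": {"category": "A", "item_id": "i3"},
    "v4": {"category": "B", "item_id": "i4"},
    "v5": {"category": "B", "item_id": "i5"},
    "v6": {"category": "C", "item_id": "i6"},
}
ranked = ["v1", "v2", "v3", "v4", "v5", "v6"]
print(apply_business_rules(ranked, video_meta, {"v2"}, {"i4"},
                           page_size=3, max_per_category=1))
```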

6. Conclusion

Video recommendation still has ample room for growth compared to product recommendation. Ongoing efforts include multi‑tag classification, multimodal video embeddings from image and audio, and modeling real‑time exploration interest. Interested readers are encouraged to engage and provide feedback.
