How Likee Scales Short‑Video Recommendations with Flink, Auto‑Stats, and Cache Tensor
This article details Likee's short‑video recommendation pipeline, covering the evolution of its feature‑engineering framework, the use of Flink for minute‑level statistical and second‑level session features, the integration of automatic statistical features into DNN models, multimodal feature extraction, and the cache‑tensor technique that dramatically improves online inference performance.
Background and Motivation
Likee, the short‑video product of BIGO, launched in July 2017 and quickly reached over 100 million monthly active users worldwide. The homepage waterfall feed is the primary consumption scenario, and its recommendation system has continuously evolved both in model architecture and feature engineering to improve relevance and business metrics.
Feature Engineering Foundations
Feature engineering transforms raw data into model‑readable inputs, removing noise and redundancy while enhancing predictive power. In large‑scale real‑time deep models, an efficient feature data framework is crucial for rapid strategy iteration and reduced engineering complexity.
Flink‑Based Statistical and Session Features
The recommendation pipeline consists of log collection, training data generation, model training, and online scoring. Initially, an XGBoost model using continuous statistical features was deployed, but its limitations in multi‑objective modeling and incremental learning prompted a shift to deep learning.
To support minute‑level statistical features and second‑level session features, Flink is employed with a variable‑length bucket window aggregation:
Long‑term behaviors are aggregated in larger time buckets, while recent actions use finer granularity.
This reduces Redis and Flink memory pressure while preserving counting accuracy.
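The idea of variable-length buckets can be illustrated with a small sketch. This is plain Python rather than a Flink job, and the bucket sizes, horizon, retention period, and the `BucketedCounter` class are all assumptions made for illustration; the article does not give Likee's actual settings.

```python
from collections import defaultdict

FINE_BUCKET = 60        # seconds; assumed granularity for recent events
COARSE_BUCKET = 3600    # seconds; assumed granularity for older events
FINE_HORIZON = 3600     # keep minute buckets for the last hour (assumption)
RETENTION = 7 * 86400   # drop buckets older than 7 days (assumption)


class BucketedCounter:
    """Counts events for one key with fine recent buckets and coarse old ones."""

    def __init__(self):
        self.fine = defaultdict(int)    # minute bucket start -> count
        self.coarse = defaultdict(int)  # hour bucket start -> count

    def add(self, ts, n=1):
        # New events always land in a minute-level bucket.
        self.fine[ts - ts % FINE_BUCKET] += n

    def compact(self, now):
        # Fold minute buckets older than FINE_HORIZON into hour buckets.
        for start in [s for s in self.fine if s < now - FINE_HORIZON]:
            self.coarse[start - start % COARSE_BUCKET] += self.fine.pop(start)
        # Expire hour buckets that fall outside the retention period.
        for start in [s for s in self.coarse if s < now - RETENTION]:
            del self.coarse[start]

    def count(self, now, window):
        # Sum every bucket that overlaps the last `window` seconds.
        cutoff = now - window
        total = sum(c for s, c in self.fine.items() if s + FINE_BUCKET > cutoff)
        total += sum(c for s, c in self.coarse.items() if s + COARSE_BUCKET > cutoff)
        return total


counter = BucketedCounter()
now = 36000
for ts in (now - 30, now - 90, now - 7210):
    counter.add(ts)
counter.compact(now)
```

After compaction the two recent events still occupy minute buckets, while the two-hour-old event has been merged into a single hour bucket. Counts over long windows become approximate at bucket boundaries, which is the trade-off for holding far fewer buckets in memory.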
Session features are built by maintaining per‑user behavior queues in Redis, with engineering optimizations such as:
Per‑behavior queue storage: High‑value actions (like, follow, share) get separate queues so that longer histories can be retained for them.
Hot‑cold data separation: Older queue entries are offloaded to Pika, extending the queue length to thousands of events.
User behavior feedback encoding: Feedback signals such as completion, watch time, and interaction type are encoded and fed to the model.
Real‑time video attribute retrieval: During inference, video IDs are used to fetch attribute features from a profile service.
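The queue layout described above might look roughly like the following sketch. To stay self-contained it uses in-memory structures as stand-ins for Redis (hot) and Pika (cold); the `SessionStore` class, the behavior names, and the queue limits are hypothetical.

```python
from collections import defaultdict, deque

# Assumed per-behavior hot-queue limits; high-value actions keep longer histories.
HOT_MAXLEN = {"view": 50, "like": 200, "follow": 200, "share": 200}
DEFAULT_MAXLEN = 50


class SessionStore:
    """Per-user, per-behavior event queues with hot/cold separation."""

    def __init__(self):
        self.hot = defaultdict(deque)   # stand-in for Redis lists
        self.cold = defaultdict(list)   # stand-in for Pika

    def push(self, user_id, behavior, video_id, ts):
        key = (user_id, behavior)
        queue = self.hot[key]
        queue.append((ts, video_id))
        # Spill the oldest events to cold storage once the hot queue is full.
        while len(queue) > HOT_MAXLEN.get(behavior, DEFAULT_MAXLEN):
            self.cold[key].append(queue.popleft())

    def recent(self, user_id, behavior, n, include_cold=False):
        """Return up to the n most recent events, oldest first."""
        key = (user_id, behavior)
        events = list(self.hot.get(key, ()))
        if include_cold:
            events = self.cold.get(key, []) + events
        return events[-n:]


store = SessionStore()
for i in range(60):
    store.push("u1", "view", f"v{i}", ts=i)
```

Keeping one queue per behavior type means that a burst of views cannot push rare but informative actions such as shares out of the history.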
Automatic Statistical Features
Automatic statistical features embed counting statistics directly into the model parameters during training, eliminating the need for separate offline feature pipelines. For each hash key observed in a sample:
stats_feature <- stats_feature * decay_rate + actiontpl
where decay_rate is a time‑decay factor and actiontpl encodes user feedback (exposure, click, watch‑through, etc.). The updated stats_feature is pushed back to the parameter server.
These stats are concatenated across feature domains, transformed into rate features (e.g., click‑through rate, completion rate), and combined with embedding features. Stats features provide rapid convergence for sparse hash keys, while embeddings capture richer patterns for frequent keys.
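A minimal sketch of the update rule and the rate transformation is shown below. The `StatsTable` class is a hypothetical stand-in for the parameter server; the three-slot action layout, the decay value, and the additive smoothing in the denominator are assumptions, and decay is applied once per update exactly as the formula is written.

```python
from collections import defaultdict

ACTIONS = ("exposure", "click", "complete")  # assumed layout of the action tuple
DECAY_RATE = 0.9                             # assumed decay factor


class StatsTable:
    """Stand-in for per-hash-key statistics held on a parameter server."""

    def __init__(self, decay_rate=DECAY_RATE):
        self.decay_rate = decay_rate
        self.stats = defaultdict(lambda: [0.0] * len(ACTIONS))

    def update(self, hash_key, action_tuple):
        # stats_feature <- stats_feature * decay_rate + actiontpl
        old = self.stats[hash_key]
        self.stats[hash_key] = [
            s * self.decay_rate + a for s, a in zip(old, action_tuple)
        ]

    def rate_features(self, hash_key, smoothing=1.0):
        # Convert decayed counts into click-through and completion rates.
        exposure, click, complete = self.stats.get(hash_key, [0.0] * len(ACTIONS))
        denom = exposure + smoothing  # smoothing avoids division by zero
        return [click / denom, complete / denom]


table = StatsTable(decay_rate=0.5)
table.update("video:v1", (1, 1, 0))  # exposed and clicked
table.update("video:v1", (1, 0, 0))  # exposed only
```

With a decay of 0.5, the two updates leave decayed counts of 1.5 exposures and 0.5 clicks, so the older click already weighs less than the newer exposure.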
Multimodal Feature Extraction
To further boost model performance, content features are extracted from the raw video and audio signals with a pre‑trained I3D video model and a speech model. The final fully‑connected layer of each model outputs a vector that is stored as an intrinsic attribute of the video. Because these vectors are still too high‑dimensional to feed into the ranking model directly, they are clustered with k‑means; the cluster IDs are then embedded in the DNN, giving compact representations that are fine‑tuned through the ranking loss.
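The clustering step can be sketched as follows. This is a pure-Python Lloyd's k-means on toy two-dimensional vectors, not the production setup: the number of clusters, the embedding size, and the helper names are assumptions, and the embedding table is only initialized here, whereas in the described system it would be trained with the ranking loss.

```python
import random


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(centroids, vector):
    """Index of the centroid closest to `vector`."""
    return min(range(len(centroids)),
               key=lambda i: squared_distance(centroids[i], vector))


def kmeans(vectors, k, iters=20, seed=0):
    """Plain Lloyd's k-means; returns the list of centroids."""
    rng = random.Random(seed)
    centroids = [list(v) for v in rng.sample(vectors, k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for v in vectors:
            groups[nearest(centroids, v)].append(v)
        for i, members in enumerate(groups):
            if members:  # an empty cluster keeps its previous centroid
                centroids[i] = [sum(col) / len(members) for col in zip(*members)]
    return centroids


def init_embedding_table(k, dim, seed=0):
    """One small vector per cluster ID; training is out of scope here."""
    rng = random.Random(seed)
    return [[rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(k)]


# Toy stand-ins for the multimodal vectors of six videos.
video_vectors = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
                 [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]]
centroids = kmeans(video_vectors, k=2)
cluster_ids = [nearest(centroids, v) for v in video_vectors]
embedding_table = init_embedding_table(k=2, dim=4)
video_embeddings = [embedding_table[c] for c in cluster_ids]
```

Replacing each vector with a cluster ID turns a dense, fixed input into a sparse categorical feature, so the model can learn its own representation for each cluster.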
Cache Tensor Architecture
To reduce I/O and storage pressure from massive profile features, a cache‑tensor approach moves stable user and video profile features into the model parameters, leaving only fast‑changing session features to be fetched online. The data flow is split into:
Offline predict network: Generates and caches tensors (user/item embeddings) during training.
Online predict network: Uses cached tensors together with session features for inference, eliminating the need to read large profile tables.
This redesign yields a 6× increase in online throughput, a 40% reduction in latency, and a 2% AUC lift for the coarse‑ranking model.
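The offline/online split can be illustrated with a toy example. Everything here is an assumption made for illustration: the single linear layer standing in for the offline network, the dot-product scoring, and the function names. The article does not describe the actual network structure.

```python
def profile_tower(profile, weights):
    """Toy offline network: one linear layer followed by ReLU."""
    return [max(0.0, sum(w * x for w, x in zip(row, profile))) for row in weights]


def build_cache(profiles, weights):
    """Run the offline network once per entity and cache the output tensor."""
    return {entity_id: profile_tower(p, weights) for entity_id, p in profiles.items()}


def online_score(user_id, video_id, session_features,
                 user_cache, video_cache, session_weights):
    """Toy online network: cached tensors plus fresh session features."""
    u = user_cache[user_id]      # no profile table read at serving time
    v = video_cache[video_id]
    match = sum(a * b for a, b in zip(u, v))
    session = sum(w * x for w, x in zip(session_weights, session_features))
    return match + session


weights = [[1.0, 0.0], [0.0, 1.0]]  # hypothetical trained parameters
user_cache = build_cache({"u1": [1.0, 2.0]}, weights)
video_cache = build_cache({"v1": [3.0, -1.0]}, weights)
score = online_score("u1", "v1", [0.5], user_cache, video_cache, [2.0])
```

At serving time only the session features have to be fetched; the profile-dependent part of the computation has already been reduced to a cached tensor lookup.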
Conclusion
Likee's recommendation system improvements include:
Flink‑driven second‑level session features with queue‑based storage and hot‑cold separation.
Automatic statistical features that embed counting updates into DNN training, simplifying the feature pipeline.
Cache‑tensor techniques that embed stable profile features, drastically cutting storage I/O and inference cost.
These engineering advances collectively enhance model stability, reduce system complexity, and deliver significant business metric gains.
