Feature Engineering Practices for Short‑Video Recommendation Systems
Effective short‑video recommendation relies on meticulous feature engineering: raw signals (numerical counts, categorical IDs, content and user embeddings, context and session data) are transformed through bucketization, scaling, crossing, and smoothing, then selected and evaluated via filtering, wrapper, and embedded methods plus importance analysis, with the goal of mitigating business biases and improving multi‑objective ranking performance.
In recommendation systems, feature engineering plays a crucial role. Data and features determine the upper bound of machine‑learning performance, while model and algorithm choices only approach that limit. Effective feature engineering requires abundant data, knowledge extraction, and transformation of raw signals into model‑ready representations.
This article shares general methods and practical experiences of feature engineering on Weishi, a short‑video platform. The platform’s unique business characteristics—such as hot categories that naturally achieve higher playtime and interaction rates, and the disadvantage of long videos in completion metrics—introduce biases that must be carefully handled when constructing features for multi‑objective ranking models.
1. Feature Extraction
Numerical features (e.g., play counts, likes, shares) are usually transformed before being fed into models. Common techniques include:
Bucketization (equal‑width, equal‑frequency, or model‑driven clustering) to discretize continuous values, yielding a sparser representation that is more robust to outliers.
Truncation or precision reduction, sometimes after logarithmic scaling, to treat long‑tail values as categorical.
Missing‑value handling: imputation (mean, median, mode), encoding missing as a separate category, or using models that natively support missing values (e.g., XGBoost).
Feature crossing (inner product, Cartesian product, Hadamard product) to capture non‑linear interactions, with attention to dimensionality explosion.
Normalization and scaling (min‑max, z‑score, non‑linear transforms such as log) to make features comparable and accelerate gradient‑based training.
Data smoothing (Bayesian smoothing, Wilson interval smoothing) to mitigate bias from sparse or noisy statistics, especially for ratio‑type features like click‑through rate.
Bias elimination by adjusting statistics across time windows or video length groups.
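A few of the numerical transforms above can be sketched with the standard library alone. This is an illustrative sketch, not Weishi's actual pipeline; the function names and the choice of equal‑frequency bucketization are assumptions:

```python
import math

def equal_frequency_buckets(values, n_buckets):
    """Boundaries such that each bucket holds roughly the same number of values."""
    s = sorted(values)
    return [s[int(len(s) * i / n_buckets)] for i in range(1, n_buckets)]

def bucketize(x, boundaries):
    """Map a raw value to its bucket index (a categorical ID)."""
    return sum(x >= b for b in boundaries)

def log_scale(x):
    """Compress long-tail counts (e.g. play counts) before truncation."""
    return math.log1p(x)

def wilson_lower_bound(clicks, impressions, z=1.96):
    """Wilson-interval smoothing for ratio features such as CTR: shrinks the
    observed rate of sparsely viewed items toward a conservative lower bound."""
    if impressions == 0:
        return 0.0
    p = clicks / impressions
    denom = 1 + z * z / impressions
    centre = p + z * z / (2 * impressions)
    margin = z * math.sqrt(p * (1 - p) / impressions + z * z / (4 * impressions ** 2))
    return (centre - margin) / denom
```

For example, a video with 1 click in 2 impressions and one with 100 clicks in 200 impressions share a raw CTR of 0.5, but the Wilson lower bound scores the sparsely observed video far lower, which is exactly the smoothing effect described above.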
Category features (e.g., video ID, author, tags, resolution) are processed via:
One‑hot encoding for low‑cardinality attributes.
Feature hashing to compress high‑cardinality categorical spaces.
Ranking‑based encoding (e.g., top‑N interests) to preserve order information.
Anomaly handling through discretization or embedding.
Crossing categorical with categorical or numerical features to create richer signals.
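The categorical encodings above can be illustrated with a short stdlib sketch (vocabulary, bucket count, and the `&`‑joined cross format are hypothetical choices):

```python
import hashlib

def one_hot(value, vocabulary):
    """One-hot encode a low-cardinality attribute; unseen values map to all zeros."""
    return [1 if value == v else 0 for v in vocabulary]

def hash_bucket(value, n_buckets):
    """Feature hashing: compress a high-cardinality ID space (e.g. video or
    author IDs) into a fixed number of buckets, accepting some collisions."""
    digest = hashlib.md5(value.encode("utf-8")).hexdigest()
    return int(digest, 16) % n_buckets

def cross(a, b):
    """Cartesian cross of two categorical values into one combined feature ID,
    typically hashed afterwards to control dimensionality explosion."""
    return f"{a}&{b}"
```

Crossed IDs such as `cross("sports", "ios")` are usually fed back through `hash_bucket`, since the Cartesian product of two vocabularies grows multiplicatively.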
Embedding features include:
Video embeddings derived from content (title, cover, audio) using NLP/vision models, or from user‑video interaction sequences via skip‑gram models.
User embeddings built from recent click sequences, weighted averages, or deep models (e.g., softmax‑based DNN, heterogeneous graph neural networks).
Author embeddings obtained by averaging recent video embeddings of the author.
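The pooling steps mentioned for user and author embeddings (weighted averages of clicked‑video vectors, plain averages of an author's recent videos) reduce to simple vector arithmetic. A minimal sketch, assuming the item embeddings themselves already exist (e.g. from a skip‑gram model):

```python
def mean_embedding(vectors):
    """Plain average of equal-length embeddings, e.g. an author's recent videos."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]

def weighted_user_embedding(click_sequence, weights):
    """User embedding as a weighted average of clicked-video embeddings,
    where weights might encode recency or play completion."""
    total = sum(weights)
    dim = len(click_sequence[0])
    return [
        sum(w * v[i] for v, w in zip(click_sequence, weights)) / total
        for i in range(dim)
    ]
```

Deep alternatives listed above (softmax‑based DNNs, heterogeneous graph neural networks) replace this pooling with learned aggregation, at higher serving cost.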
Context features capture client‑side information such as request time, device model, OS, network type, and channel.
Session features are constructed from recent user behavior windows (fixed count, fixed time span, or continuous session) and may include raw item ID sequences or aggregated statistics.
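The two window types and the aggregated‑statistics option can be sketched as follows (the event schema with `ts`, `category`, and `play_ratio` fields is an assumption for illustration):

```python
from collections import Counter

def last_n_window(events, n):
    """Fixed-count session window: the user's n most recent events."""
    return events[-n:]

def time_span_window(events, now_ts, span_seconds):
    """Fixed-time-span window: events with timestamps in [now - span, now]."""
    return [e for e in events if now_ts - e["ts"] <= span_seconds]

def session_stats(events):
    """Aggregate a window into simple statistics usable as model features,
    alongside (or instead of) the raw item-ID sequence."""
    if not events:
        return {"n_events": 0, "top_category": None, "mean_play_ratio": 0.0}
    cats = Counter(e["category"] for e in events)
    return {
        "n_events": len(events),
        "top_category": cats.most_common(1)[0][0],
        "mean_play_ratio": sum(e["play_ratio"] for e in events) / len(events),
    }
```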
2. Feature Selection
Three families of methods are described:
Filtering: coverage, variance analysis, Pearson correlation, hypothesis testing, mutual information.
Wrapper methods: evaluate candidate feature subsets by fully training the model on each (exhaustive, greedy, or random search).
Embedded methods: selection performed as part of training, e.g., L1 regularization or tree-model importance (frequency of splits).
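Two of the filtering statistics, coverage and Pearson correlation, are cheap enough to compute offline for every candidate feature. A stdlib sketch (any drop thresholds applied to these scores would be a tuning choice, not given in the source):

```python
import math

def coverage(values):
    """Share of non-missing values; low-coverage features are dropped early."""
    return sum(v is not None for v in values) / len(values)

def pearson(x, y):
    """Pearson correlation between a feature column and the label."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)
```

Filtering is fast but label‑agnostic in some variants (coverage, variance); wrapper methods are the most faithful to final model performance but cost one full training run per candidate subset.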
3. Feature Importance Analysis
Single‑feature AUC/G‑AUC ranking.
Feature value zeroing, random replacement, or permutation to measure impact on model performance.
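The permutation approach can be sketched model‑agnostically: shuffle one feature column, re‑score, and treat the score drop as that feature's importance. The `score_fn` interface here is an illustrative assumption; in practice it would wrap an AUC/GAUC evaluation of the ranking model:

```python
import random

def permutation_importance(score_fn, X, y, feature_idx, seed=0):
    """Drop in evaluation score after shuffling one feature column.

    A larger drop means the model relies more heavily on that feature;
    zeroing or random replacement are drop-in alternatives to shuffling.
    """
    base = score_fn(X, y)
    rng = random.Random(seed)
    col = [row[feature_idx] for row in X]
    rng.shuffle(col)
    X_perm = [row[:feature_idx] + [v] + row[feature_idx + 1:]
              for row, v in zip(X, col)]
    return base - score_fn(X_perm, y)
```

Unlike single‑feature AUC ranking, permutation importance measures each feature's contribution in the context of all the others, so it can surface features that are only useful in combination.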
Conclusion
Feature engineering remains tightly coupled with both the business logic and the chosen model. Understanding data distributions, bias sources, and model characteristics is essential for designing effective features. While complex models can reduce some engineering effort, they do not eliminate the need for thoughtful feature construction.
Tencent Cloud Developer
Official Tencent Cloud community account that brings together developers, shares practical tech insights, and fosters an influential tech exchange community.