
Weekly Champion Insights from the Tencent Social Ads Algorithm Competition – The ThreeIdiots Team

The ThreeIdiots team shares their experience winning the weekly champion in Tencent's social ads algorithm contest, detailing their feature engineering strategy, time‑based data splitting, handling of large‑scale data, and model choices such as LightGBM and FM, while emphasizing the importance of thoughtful feature extraction over extensive parameter tuning.

Tencent Advertising Technology

We are the ThreeIdiots team, consisting of Charles and Wepon from Peking University and wsss from USTC. We clarify that we are not the Kaggle 3Idiots team; our name was chosen out of respect and humor, and we apologize for any confusion.

Our success this week stems from a feature‑centric approach: we believe features set the upper bound of model performance. It is crucial to understand the user's context and what drives conversions (primary demand, ad presentation, and constraints such as network speed), and to extract features grounded in real business logic.

For data splitting, we adopted a time‑based method rather than random splitting to avoid leakage, keeping offline validation consistent with the online evaluation period, especially given the dataset's strong temporal characteristics.
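A minimal sketch of this idea in pandas (the `day` and `label` column names are illustrative, not the competition schema): train on the earlier days, validate on the last day, so the validation set sits strictly after the training set in time.

```python
import pandas as pd

# Toy click log with a synthetic time column ("day" is an assumed name;
# the competition data has its own clock field).
df = pd.DataFrame({
    "day":   [17, 17, 18, 18, 19, 19, 20, 20],
    "label": [0, 1, 0, 0, 1, 0, 0, 1],
})

# Time-based split: train on earlier days, validate on the last day,
# mirroring how the online test period follows the training period.
train = df[df["day"] < 20]
valid = df[df["day"] == 20]

assert train["day"].max() < valid["day"].min()  # no temporal leakage
```

A random split would let future rows leak statistics (e.g. conversion rates) into the past, inflating offline scores that never materialize online.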

Feature engineering consumed most of our effort. We recommend focusing on robust feature extraction rather than excessive parameter tuning, particularly for tree‑based models where parameter impact is limited. Besides common ID conversion‑rate features, we experimented with word embeddings from the user_installapp table and generated dense features using a Wide & Deep model.
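As one concrete example of an ID conversion‑rate feature, here is a sketch with Bayesian‑style smoothing toward the global rate, so that rarely seen IDs are not assigned extreme values (column names and the smoothing strength `alpha` are our illustrative choices, not the competition's):

```python
import pandas as pd

# Toy training log: creative IDs with binary conversion labels.
df = pd.DataFrame({
    "creative_id": [1, 1, 1, 2, 2, 3],
    "label":       [1, 0, 1, 0, 0, 1],
})

# Global prior used to smooth rates for rarely seen IDs.
prior = df["label"].mean()
alpha = 5.0  # smoothing strength (an arbitrary illustrative value)

stats = df.groupby("creative_id")["label"].agg(["sum", "count"])
# Smoothed conversion rate: pulls small-sample IDs toward the prior.
stats["cvr"] = (stats["sum"] + alpha * prior) / (stats["count"] + alpha)
```

The same pattern applies to any categorical ID (user, position, app); computing it only from past data, per the time‑based split above, avoids label leakage.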

Handling the large data scale presented memory and speed challenges. We suggest down‑sampling, applying feature selection (Filter, Wrapper, Embedded methods), and using models like ensemble trees or Lasso that inherently perform selection. Incremental feature extraction—saving intermediate features to disk and concatenating later—helps manage resources.
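The incremental pattern can be sketched as follows: compute one feature block at a time, persist each to disk, and column‑concatenate at the end, so peak memory stays at roughly one block's worth (the `extract_block` helper is a hypothetical stand‑in for a real feature extractor):

```python
import os
import tempfile

import numpy as np

def extract_block(i: int, n_rows: int = 4) -> np.ndarray:
    """Hypothetical stand-in for an expensive feature extractor."""
    return np.full((n_rows, 2), i, dtype=np.float32)

tmpdir = tempfile.mkdtemp()
paths = []
for i in range(3):
    # Persist each feature block as soon as it is computed,
    # freeing memory before the next block is extracted.
    path = os.path.join(tmpdir, f"feat_{i}.npy")
    np.save(path, extract_block(i))
    paths.append(path)

# Column-wise concatenation of the saved blocks into the final matrix.
X = np.hstack([np.load(p) for p in paths])
```

For genuinely sparse features (e.g. one‑hot IDs), the same idea works with sparse matrices saved per block and stacked with `scipy.sparse.hstack`.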

In the preliminary round we used XGBoost, which outperformed LightGBM, but for the final round we switched to LightGBM and FM due to faster training on the larger dataset. We have not yet performed model fusion but plan to explore it later; however, we advise prioritizing strong features before attempting complex ensembles.
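Part of why FM scales to this setting is that its pairwise interaction term can be computed in O(kn) rather than O(n²). A minimal scoring sketch for one dense example (Rendle's reformulation; not our competition code):

```python
import numpy as np

def fm_predict(x: np.ndarray, w0: float, w: np.ndarray, V: np.ndarray) -> float:
    """Factorization Machine score for one example.

    x: (n,) feature vector; w0: bias; w: (n,) linear weights;
    V: (n, k) latent factor matrix for pairwise interactions.
    """
    linear = w0 + w @ x
    s = V.T @ x                 # (k,) per-factor weighted sums
    s2 = (V ** 2).T @ (x ** 2)  # (k,) per-factor sums of squares
    # 0.5 * sum_f [(sum_i v_if x_i)^2 - sum_i v_if^2 x_i^2]
    # equals the sum over all pairs i<j of <v_i, v_j> x_i x_j.
    interaction = 0.5 * np.sum(s ** 2 - s2)
    return float(linear + interaction)
```

This linear‑time trick is what makes FM practical on the final round's larger data, where explicit pairwise features would be prohibitive.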

In summary, observe the data carefully, invest time in thoughtful feature design, minimize unnecessary coding and parameter tweaking, and iterate quickly. We wish everyone success in the competition.

Machine Learning, feature engineering, model selection, Tencent, algorithm competition, data splitting
Written by

Tencent Advertising Technology

Official hub of Tencent Advertising Technology, sharing the team's latest cutting-edge achievements and advertising technology applications.
