Insights on Data Preprocessing, Modeling, and Mindset from a Tencent Advertising Algorithm Competition Participant

A participant from Harbin Institute of Technology shares practical data‑preprocessing tricks, model choices, useful feature ideas, and a resilient mindset gained while competing in the Tencent Advertising Algorithm Contest, offering tips that can help other data scientists handle large‑scale ad data.

Tencent Advertising Technology

Hello everyone, I am pipi from Harbin Institute of Technology. I’m honored to have received this week’s "Best Progress" award in the Tencent Advertising Algorithm Competition, and I would like to share my experience: data preprocessing methods, my model choices, strong features I found useful, and my competition mindset.

1. Data Preprocessing

The preliminary dataset is as large as last year’s final-round set, and it quickly exhausted the 16 GB of RAM on my lab machine. I upgraded to 32 GB and adopted a streaming approach: load the matrix row by row (or column by column when possible), concatenate the processed pieces, and write them back incrementally. For multi-value features such as kw1, I built a dictionary of unique values and processed the feature row by row. I also recommend saving intermediate results in a compressed format such as .npz. With this method, preprocessing stays under 20 GB of RAM and finishes the final-stage data in about two hours.
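The dictionary-plus-streaming idea can be sketched roughly as follows. This is a minimal illustration and not the code used in the competition: the function names `encode_multivalue` and `save_encoded`, the space separator, and the CSR-style output layout are all assumptions made for the example.

```python
import numpy as np


def encode_multivalue(rows, sep=" "):
    """Encode an iterable of multi-value strings as CSR-style index arrays.

    Rows are consumed one at a time, so the raw text never has to be
    held in memory all at once. (Hypothetical helper for illustration.)
    """
    vocab = {}    # unique value -> integer id, grown as new values appear
    indices = []  # flat list of ids for all rows
    indptr = [0]  # row i occupies indices[indptr[i]:indptr[i + 1]]
    for row in rows:
        for token in row.strip().split(sep):
            if token:  # skip empty tokens from blank rows
                indices.append(vocab.setdefault(token, len(vocab)))
        indptr.append(len(indices))
    return (
        np.asarray(indptr, dtype=np.int64),
        np.asarray(indices, dtype=np.int32),
        vocab,
    )


def save_encoded(path, indptr, indices):
    """Store the encoded arrays in a compressed .npz archive."""
    np.savez_compressed(path, indptr=indptr, indices=indices)
```

In practice `rows` could be a generator that yields the kw1 field of each line of the raw file, so only the integer ids accumulate in memory. The saved arrays can be read back with `np.load`.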

2. Model Algorithm

In the preliminary round I relied on LightGBM trained on statistical and conversion-rate features, which was barely enough to qualify for the finals. For the final stage I switched to a neural network, because memory constraints make tree-based models impractical on my machine.
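For readers unfamiliar with conversion-rate features, one common way to build them is to count, for each value of a categorical feature, how often it appears and how often it appears with a positive label. The sketch below assumes binary labels. The additive smoothing and its `prior` and `strength` constants are illustrative assumptions and are not taken from this post.

```python
from collections import defaultdict


def conversion_rates(values, labels, prior=0.05, strength=10.0):
    """Return {value: smoothed conversion rate} for one categorical feature.

    `prior` and `strength` are hypothetical smoothing constants that pull
    rarely seen values toward the prior rate.
    """
    shown = defaultdict(int)      # how often each value appears
    converted = defaultdict(int)  # how often it appears with label 1
    for value, label in zip(values, labels):
        shown[value] += 1
        converted[value] += int(label == 1)
    return {
        value: (converted[value] + prior * strength) / (shown[value] + strength)
        for value in shown
    }
```

To avoid leaking labels into the feature, the rates would normally be computed on data that does not include the row being encoded, for example on other folds or earlier days.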

3. Strong Features

Besides the strong features already shared by the community, I found two combined features that work well for me: (creativesize + gender) and (advertised + LBS). Feel free to skip them if you already have better ones.
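A combined feature of this kind treats each pair of values as a single new category. A minimal sketch, assuming both columns are already available as equal-length sequences; the helper name `cross_features` is hypothetical:

```python
def cross_features(col_a, col_b):
    """Map each (a, b) pair to a dense integer id in order of first appearance."""
    pair_ids = {}  # (a, b) -> id of the combined category
    crossed = []
    for pair in zip(col_a, col_b):
        crossed.append(pair_ids.setdefault(pair, len(pair_ids)))
    return crossed, pair_ids
```

For example, passing the creative-size column and the gender column returns one id per row, which can then be fed to a model like any other categorical feature.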

4. Mindset

After joining the competition in late April, I spent about ten days getting into the rhythm. Progress can be slow (sometimes my score did not improve for five or six days), but it’s important to stay healthy and stay in the competition to the end. Whether you finish with a top rank or a modest score, completing the challenge is a success.

5. Acknowledgements

Thanks to fellow contestants Bryan, Leona, WL, Zha Da, Guo Da, Jiarenyf, Ge Wenqiang, and YouChouNoBB for their open‑source contributions and shared code, some of which I still use.

Upcoming: The finals will be held on July 29‑30, 2018. Stay tuned for more competition updates!
