Insights on Data Preprocessing, Modeling, and Mindset from a Tencent Advertising Algorithm Competition Participant
A participant from Harbin Institute of Technology shares practical data‑preprocessing tricks, model choices, useful feature ideas, and a resilient mindset gained while competing in the Tencent Advertising Algorithm Contest, offering tips that can help other data scientists handle large‑scale ad data.
Hello everyone, I am pipi from Harbin Institute of Technology. I’m honored to receive the "Best Progress" award this week in the Tencent Advertising Algorithm Competition and would like to share my experience, including data preprocessing methods, useful strong features, and my competition mindset.
1. Data Preprocessing
The preliminary‑round dataset is as large as last year's final‑round dataset, which quickly exhausted the 16 GB of RAM on my lab machine. I upgraded to 32 GB and switched to streaming processing. I read the data row by row (or column by column when possible), then concatenate the results and write them back to disk incrementally instead of holding everything in memory. For multi‑value features such as kw1, I built a dictionary of unique values and encoded each row against it. I also recommend saving intermediate results in compressed formats such as .npz. With this approach, my preprocessing stays under 20 GB of RAM and handles the final‑stage data in about two hours.
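The row‑wise dictionary encoding for a multi‑value field can be sketched in Python as follows. This is a minimal sketch, not the author's code. It assumes each kw1 row arrives as a space‑separated string of raw ids, with an empty string for users who have no value, and the function name `encode_multivalue_stream` is hypothetical. Only one row is held in memory at a time. The vocabulary and the compact CSR‑style index arrays grow incrementally.

```python
import io
from array import array
from typing import Dict, Iterable, Tuple


def encode_multivalue_stream(rows: Iterable[str]) -> Tuple[Dict[str, int], array, array]:
    """Encode a multi-value field row by row into CSR-style index arrays.

    Assumption (hypothetical format): each row is a space-separated list of
    raw ids; an empty row means the user has no value for this field.
    """
    vocab: Dict[str, int] = {}  # raw id -> column index, assigned on first sight
    indptr = array("q", [0])    # row i owns indices[indptr[i]:indptr[i + 1]]
    indices = array("i")        # compact C ints instead of Python int objects
    for row in rows:
        # dict.fromkeys drops duplicate ids within a row but keeps their order
        for token in dict.fromkeys(row.split()):
            indices.append(vocab.setdefault(token, len(vocab)))
        indptr.append(len(indices))
    return vocab, indptr, indices


# Simulate streaming a large file: the generator yields one line at a time.
sample = io.StringIO("11 42 7\n\n42 99\n7 7\n")
vocab, indptr, indices = encode_multivalue_stream(line.rstrip("\n") for line in sample)
print(vocab)          # {'11': 0, '42': 1, '7': 2, '99': 3}
print(list(indptr))   # [0, 3, 3, 5, 6]
print(list(indices))  # [0, 1, 2, 1, 3, 2]
```

With SciPy installed, these arrays plus an all‑ones `data` array can be passed to `scipy.sparse.csr_matrix((data, indices, indptr))`. The resulting matrix can then be written as a compressed .npz file with `scipy.sparse.save_npz`.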
2. Model Algorithm
In the preliminary round I used LightGBM with statistical and conversion‑rate features. That was just enough to qualify for the finals. For the final stage I switched to a neural network, because tree‑based models were impractical on the larger data under my machine's memory limits.
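The article does not say how the conversion‑rate features were computed. One common approach is an out‑of‑fold smoothed conversion rate. This is offered as a hedged sketch, not the author's recipe. For each fold, conversions and impressions per feature value are counted on the other folds only. The rate is then shrunk toward the prior with a pseudo‑count `alpha`, so rare values do not get extreme rates and a row's own label never leaks into its feature. The function names, the fold assignment by row index modulo `k`, and the values of `k` and `alpha` are illustrative assumptions.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def count_stats(values: Sequence[str], labels: Sequence[int]) -> Dict[str, List[int]]:
    """Map each feature value to [conversions, impressions]."""
    stats: Dict[str, List[int]] = defaultdict(lambda: [0, 0])
    for v, y in zip(values, labels):
        stats[v][0] += y
        stats[v][1] += 1
    return stats


def out_of_fold_rate(values: Sequence[str], labels: Sequence[int],
                     k: int = 5, alpha: float = 10.0) -> List[float]:
    """Smoothed conversion rate per row, computed without the row's own fold."""
    n = len(values)
    rates = [0.0] * n
    for fold in range(k):
        train = [i for i in range(n) if i % k != fold]  # rows used for counting
        train_labels = [labels[i] for i in train]
        prior = sum(train_labels) / len(train_labels)   # overall rate of the other folds
        stats = count_stats([values[i] for i in train], train_labels)
        for i in range(fold, n, k):                      # rows of the held-out fold
            conv, imp = stats.get(values[i], (0, 0))     # unseen values fall back to the prior
            rates[i] = (conv + alpha * prior) / (imp + alpha)
    return rates


values = ["a", "a", "b", "b", "a", "c"]
labels = [1, 0, 0, 0, 1, 1]
print([round(r, 3) for r in out_of_fold_rate(values, labels, k=2, alpha=1.0)])
# [0.167, 0.889, 0.167, 0.333, 0.167, 0.667]
```

For the test set, the counts would typically come from the full training data rather than from folds. The per‑value impression counts can also serve as simple count features.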
3. Strong Features
Besides the strong features shared by the community, I found two combined features that work well for me: (creativeSize + gender) and (advertiserId + LBS). Feel free to skip them if you already have better ones.
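A combined feature like these is a cross of two categorical fields: each pair of values becomes one new category. Here is a minimal Python sketch. The column values and the helper name `cross_encode` are hypothetical, and the article does not state how the author built the crosses. Using a tuple as the key avoids collisions that plain string concatenation can cause. For example, "1" + "23" and "12" + "3" both give "123".

```python
from typing import Dict, List, Sequence, Tuple


def cross_encode(a: Sequence[str], b: Sequence[str]) -> List[int]:
    """Give each distinct (a, b) value pair its own integer id, in order of first appearance."""
    vocab: Dict[Tuple[str, str], int] = {}
    # setdefault reads len(vocab) before inserting, so new pairs get 0, 1, 2, ...
    return [vocab.setdefault((x, y), len(vocab)) for x, y in zip(a, b)]


# Hypothetical columns for a (creativeSize + gender) cross.
creative_size = ["20", "20", "35", "20"]
gender = ["1", "2", "1", "1"]
print(cross_encode(creative_size, gender))  # [0, 1, 2, 0]
```

The resulting ids can feed an embedding layer in a neural network. They can also be treated as a new categorical column, for example as input to conversion‑rate statistics.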
4. Mindset
I joined the competition in late April and needed about ten days to find my rhythm. Progress can be slow. Sometimes I went five or six days without improving my score. Even so, it is important to stay healthy and see the competition through to the end. Whether you finish with a top rank or a modest score, completing the challenge is a success.
5. Acknowledgements
Thanks to fellow contestants Bryan, Leona, WL, Zha Da, Guo Da, Jiarenyf, Ge Wenqiang, and YouChouNoBB for their open‑source contributions and shared code, some of which I still use.
Upcoming: The finals will be held on July 29‑30, 2018. Stay tuned for more competition updates!
Tencent Advertising Technology
Official hub of Tencent Advertising Technology, sharing the team's latest cutting-edge achievements and advertising technology applications.