Tencent Advertising Algorithm Competition: Embedding Team's Transformer-Based Solution and Tips
The article details the Tencent Advertising Algorithm Competition semifinal, presenting the Embedding team's pure Transformer model with joint ID word embeddings, data preprocessing, training hyperparameters, model architecture, and practical tips for improving age/gender prediction from click sequences.
The competition's semifinal task was user profiling: predicting age and gender from sequences of ad clicks.
Our team used a pure Transformer architecture and trained joint-ID word embeddings. On the A leaderboard, a Skip‑Gram word vector model with six inputs and 20‑fold cross‑validation scored 1.474; adding the joint‑ID embeddings raised the score to 1.476, and on the B leaderboard the score was 1.478.
We summarised the following tips:
- Use a small learning rate for the Transformer and decrease it as epochs increase.
- Concatenate intermediate layer outputs and feed them to the final fully‑connected layer.
- Generate word vectors for the multiple attribute IDs linked to creative_id via joint learning to boost performance.
- Significant gains come from ensembling multiple folds of the Transformer.
Data preparation steps:
- In ad.csv, replace \N in product_id and industry with 0, then add 1 to all values.
- Join ad.csv and click_log.csv on creative_id, merge preliminary and semifinal datasets, and sort by user_id and time.
- Save HDF files: train_click_log, test_click_log, train_user.
- Generate train/test JSONL datasets per user: {'user_id': user_id, 'labels': [age, gender], 'time': [], 'creative_id': [], 'click_times': [], 'ad_id': [], 'advertiser_id': [], 'industry': []}.
- Create Skip‑Gram word‑vector training data from the sequences: creative_id, ad_id, product_id, product_category, advertiser_id, industry.
- Create joint‑ID word‑vector training data by removing user_id, labels, time, and click_times from the JSONL.
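The preparation steps above can be sketched with pandas. This is a minimal, hypothetical illustration: the toy frames stand in for ad.csv and click_log.csv, the preliminary/semifinal merge and HDF saving are elided, and column names follow the competition schema.

```python
import pandas as pd

# Toy stand-ins for ad.csv and click_log.csv (schema per the competition data).
ad = pd.DataFrame({
    "creative_id": [1, 2],
    "ad_id": [10, 20],
    "product_id": ["\\N", "5"],
    "industry": ["3", "\\N"],
})
click_log = pd.DataFrame({
    "user_id": [100, 100, 200],
    "creative_id": [2, 1, 1],
    "time": [7, 3, 5],
    "click_times": [1, 1, 2],
})

# Replace \N with 0, then shift every value by +1 so 0 is free for padding.
for col in ("product_id", "industry"):
    ad[col] = ad[col].replace("\\N", 0).astype(int) + 1

# Join on creative_id and sort each user's clicks chronologically.
df = click_log.merge(ad, on="creative_id", how="left")
df = df.sort_values(["user_id", "time"]).reset_index(drop=True)

# One JSONL-style record per user, with parallel ID sequences.
records = [
    {"user_id": uid,
     "time": g["time"].tolist(),
     "creative_id": g["creative_id"].tolist(),
     "click_times": g["click_times"].tolist(),
     "ad_id": g["ad_id"].tolist(),
     "industry": g["industry"].tolist()}
    for uid, g in df.groupby("user_id")
]
```

In the real pipeline each record would also carry the labels field ([age, gender]) for training users before being written out as JSONL.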
Word‑vector training:
- Skip‑Gram vectors: gensim.models.Word2Vec(sentences, size=128, window=20, min_count=1, workers=24, iter=10, sg=1, negative=20, sample=1e-3).
- Joint‑ID vectors: trained following the KDD 2018 paper (https://arxiv.org/pdf/1712.08289.pdf). Generate 128‑dim vectors for creative_id in two stages: first stage lr = 0.004 for 5 epochs, second stage lr = 0.001 for 3 epochs.
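A sketch of how the per-user records become Word2Vec "sentences": one corpus per ID field for the plain Skip‑Gram vectors, and one mixed corpus for the joint‑ID vectors. The field-prefix tagging scheme for joint tokens is a hypothetical choice, not stated in the article; the gensim call is shown as a comment with the article's hyperparameters (gensim < 4.0 argument names).

```python
# Toy records in the JSONL layout described above (labels/time/click_times
# already stripped for word-vector training).
records = [
    {"user_id": 100, "creative_id": [1, 2, 2], "ad_id": [10, 20, 20]},
    {"user_id": 200, "creative_id": [1], "ad_id": [10]},
]

ID_FIELDS = ["creative_id", "ad_id"]  # the article uses six such fields

# One Skip-Gram corpus per field; tokens are stringified for gensim.
corpora = {
    field: [[str(tok) for tok in rec[field]] for rec in records]
    for field in ID_FIELDS
}

# Joint-ID corpus: all ID sequences of a user in one sentence, tagged by
# field so identical integers from different ID spaces stay distinct
# (tagging scheme is an assumption for illustration).
joint_corpus = [
    [f"{field}_{tok}" for field in ID_FIELDS for tok in rec[field]]
    for rec in records
]

# Training with the article's hyperparameters (gensim < 4.0 names):
# from gensim.models import Word2Vec
# model = Word2Vec(corpora["creative_id"], size=128, window=20, min_count=1,
#                  workers=24, iter=10, sg=1, negative=20, sample=1e-3)
```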
Transformer model:
- Features: six ID‑based features: creative_id, ad_id, product_id, advertiser_id, industry, product_category. creative_id and ad_id are converted to embedding features in advance: creative_id yields joint_in_creative_emb and creative_id_emb, and ad_id yields ad_id_emb. The remaining four IDs use PyTorch embedding layers initialized from the pretrained vectors and fine‑tuned during training.
- Model structure: each ID sequence is embedded and passed through its own sparse SinkhornTransformer encoder, producing seven tensors of shape BatchSize × MaxLength × EmbeddingSize. These are concatenated along the embedding dimension and processed by two further SinkhornTransformer encoders. Max‑pooling over the sequence is applied to the embedding layer and to each of the three SinkhornTransformer layers, yielding a BatchSize × (EmbeddingSize × 28) matrix, which feeds a 2‑class (gender) and a 10‑class (age) fully‑connected head.
- Hyperparameters: embedding size 128, max sequence length 128, batch size 128, 3 epochs, with learning rates of 1e‑4, 6e‑5, and 1e‑5 in epochs 1, 2, and 3. Checkpoints are saved every 10,000, 8,000, and 4,000 steps in the three epochs respectively; for 20‑fold cross‑validation, the 48,000‑step checkpoint can be used as the final model.
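The architecture above can be sketched in PyTorch. This is an illustrative skeleton, not the team's code: nn.TransformerEncoder stands in for the sparse SinkhornTransformer (a third-party package), vocabulary sizes and head counts are placeholders, and pretrained-embedding loading is omitted.

```python
import torch
import torch.nn as nn

class FusionTransformer(nn.Module):
    """Sketch of the described model: 7 embedded ID streams, per-stream
    encoders, two fused encoders, max-pooling at four depths (7E x 4 = 28E),
    and a 2-class (gender) plus 10-class (age) head."""
    def __init__(self, vocab_sizes, emb_dim=128, n_heads=4):
        super().__init__()
        self.embeds = nn.ModuleList(
            nn.Embedding(v, emb_dim, padding_idx=0) for v in vocab_sizes
        )
        n = len(vocab_sizes)  # 7 streams in the article

        def enc(d):
            layer = nn.TransformerEncoderLayer(d, n_heads, batch_first=True)
            return nn.TransformerEncoder(layer, num_layers=1)

        # One encoder per stream, then two encoders on the fused sequence.
        self.stream_encs = nn.ModuleList(enc(emb_dim) for _ in vocab_sizes)
        self.fused_enc1 = enc(emb_dim * n)
        self.fused_enc2 = enc(emb_dim * n)
        pooled = emb_dim * n * 4  # EmbeddingSize x 28 when n == 7
        self.gender_head = nn.Linear(pooled, 2)
        self.age_head = nn.Linear(pooled, 10)

    def forward(self, ids):  # ids: list of (B, L) LongTensors, one per stream
        embs = [e(x) for e, x in zip(self.embeds, ids)]        # B x L x E each
        enc1 = [m(h) for m, h in zip(self.stream_encs, embs)]  # B x L x E each
        fused = torch.cat(enc1, dim=-1)                        # B x L x 7E
        enc2 = self.fused_enc1(fused)
        enc3 = self.fused_enc2(enc2)
        # Max-pool over the sequence dimension at four depths, then concat.
        feats = torch.cat(
            [t.max(dim=1).values
             for t in (torch.cat(embs, dim=-1), fused, enc2, enc3)], dim=-1)
        return self.gender_head(feats), self.age_head(feats)
```

The per-epoch learning rates (1e-4, 6e-5, 1e-5) could be applied by resetting the optimizer's lr at each epoch boundary, e.g. with torch.optim.lr_scheduler.MultiStepLR.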
Multi‑fold fusion:
Predict with each of the 20 fold models, apply softmax to the class outputs, sum the 20 probability vectors, take the index of the maximum summed probability, and add one to obtain the final submission label.
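The fusion step is a straightforward softmax average; a minimal NumPy sketch, applied separately to the age (10-class) and gender (2-class) logits of the 20 models:

```python
import numpy as np

def fuse_predictions(fold_logits):
    """Softmax each fold's class logits, sum the probability vectors across
    folds, and take argmax + 1 (submission labels are 1-indexed)."""
    probs = []
    for logits in fold_logits:  # each array: (n_samples, n_classes)
        z = logits - logits.max(axis=1, keepdims=True)  # numerical stability
        e = np.exp(z)
        probs.append(e / e.sum(axis=1, keepdims=True))
    total = np.sum(probs, axis=0)
    return total.argmax(axis=1) + 1
```

Summing probabilities rather than raw logits keeps folds with differently scaled outputs from dominating the vote.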
Thanks to the Embedding team for sharing their insights; in intense competitions, such tricks are often the key to victory, and we hope all finalists can find their own strengths and perform at their best.
Tencent Advertising Technology
Official hub of Tencent Advertising Technology, sharing the team's latest cutting-edge achievements and advertising technology applications.