Turning Ad Click Sequences into Age & Gender Predictions with Transformers

This article shares a competition winner's step‑by‑step solution for predicting user age and gender from ad click sequences, treating IDs as words, using word2vec embeddings, a custom transformer‑LSTM model, dual‑task loss, and weight‑search post‑processing.

Tencent Advertising Technology

Problem Overview

The Tencent Advertising Algorithm Competition asks participants to predict a user's age and gender based on the sequence of ads they click. The author, a high‑scoring contestant, treats each ad ID as a token, turning the task into a text‑classification problem under a privacy‑preserving setting.

Solution Idea

All of a user's clicked IDs are concatenated into a "sentence," enabling the use of natural‑language techniques. Word embeddings (e.g., word2vec skip‑gram) are trained on these ID sequences, with careful tuning of the window size for such a large corpus.

Model Architecture

The final architecture consists of five input features followed by a single‑layer transformer, then an LSTM, and finally a dual‑task output head for age and gender. BERT was considered but discarded because the custom ID vocabulary exceeds 3 million tokens, making pre‑training prohibitively expensive and yielding poorer results in experiments.

Implementation Details

Key implementation points include:

Freezing the embedding layer due to the massive vocabulary.

Feeding the click‑time sequence as an attention mask to the transformer.

Using HuggingFace's transformer modules directly.
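The first two points above can be sketched in PyTorch as follows (toy sizes and names such as `pretrained_vectors` and `click_times` are illustrative, not the author's code):

```python
import torch
import torch.nn as nn

vocab_size, emb_dim = 1000, 128  # toy sizes; the real ID vocabulary exceeds 3M
pretrained_vectors = torch.randn(vocab_size, emb_dim)  # stand-in for word2vec weights

# Freeze the embedding layer: with millions of IDs, updating it is costly.
embedding = nn.Embedding.from_pretrained(pretrained_vectors, freeze=True)

# Build an attention mask from the click-time sequence: positions with a
# recorded click time (> 0) are attended to; padded positions are masked out.
click_times = torch.tensor([[3, 7, 12, 0, 0]])  # 0 = padding
attention_mask = (click_times > 0).long()       # shape (batch, seq_len)
```

`from_pretrained(..., freeze=True)` sets `requires_grad=False` on the embedding weights, so the optimizer leaves them untouched.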

Relevant code snippets:

from transformers.modeling_bert import BertConfig, BertEncoder, BertAttention, BertSelfAttention, BertLayer, BertPooler
# Note: in transformers >= 4.x, these classes live under transformers.models.bert.modeling_bert

After the transformer, a single LSTM layer is added, followed by max‑pooling (average pooling was tested but did not improve scores). The model splits into two branches for the two prediction tasks.
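The overall transformer → LSTM → max-pool → dual-head structure can be sketched like this (a minimal PyTorch version with illustrative layer sizes, using a standard `nn.TransformerEncoderLayer` in place of the author's HuggingFace BERT modules):

```python
import torch
import torch.nn as nn

class AgeGenderModel(nn.Module):
    def __init__(self, emb_dim=128, hidden=128, n_age=10, n_gender=2):
        super().__init__()
        # Single transformer encoder layer over the embedded click sequence.
        self.encoder = nn.TransformerEncoderLayer(
            d_model=emb_dim, nhead=8, batch_first=True
        )
        self.lstm = nn.LSTM(emb_dim, hidden, batch_first=True)
        # Two branches, one per prediction task.
        self.age_head = nn.Linear(hidden, n_age)
        self.gender_head = nn.Linear(hidden, n_gender)

    def forward(self, x):                # x: (batch, seq_len, emb_dim)
        h = self.encoder(x)
        h, _ = self.lstm(h)
        pooled = h.max(dim=1).values     # max-pooling over the sequence
        return self.age_head(pooled), self.gender_head(pooled)

model = AgeGenderModel()
age_logits, gender_logits = model(torch.randn(4, 20, 128))
```

The two heads share all sequence-encoding layers and diverge only at the final linear projections.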

Loss Function

A custom loss combines the cross‑entropy losses of both tasks equally:

import torch.nn as nn

def custom_loss(data1, targets1, data2, targets2):
    # Equal-weight sum of the two tasks' cross-entropy losses
    loss1 = nn.CrossEntropyLoss()(data1, targets1)
    loss2 = nn.CrossEntropyLoss()(data2, targets2)
    return loss1 * 0.5 + loss2 * 0.5

The weighting can be adjusted to favor one task over the other.

Post‑Processing

Since metrics like accuracy and F1 depend on decision thresholds, class‑specific weights are applied to the softmax outputs before taking argmax. A simple weight‑search algorithm iteratively adjusts class weights to maximize validation accuracy:

import numpy as np
from sklearn.metrics import accuracy_score

class_num = 10

def search_weight(valid_y, raw_prob, init_weight=None, step=0.001):
    # Greedy coordinate search: adjust one class weight at a time and keep
    # any change that improves validation accuracy; repeat until no gain.
    weight = list(init_weight) if init_weight is not None else [1.0] * class_num
    f_best = accuracy_score(y_true=valid_y, y_pred=raw_prob.argmax(axis=1))
    flag_score = 0
    round_num = 1
    while flag_score != f_best:
        print("round:", round_num)
        round_num += 1
        flag_score = f_best
        for c in range(class_num):
            for n_w in range(0, 2000, 10):
                num = n_w * step  # candidate weight in [0, 2) with 0.01 steps
                new_weight = weight.copy()
                new_weight[c] = num
                prob_df = raw_prob.copy() * np.array(new_weight)
                f = accuracy_score(y_true=valid_y, y_pred=prob_df.argmax(axis=1))
                if f > f_best:
                    weight = new_weight.copy()
                    f_best = f
                    print(f)
    return weight

This search can be extended to explore combinations of transformer, LSTM, and CNN feature extractors.
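One way such a combination might look, concatenating the pooled outputs of the three extractors (a hypothetical sketch with illustrative sizes, not the author's code):

```python
import torch
import torch.nn as nn

class CombinedExtractor(nn.Module):
    def __init__(self, emb_dim=128, hidden=128):
        super().__init__()
        self.encoder = nn.TransformerEncoderLayer(
            d_model=emb_dim, nhead=8, batch_first=True
        )
        self.lstm = nn.LSTM(emb_dim, hidden, batch_first=True)
        self.conv = nn.Conv1d(emb_dim, hidden, kernel_size=3, padding=1)

    def forward(self, x):                      # x: (batch, seq_len, emb_dim)
        t = self.encoder(x).max(dim=1).values  # transformer features (batch, emb_dim)
        l, _ = self.lstm(x)
        l = l.max(dim=1).values                # LSTM features (batch, hidden)
        c = self.conv(x.transpose(1, 2)).max(dim=2).values  # CNN features (batch, hidden)
        return torch.cat([t, l, c], dim=1)     # concatenated feature vector

feats = CombinedExtractor()(torch.randn(4, 20, 128))  # (4, 128 + 128 + 128)
```

The concatenated vector would then feed the same dual-task heads as before.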

Conclusion

The presented pipeline—ID‑as‑word tokenization, word2vec embeddings, a lightweight transformer‑LSTM model, dual‑task loss, and weight‑search post‑processing—achieved a top‑5 score (1.4516) in the competition. The author encourages further experimentation with CNNs and other feature‑combination strategies.

Tags: Advertising, Transformer, NLP, Competition, text classification, age prediction, gender prediction
Written by

Tencent Advertising Technology

Official hub of Tencent Advertising Technology, sharing the team's latest cutting-edge achievements and advertising technology applications.
