Tencent Advertising Algorithm Competition 2020: Problem Overview, Data, Model Implementation, and Results
This article details the 2020 Tencent Advertising Algorithm Competition, describing the user profiling task, data fields, feature engineering, Python code for ID mapping and Word2Vec training, multiple model architectures (LSTM, CNN-Inception, transformer), and the final performance results achieved by the team.
The 2020 Tencent Advertising Algorithm Competition final was held on August 3 at Tencent Binhai Building in Shenzhen, featuring a live broadcast and a sharing session by the internal second‑place team.
The competition task required predicting users' age and gender based on their ad click behavior and associated ad information, essentially a user‑profiling problem.
Provided data fields included time (day granularity), user_id, creative_id, click_times, ad_id, product_id, product_category, advertiser_id, industry, as well as the target labels age (1‑10) and gender (1 or 2).
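For concreteness, here is a hypothetical toy instance of that schema (all values are illustrative, and the variable names `click_log`, `ad_info`, `labels` are assumptions, not from the competition kit); ad attributes join onto the click log by creative_id:

```python
import pandas as pd

# Illustrative rows only; the real competition data had tens of millions of clicks.
click_log = pd.DataFrame({
    'time':        [1, 2, 2],          # day granularity
    'user_id':     [101, 101, 102],
    'creative_id': [7, 8, 7],
    'click_times': [1, 2, 1],
})
ad_info = pd.DataFrame({
    'creative_id':      [7, 8],
    'ad_id':            [70, 80],
    'product_id':       [5, 5],
    'product_category': [2, 2],
    'advertiser_id':    [300, 301],
    'industry':         [6, 6],
})
labels = pd.DataFrame({'user_id': [101, 102], 'age': [3, 7], 'gender': [1, 2]})

# Attach ad attributes to each click via creative_id
log = click_log.merge(ad_info, on='creative_id', how='left')
```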
For model input the team selected five ID sequences—creative_id, ad_id, advertiser_id, product_id, and industry—and applied a per-column ID mapping to unify rare and unseen IDs, as shown in the following code:

```python
# IDs that appear only once in train, or that are not shared between train and
# test, are collapsed to the out-of-vocabulary index 0; shared IDs are re-indexed from 1.
val_cnt = train[col].value_counts()
differ = set(train[col].unique()).symmetric_difference(set(test[col].unique()))
common = set(train[col].unique()) & set(test[col].unique())

id_map = {}
for v in val_cnt[val_cnt == 1].index:
    id_map[v] = 0
for v in differ:
    id_map[v] = 0
for i, v in enumerate(common):
    id_map[v] = i + 1

# Apply the mapping; anything not in id_map falls back to the OOV index 0
train[col] = train[col].map(id_map).fillna(0).astype(int)
test[col] = test[col].map(id_map).fillna(0).astype(int)
```

Word2Vec embeddings were trained using the skip-gram model with parameters min_count=1, size=256, window=10, and the resulting vectors were loaded for each selected ID column:
```python
from gensim import models
from tqdm import tqdm
import numpy as np

# list_d is the corpus of per-user ID sequences, with each ID as a string
model = models.Word2Vec(list_d, sg=1, min_count=1, size=256, window=10, workers=48, iter=10)

# Build the embedding matrix row-by-row in ID order; any ID missing from the
# vocabulary (e.g. the OOV index '0') gets a zero vector of the same size.
We = []
for i in tqdm(range(len(model.wv.index2word))):
    if str(i) in model.wv:
        We.append(model.wv[str(i)].reshape((1, -1)))
    else:
        We.append(np.zeros((1, 256)))
We = np.vstack(We)
```

Input construction involved various ordering strategies (forward, reverse, random shuffle) and duplication of records according to click_times, with sequence length capped at 95% of the maximum length.
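The input construction described above can be sketched as follows; this is a minimal illustration under assumed names (`clicks`, `col`) rather than the team's exact code, showing forward time ordering, row duplication by click_times, and truncation to 95% of the maximum sequence length:

```python
import numpy as np
import pandas as pd

# Toy click log; the real data was the merged click/ad table
clicks = pd.DataFrame({
    'user_id':     [1, 1, 1, 2, 2],
    'time':        [3, 1, 2, 1, 2],
    'creative_id': [10, 11, 12, 11, 13],
    'click_times': [1, 2, 1, 1, 3],
})

col = 'creative_id'
clicks = clicks.sort_values(['user_id', 'time'])                  # forward time order
clicks = clicks.loc[clicks.index.repeat(clicks['click_times'])]   # weight rows by click_times
seqs = clicks.groupby('user_id')[col].apply(list)                 # one ID sequence per user

max_len = int(seqs.map(len).max() * 0.95)                         # cap at 95% of the max length
seqs = seqs.map(lambda s: s[:max_len])
```

Reversed or shuffled variants would simply change the `sort_values` step (or shuffle each list) before grouping.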
The team experimented with several model architectures: LSTM, CNN‑Inception, and a transformer‑LSTM hybrid. The final LSTM implementation (Keras version) and the CNN‑Inception model (PyTorch) are excerpted below.
## LSTM (Keras version)
```python
def LSTM(config, n_cls=10):
    cols = ['creative_id', 'ad_id', 'advertiser_id', 'product_id', 'industry']
    n_in = len(cols)
    inputs, outputs, max_len = [], [], []
    for i in range(n_in):
        # Load the pretrained Word2Vec weights for this ID column and append a
        # zero row for the padding index; the embedding layer stays frozen.
        We = np.load(f'./w2v_256_10/{cols[i]}_embedding_weight.npy')
        We = np.vstack([We, np.zeros(config.embeddingSize)])
        inp = Input(shape=(config.sequenceLength,), dtype="int32")
        x = Embedding(We.shape[0], We.shape[1], weights=[We], trainable=False)(inp)
        inputs.append(inp)
        outputs.append(x)
        del We
        gc.collect()
    embedding_model = Model(inputs, outputs)
    # ... (rest of the Keras LSTM definition) ...
    return model, lstm_model
```

## CNN‑Inception (PyTorch version)

```python
class Inception(nn.Module):
    def __init__(self, cin, co, relu=True, norm=True):
        super(Inception, self).__init__()
        assert co % 4 == 0
        cos = [co // 4] * 4                 # each branch gets co//4 output channels
        self.activa = nn.Sequential()
        if norm: self.activa.add_module('norm', nn.BatchNorm1d(co))
        if relu: self.activa.add_module('relu', nn.ReLU(True))
        # Branch 1 is a 1x1 convolution
        self.branch1 = nn.Sequential(OrderedDict([
            ('conv1', nn.Conv1d(cin, cos[0], 1, stride=1)),
        ]))
        # ... (other branches) ...

    def forward(self, x):
        branch1 = self.branch1(x)
        # ... (concatenate branches along the channel dimension) ...
        return self.activa(torch.cat((branch1, branch2, branch3, branch4), 1))
```

Performance results showed the best single model achieving approximately 1.475 on the semi‑final A leaderboard, with an ensemble reaching about 1.48, placing second on the internal leaderboard and fourteenth on the external leaderboard.
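As a minimal sketch of how such an ensemble can be formed (probability-level averaging, an assumption here rather than the team's documented method; `ensemble_predict` and the matrices below are illustrative):

```python
import numpy as np

def ensemble_predict(prob_list):
    """Average (n_samples, n_classes) probability matrices from several
    trained models, then pick the highest-probability class per sample."""
    avg = np.mean(np.stack(prob_list, axis=0), axis=0)
    return avg.argmax(axis=1)

# Two hypothetical models' softmax outputs for a 2-class task (e.g. gender)
p1 = np.array([[0.60, 0.40], [0.20, 0.80]])
p2 = np.array([[0.45, 0.55], [0.30, 0.70]])
preds = ensemble_predict([p1, p2])   # -> [0, 1]
```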
Tencent Advertising Technology
Official hub of Tencent Advertising Technology, sharing the team's latest cutting-edge achievements and advertising technology applications.