Building an Advertising Recommendation Model with Python and PyTorch
This article walks through building a simple advertising recommendation system in Python: collecting ad-placement data, preprocessing categorical fields with label encoding, embedding features with PyTorch, constructing an MLP model, and launching training, while reflecting on the challenges Python developers face in the big-data era.
Being a Python developer can feel contradictory: you work in big-data environments where, as the idiom goes, everyone is "swimming naked" (fully exposed, with little data privacy), while at the same time you are the one asked to analyze and block intrusive ads.
The example starts by gathering a large set of ad‑placement data, including app information, ad slot IDs, media IDs, material details, titles, descriptions, and other vector features.
To handle categorical fields such as pkgname, ver, slotid, mediaid, and material, label encoding is applied jointly to the training and test sets:
```python
from sklearn.preprocessing import LabelEncoder

# Fit each encoder on the union of train and test values so that
# categories appearing only in the test set are still mapped.
for col in ["pkgname", "ver", "slotid", "mediaid", "material"]:
    lbl = LabelEncoder()
    lbl.fit(train_df[col].tolist() + test_df[col].tolist())
    train_df[col] = lbl.transform(train_df[col])
    test_df[col] = lbl.transform(test_df[col])
```
After encoding, categorical features are turned into dense vectors with PyTorch's nn.Embedding layer, with each field's embedding dimension set to the base-2 logarithm of its cardinality.
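A minimal sketch of this sizing rule, using hypothetical field cardinalities (in the article these would come from the fitted LabelEncoders, e.g. len(lbl.classes_)):

```python
import numpy as np
import torch

# Hypothetical cardinalities for the encoded fields (assumptions for illustration).
category_dict = {"pkgname": 5000, "ver": 300, "slotid": 1200,
                 "mediaid": 800, "material": 20000}

# One embedding table per field, with dimension log2(cardinality),
# matching the sizing used in the article's model.
embeddings = {
    key: torch.nn.Embedding(n + 1, int(np.log2(n)))
    for key, n in category_dict.items()
}

ids = torch.tensor([3, 17, 42])    # a mini-batch of encoded pkgname ids
vecs = embeddings["pkgname"](ids)  # shape: (3, int(log2(5000))) == (3, 12)
```

The log-cardinality heuristic keeps embedding tables small for low-cardinality fields while still giving high-cardinality fields like material enough capacity.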
The core model is a multilayer perceptron (MLP) defined with PyTorch. It creates embedding dictionaries for each categorical field, concatenates them with other feature vectors, passes the result through configurable fully‑connected layers, applies ReLU and dropout, and finally outputs a logit:
```python
import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F

class MLP(nn.Module):
    def __init__(self, category_dict, layers=[45 + 240, 32], dropout=0.0):
        super().__init__()
        self.category_dict = category_dict
        self.dropout = dropout
        # Use ModuleDict (not a plain dict) so the embedding tables are
        # registered as parameters and move with the model across devices.
        self.embedding_dict = nn.ModuleDict({
            key: nn.Embedding(category_dict[key] + 1,
                              int(np.log2(category_dict[key])))
            for key in category_dict
        })
        self.fc_layers = nn.ModuleList()
        for in_size, out_size in zip(layers[:-1], layers[1:]):
            self.fc_layers.append(nn.Linear(in_size, out_size))
        self.output_layer = nn.Linear(layers[-1], 1)

    def forward(self, feed_dict, embed_dict):
        # Look up each categorical field and concatenate the embeddings.
        embedding_feats = {key: self.embedding_dict[key](feed_dict[key])
                           for key in self.category_dict}
        x = torch.cat(list(embedding_feats.values()), dim=1)
        # Append the pre-computed text-embedding vectors.
        x = torch.cat([x, embed_dict], dim=1)
        for fc in self.fc_layers:
            x = F.relu(fc(x))
            x = F.dropout(x, p=self.dropout, training=self.training)
        return self.output_layer(x)  # raw logit; apply a sigmoid outside
```
Training is then launched ("Training starts~"), demonstrating a typical workflow for an ad recommendation pipeline, while acknowledging that more sophisticated techniques may be needed for production‑grade systems.
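The article does not show the training loop itself; a minimal sketch, assuming binary click labels, synthetic batches, and hypothetical field cardinalities and sizes, might look like:

```python
import numpy as np
import torch
import torch.nn as nn

# Assumed field cardinalities and a 240-wide text-embedding vector,
# chosen for illustration only.
category_dict = {"pkgname": 5000, "slotid": 1200}
emb_dims = {k: int(np.log2(n)) for k, n in category_dict.items()}

class TinyMLP(nn.Module):
    def __init__(self):
        super().__init__()
        self.emb = nn.ModuleDict({k: nn.Embedding(n + 1, emb_dims[k])
                                  for k, n in category_dict.items()})
        in_size = sum(emb_dims.values()) + 240  # 240 = text-embedding width
        self.net = nn.Sequential(nn.Linear(in_size, 32), nn.ReLU(),
                                 nn.Linear(32, 1))

    def forward(self, feed, text_vec):
        x = torch.cat([self.emb[k](feed[k]) for k in self.emb] + [text_vec],
                      dim=1)
        return self.net(x)

model = TinyMLP()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
criterion = nn.BCEWithLogitsLoss()  # numerically stable sigmoid + BCE

print("Training starts~")
for epoch in range(3):
    # Synthetic mini-batch standing in for the real ad-placement data.
    feed = {k: torch.randint(0, n, (64,)) for k, n in category_dict.items()}
    text_vec = torch.randn(64, 240)
    labels = torch.randint(0, 2, (64, 1)).float()  # synthetic click labels
    optimizer.zero_grad()
    loss = criterion(model(feed, text_vec), labels)
    loss.backward()
    optimizer.step()
```

BCEWithLogitsLoss pairs naturally with the model's raw logit output; a production system would add batching over real data, validation, and early stopping.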
Python Programming Learning Circle
A global community of Chinese Python developers offering technical articles, columns, original video tutorials, and problem sets. Topics include web full‑stack development, web scraping, data analysis, natural language processing, image processing, machine learning, automated testing, DevOps automation, and big data.