Winning Kaggle’s Avito Demand Prediction with Multi‑Source Neural Nets and Transfer Learning
This article breaks down the Avito demand‑prediction Kaggle competition, detailing the data mix of text, images, structured fields and time series, the layered feature‑engineering tactics, multi‑source heterogeneous neural network designs, and a transfer‑learning trick that propelled the top solutions.
Competition Overview
The Avito demand‑prediction challenge on Kaggle asks participants to estimate the probability that a listed second‑hand item will sell, a problem akin to pricing assistance for sellers on platforms like Craigslist. The competition uniquely combines textual descriptions, images, structured attributes, and time‑series data.
Data Description
According to the author, this was the first Kaggle contest to feature all four data modalities simultaneously. In practice, structured features (especially the strong image_top1 category label) provide the largest lift, followed by text and finally raw images.
Feature‑Engineering Strategies
Effective feature engineering remains the most impactful step. Common tactics include computing group‑wise statistics (min, max, variance) per category, using linear models (LR) or decision trees to surface feature importance, and iteratively refining features based on model feedback.
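The per-category statistics mentioned above can be sketched with pandas. This is a minimal illustration on toy data; the column names `category` and `price` and the derived feature names are assumptions for the example, not the competition's actual schema.

```python
import pandas as pd

# Toy listings; the column names are illustrative assumptions.
data = pd.DataFrame({
    "category": ["phones", "phones", "phones", "bikes", "bikes"],
    "price": [100.0, 200.0, 300.0, 50.0, 150.0],
})

# Per-category min, max and variance of price, broadcast back to every row
# so each listing carries the statistics of its own group.
for stat in ("min", "max", "var"):
    data[f"price_{stat}_by_category"] = (
        data.groupby("category")["price"].transform(stat)
    )
```

Using `transform` rather than `agg` keeps the result aligned with the original rows, so the new columns can be passed to a model without a separate merge step.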
The author categorises skill levels into five tiers:
Bronze – copying kernels.
Silver – rapid feature generation.
Gold – iterative addition and subtraction of features guided by model feedback.
Diamond – understanding how the gap between the listed price and the category's average price influences sales.
Champion – applying a transfer‑learning “black‑tech” to model the optimal price deviation.
Example of a simple price‑difference feature:
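# 'XXX' is a placeholder for the grouping column(s); transform keeps the result aligned row by row
data['price'] - data.groupby(['XXX'])['price'].transform('mean')
Multi‑Source Heterogeneous Neural Network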
The core idea is to concatenate embeddings from different modalities—images, text, IDs, and engineered features—into a unified network. Although deep learning lacks strong interpretability, it can serve as an automated feature extractor when combined with classic engineered features.
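To make the concatenation idea concrete, here is a forward-pass-only sketch in NumPy. All shapes, variable names, and the single hidden layer are assumptions chosen for illustration. The weights are random and untrained, and a real solution would build and train this in a deep-learning framework.

```python
import numpy as np

rng = np.random.default_rng(0)
batch = 4

# Hypothetical per-modality inputs (all shapes are illustrative assumptions).
image_vec = rng.normal(size=(batch, 8))    # e.g. pooled CNN features
text_vec = rng.normal(size=(batch, 6))     # e.g. output of a text encoder
category_ids = np.array([0, 2, 1, 2])      # integer-encoded ID feature
dense_feats = rng.normal(size=(batch, 3))  # hand-engineered numeric features

# Embedding table for the ID feature: one row per category value.
category_table = rng.normal(size=(3, 4))
category_vec = category_table[category_ids]

# Concatenate every source into a single vector per listing.
merged = np.concatenate(
    [image_vec, text_vec, category_vec, dense_feats], axis=1
)

# One hidden layer and a sigmoid head, since the target is a probability.
w1 = rng.normal(scale=0.1, size=(merged.shape[1], 16))
w2 = rng.normal(scale=0.1, size=(16, 1))
hidden = np.maximum(merged @ w1, 0.0)         # ReLU activation
pred = 1.0 / (1.0 + np.exp(-(hidden @ w2)))   # values in (0, 1)
```

The engineered features enter the network next to the learned representations, which matches the article's point that the network acts as an automated feature extractor alongside classic features.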
Typical image handling approaches:
Fine‑tune a ResNet on the raw images (high computational cost, limited gain).
Use a pre‑trained model to extract intermediate features or the final pooling layer.
Treat pre‑trained classification outputs as categorical embeddings.
In this competition, the second approach yielded the best results. Additional image meta‑features such as dimensions, brightness, blur, and key‑point coordinates can also be fed into the model.
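A few of the image meta-features listed above can be computed directly from pixel arrays. The sketch below assumes a grayscale image given as a 2-D NumPy array and uses the variance of a simple Laplacian as a blur proxy. That is one common choice, not necessarily the measure used in the competition, and the function name is hypothetical.

```python
import numpy as np

def image_meta_features(img):
    """Compute simple meta-features for a 2-D grayscale array in [0, 255]."""
    img = np.asarray(img, dtype=float)
    height, width = img.shape
    # 4-neighbour Laplacian on interior pixels; a low variance of this
    # response suggests few sharp edges, i.e. a blurry image.
    lap = (img[:-2, 1:-1] + img[2:, 1:-1] + img[1:-1, :-2] + img[1:-1, 2:]
           - 4.0 * img[1:-1, 1:-1])
    return {
        "height": height,
        "width": width,
        "brightness": float(img.mean()),
        "sharpness": float(lap.var()),
    }

flat = np.full((8, 8), 128.0)                          # uniform gray: no edges
checker = np.indices((8, 8)).sum(axis=0) % 2 * 255.0   # checkerboard: many edges

flat_feats = image_meta_features(flat)
checker_feats = image_meta_features(checker)
```

The uniform image gets a sharpness of zero while the checkerboard scores high, which is the contrast this kind of feature is meant to capture.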
For text, common pipelines include:
Parallel text classification models: Embedding (fastText, GloVe, word2vec) + LSTM/GRU/CNN.
Bag‑of‑words or TF‑IDF matrices with dimensionality reduction (LDA, SVD, PCA).
Text meta‑features (length, lexical richness, POS statistics) are also useful.
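As a rough illustration of text meta-features, the following sketch computes length and a simple lexical-richness ratio using only the standard library. The function name, the whitespace tokenisation, and the feature names are assumptions. POS statistics are omitted because they need a tagger.

```python
def text_meta_features(text):
    """Return simple length and richness statistics for one text field."""
    tokens = text.lower().split()
    n_tokens = len(tokens)
    return {
        "char_len": len(text),
        "word_count": n_tokens,
        # Share of distinct tokens, a rough proxy for lexical richness.
        "unique_ratio": len(set(tokens)) / n_tokens if n_tokens else 0.0,
    }

features = text_meta_features("Red bike red frame almost new")
```

Here the title has six tokens, five of them distinct, so `unique_ratio` is 5/6. Empty descriptions fall back to 0.0 instead of raising a division error.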
ID features are treated similarly to CTR modeling: embeddings, auto‑encoders, or direct engineered features. The author notes that applying an element‑wise multiplication between category embeddings (inspired by DeepFM) captures second‑order interactions effectively.
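The element-wise multiplication of category embeddings can be sketched as follows. The two ID features (`region` and `category`), the table sizes, and the random values are assumptions for illustration. In a trained model the tables would be learned parameters.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical embedding tables for two ID features (sizes are assumptions).
region_table = rng.normal(size=(5, 4))     # 5 regions, 4-dim embeddings
category_table = rng.normal(size=(7, 4))   # 7 categories, same dimension

region_ids = np.array([0, 3, 4])
category_ids = np.array([6, 1, 1])

region_vec = region_table[region_ids]
category_vec = category_table[category_ids]

# Element-wise product: one interaction vector per row, which can be
# concatenated with the other inputs of the network.
interaction = region_vec * category_vec

# Summing over the embedding axis gives the FM-style scalar inner product.
fm_score = interaction.sum(axis=1)
```

Both embeddings need the same dimension for the product to be defined. Keeping the full interaction vector, rather than only its sum, lets later layers weight each dimension separately.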
Time‑series data is left to feature engineering, as the author lacks a strong time‑series model for this task.
Transfer Learning “Black‑Tech”
To address distribution shift between the main training set and auxiliary data (as seen in the IJCAI‑18 competition), the author trained an auxiliary model on a secondary dataset to predict log1p(price). The residual between the actual and predicted log‑price becomes a powerful feature for the main model.
One solution write‑up describes the approach in its own words: “I’ve designed three neural‑network models trained on active data to predict log1p(price), renewed, and log1p(total_duration). These models have two RNN branches for title and description and also use category embeddings. The difference between actual log1p(price) and the predicted value is an extremely important feature.”
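The residual feature can be sketched end to end on toy data. As a loudly labeled simplification, the price model below is just the per-category mean of log1p(price) fitted on the auxiliary rows, standing in for the neural networks described above. The frame names `active` and `train` and all column names are assumptions.

```python
import numpy as np
import pandas as pd

# Auxiliary ("active") listings: no competition target, but price is known.
active = pd.DataFrame({
    "category": ["phones", "phones", "bikes", "bikes"],
    "price": [100.0, 300.0, 50.0, 150.0],
})
# Main training rows, which carry the competition target (omitted here).
train = pd.DataFrame({
    "category": ["phones", "bikes", "bikes"],
    "price": [400.0, 40.0, 100.0],
})

# Stand-in price model: mean log1p(price) per category, fitted on the
# auxiliary data only. The described solution uses neural networks here;
# this is a deliberate simplification.
price_model = np.log1p(active["price"]).groupby(active["category"]).mean()

# Predict on the main data, then use actual minus predicted as a feature.
train["pred_log_price"] = train["category"].map(price_model)
train["log_price_residual"] = np.log1p(train["price"]) - train["pred_log_price"]
```

A positive residual marks a listing priced above what the auxiliary model expects for similar items, and a negative one marks a listing priced below it. Because the price model is fitted without the competition target, the feature can be computed in the same way for training and test rows.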
Key Takeaways and References
The top solutions combined extensive feature engineering, multi‑modal neural networks, and transfer‑learning tricks. The author recommends reviewing the first‑place discussion for detailed visualizations and code:
https://www.kaggle.com/c/avito-demand-prediction/discussion/59880
Other notable approaches include unsupervised learning on categorical data (second place) and a StackNet‑based ensemble with LightGBM features and cross‑entropy optimization (third place):
https://www.kaggle.com/c/avito-demand-prediction/discussion/59871
https://www.kaggle.com/c/avito-demand-prediction/discussion/59885
Overall, mastering feature engineering, effectively merging heterogeneous data in a neural network, and leveraging transfer learning are essential for high‑ranking performance in complex, multi‑modal Kaggle competitions.
Baobao Algorithm Notes
Author of the BaiMian large model, offering technology and industry insights.
