Why GBDT Often Beats Neural Networks in Kaggle Competitions – An Analytical Deep Dive
This article analyzes why gradient‑boosted decision trees frequently outperform neural networks in many Kaggle contests, examining data characteristics, model strengths and weaknesses, real competition examples, and practical guidelines for choosing the right model based on nonlinearity and interpretability.
Background
The discussion originates from a Zhihu question about why gradient‑boosted decision trees (GBDT) often achieve better results than neural networks (NN) on many competition datasets.
Target Nonlinearity
Nonlinearity is defined as the gap between raw input features and the decision target. Reducing this gap corresponds to constructing effective feature representations.
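To make the idea of an input-target gap concrete, here is a minimal sketch in plain Python. The XOR-like target, the sample size, and the helper name accuracy_of_sign_rule are all assumptions chosen for illustration and do not come from the original discussion. It shows a target that a simple threshold rule cannot predict from either raw feature, yet predicts perfectly from one engineered feature (the product of the two inputs).

```python
import random

def accuracy_of_sign_rule(feature, labels):
    """Accuracy of the rule 'predict 1 when feature > 0', or of its flipped version."""
    hits = sum((f > 0) == bool(y) for f, y in zip(feature, labels))
    acc = hits / len(labels)
    return max(acc, 1 - acc)  # keep whichever direction of the rule works better

rng = random.Random(0)
x1 = [rng.uniform(-1, 1) for _ in range(2000)]
x2 = [rng.uniform(-1, 1) for _ in range(2000)]

# Hypothetical target: 1 when both inputs share a sign (an XOR-like pattern).
y = [int(a * b > 0) for a, b in zip(x1, x2)]

# Each raw feature alone is close to a coin flip for this target.
raw_acc_x1 = accuracy_of_sign_rule(x1, y)
raw_acc_x2 = accuracy_of_sign_rule(x2, y)

# One engineered feature closes the gap completely.
engineered = [a * b for a, b in zip(x1, x2)]
eng_acc = accuracy_of_sign_rule(engineered, y)

print(f"raw x1: {raw_acc_x1:.2f}, raw x2: {raw_acc_x2:.2f}, engineered: {eng_acc:.2f}")
```

In this toy case a person can spot the right feature by hand. The high-nonlinearity scenarios discussed in the article are those where no such hand-built feature is easy to find.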
High‑Nonlinearity Scenarios
Sequence modeling
Large‑scale discrete ID modeling
Ambiguous language (e.g., sarcasm, nuanced sentiment)
Speech feature extraction
These tasks exceed the capacity of manual feature engineering to close the input‑target gap.
Why GBDT Often Outperforms NN on Industrial Tabular Data
Robust to outliers and missing values; no feature normalization is needed, because tree splits depend only on the ordering of feature values, not on their scale.
Greedy splitting combined with subsampling captures moderate nonlinearity while controlling over‑fitting.
High interpretability via feature‑importance scores, which guides iterative feature engineering.
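The combination of greedy splitting and subsampling can be sketched in a few dozen lines of plain Python. This is a simplified illustration under stated assumptions, not how XGBoost, LightGBM, or CatBoost are implemented: it uses a single feature, depth-1 trees (stumps), squared loss, and row subsampling, and the names fit_stump, fit_gbdt, and predict as well as the synthetic quadratic target are hypothetical.

```python
import random

def fit_stump(xs, residuals):
    """Greedy search: try every threshold, keep the one with the lowest squared error.

    Assumes at least two distinct x values (true for the continuous data below).
    """
    pairs = sorted(zip(xs, residuals))
    best = None
    for i in range(1, len(pairs)):
        if pairs[i][0] == pairs[i - 1][0]:
            continue  # no threshold can separate equal values
        left = [r for _, r in pairs[:i]]
        right = [r for _, r in pairs[i:]]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if best is None or sse < best[0]:
            threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
            best = (sse, threshold, left_mean, right_mean)
    return best[1:]  # (threshold, left_mean, right_mean)

def fit_gbdt(xs, ys, rounds=100, lr=0.2, subsample=0.5, seed=0):
    """Boosting for squared loss: each stump is fit to the current residuals."""
    rng = random.Random(seed)
    base = sum(ys) / len(ys)
    preds = [base] * len(xs)
    stumps = []
    k = max(2, int(subsample * len(xs)))
    for _ in range(rounds):
        idx = rng.sample(range(len(xs)), k)  # row subsampling for this round
        # For squared loss, the negative gradient is the residual y - prediction.
        stump = fit_stump([xs[i] for i in idx], [ys[i] - preds[i] for i in idx])
        stumps.append(stump)
        thr, lm, rm = stump
        # Shrinkage (lr) keeps each stump's contribution small.
        preds = [p + lr * (lm if x <= thr else rm) for p, x in zip(preds, xs)]
    return base, lr, stumps

def predict(model, x):
    base, lr, stumps = model
    return base + lr * sum(lm if x <= thr else rm for thr, lm, rm in stumps)

# Hypothetical moderately nonlinear target: y = x^2 plus noise.
data_rng = random.Random(42)
xs = [data_rng.uniform(-3, 3) for _ in range(200)]
ys = [x * x + data_rng.gauss(0, 0.3) for x in xs]

model = fit_gbdt(xs, ys)
mean_y = sum(ys) / len(ys)
mse_baseline = sum((y - mean_y) ** 2 for y in ys) / len(ys)
mse_boosted = sum((y - predict(model, x)) ** 2 for x, y in zip(xs, ys)) / len(ys)
print(f"baseline MSE: {mse_baseline:.2f}, boosted MSE: {mse_boosted:.2f}")
```

Each stump is a weak, greedy fit, and the shrinkage and subsampling keep any single round from dominating. That is one way to read the claim that this setup captures moderate nonlinearity while limiting over-fitting.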
Strengths and Weaknesses
GBDT
Robust to anomalies and missing values.
Handles moderate nonlinearity efficiently.
Provides clear feature importance for engineering.
Limited capacity for very high‑dimensional dense data (e.g., raw images, text).
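The robustness point in the list above can be illustrated with a single tree split. The sketch below is a toy example with made-up data and a hypothetical helper best_partition. It suggests why rescaling or a corrupted extreme value need not change a tree: the best split depends only on how the rows are ordered. It does not cover missing-value handling, which real GBDT libraries implement in their own ways.

```python
import math

def best_partition(xs, ys):
    """Return the set of row indices sent left by the best squared-error split."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    best_sse, best_left = None, None
    for cut in range(1, len(order)):
        if xs[order[cut]] == xs[order[cut - 1]]:
            continue  # no threshold separates equal values
        left, right = order[:cut], order[cut:]
        left_mean = sum(ys[i] for i in left) / len(left)
        right_mean = sum(ys[i] for i in right) / len(right)
        sse = (sum((ys[i] - left_mean) ** 2 for i in left)
               + sum((ys[i] - right_mean) ** 2 for i in right))
        if best_sse is None or sse < best_sse:
            best_sse, best_left = sse, frozenset(left)
    return best_left

# Made-up column and binary target.
xs = [0.5, 1.0, 1.5, 2.0, 8.0, 9.0, 10.0]
ys = [0, 0, 0, 0, 1, 1, 1]

raw = best_partition(xs, ys)
scaled = best_partition([1000 * x for x in xs], ys)      # change of units
logged = best_partition([math.log(x) for x in xs], ys)   # monotone transform
outlier = best_partition(xs[:-1] + [1e9], ys)            # largest value corrupted

# The column mean, which scale-sensitive models rely on, moves drastically.
mean_clean = sum(xs) / len(xs)
mean_corrupted = sum(xs[:-1] + [1e9]) / len(xs)

print(sorted(raw), raw == scaled == logged == outlier)
print(mean_clean, mean_corrupted)
```

All four calls select the same rows, while the column mean shifts by orders of magnitude once one value is corrupted.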
Neural Networks
Fully automatic feature learning.
Very high model capacity; performance scales with data volume.
Sensitive to outliers; requires careful preprocessing.
Low interpretability; debugging often comes down to trial and error, much like a Monte Carlo search over architectures and hyperparameters.
Typical Kaggle Dataset Characteristics (pre‑2019)
Industrial‑scale tabular data with many categorical and numeric columns.
Data are often dirty: outliers, missing values, legacy artifacts.
Each column has clear business meaning, enabling strong interpretability.
Model Selection Guidelines
Consider four factors when choosing a model:
Data volume (small vs. large).
Degree of nonlinearity between features and target.
Interpretability of individual columns.
Potential ceiling of feature‑engineering effort.
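GBDT implementations such as XGBoost, LightGBM (LGB), and CatBoost (CTB) excel when columns are interpretable and manual feature engineering can close most of the gap between inputs and target; NN excels when data are abundant and the target is highly nonlinear in the raw inputs.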
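The four factors can be written down as a rough decision helper. Everything in the sketch below is an assumption made for illustration: the function name suggest_model, the boolean encoding of each factor, and the one-million-row cutoff are hypothetical and are not rules given in the article.

```python
LARGE_DATA_ROWS = 1_000_000  # hypothetical cutoff for "abundant data"

def suggest_model(n_rows, high_nonlinearity, columns_interpretable,
                  feature_engineering_can_close_gap):
    """Map the four factors to a starting point, not a final answer."""
    if (high_nonlinearity
            and n_rows >= LARGE_DATA_ROWS
            and not feature_engineering_can_close_gap):
        # Abundant data, and a gap that manual features cannot close.
        return "NN"
    if columns_interpretable and feature_engineering_can_close_gap:
        # Meaningful columns, and manual features still have headroom.
        return "GBDT"
    # Mixed signals: neither family is an obvious default.
    return "benchmark both"

# A typical dirty business table with meaningful columns.
print(suggest_model(200_000, False, True, True))
# Very large interaction logs with opaque IDs.
print(suggest_model(50_000_000, True, False, False))
# Small dataset with a highly nonlinear target.
print(suggest_model(10_000, True, False, False))
```

In practice the factors are matters of degree rather than booleans, so a helper like this is best treated as a checklist for deciding what to try first.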
Empirical Evidence
Competitions where NN Wins
Google Brain – Ventilator Pressure Prediction: Medical time-series data in tabular form; a Transformer model combined with engineered features achieved top‑rank scores.
# Example (TensorFlow) – Transformer model
import tensorflow as tf
# The layer list is elided in the source; fill it in before running
model = tf.keras.Sequential([...])
# training code omitted for brevity
Riiid Answer Correctness Prediction: Student interaction logs; a Transformer combined with feature engineering outperformed other approaches.
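# Example (PyTorch) – SAKT variant
import torch
# SAKT (Self-Attentive Knowledge Tracing) is a user-defined torch.nn.Module, not part of PyTorch; its definition is omitted in the source
model = SAKT(...)
# training loop omitted
Competitions where GBDT Wins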
IEEE‑CIS Fraud Detection (Kaggle)
Elo Merchant Category Recommendation
Home Credit Default Risk
In these tabular challenges, XGBoost, LightGBM (LGBM), and CatBoost (CTB) consistently ranked at the top due to their robustness and interpretability.
Conclusion
Model choice should be driven by data characteristics rather than a default preference for deep learning. Leveraging the strengths of GBDT for dirty, interpretable tabular data and the strengths of NN for large, dense, highly nonlinear data yields the best competitive performance.
Baobao Algorithm Notes
Author of the BaiMian large model, offering technology and industry insights.
