Why GBDT Often Beats Neural Networks in Kaggle Competitions – An Analytical Deep Dive

This article analyzes why gradient‑boosted decision trees frequently outperform neural networks in many Kaggle contests, examining data characteristics, model strengths and weaknesses, real competition examples, and practical guidelines for choosing the right model based on nonlinearity and interpretability.

Baobao Algorithm Notes

Dec 17, 2021

Background

The discussion originates from a Zhihu question about why gradient‑boosted decision trees (GBDT) often achieve better results than neural networks (NN) on many competition datasets.

Target Nonlinearity

Nonlinearity is defined as the gap between raw input features and the decision target. Reducing this gap corresponds to constructing effective feature representations.
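
As a rough illustration of closing the input-target gap, the sketch below builds a toy dataset in which the target depends on a ratio of two raw columns. The column names (income, debt, risk) and the data-generating rule are assumptions made for this example, not taken from the original discussion, and Pearson correlation is used only as a crude proxy for how directly a feature reaches the target.

```python
import random

def pearson(xs, ys):
    """Plain Pearson correlation, a rough proxy for how linearly a feature reaches the target."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

rng = random.Random(0)
n_rows = 2000

# Hypothetical raw columns (names and ranges are assumptions for illustration).
income = [rng.uniform(20, 200) for _ in range(n_rows)]
debt = [rng.uniform(1, 100) for _ in range(n_rows)]

# Assumed target: driven by the debt-to-income ratio plus a little noise.
risk = [d / i + rng.gauss(0, 0.1) for d, i in zip(debt, income)]

# Engineered feature that encodes the assumed business relationship directly.
debt_to_income = [d / i for d, i in zip(debt, income)]

corr_income = pearson(income, risk)
corr_debt = pearson(debt, risk)
corr_ratio = pearson(debt_to_income, risk)

print(f"raw income vs risk:      {corr_income:+.2f}")
print(f"raw debt vs risk:        {corr_debt:+.2f}")
print(f"engineered ratio vs risk: {corr_ratio:+.2f}")
```

In this toy setting the engineered ratio tracks the target far more closely than either raw column, which is the sense in which a good feature representation narrows the gap.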

High‑Nonlinearity Scenarios

Sequence modeling

Large‑scale discrete ID modeling

Ambiguous language (e.g., sarcasm, nuanced sentiment)

Speech feature extraction

In these tasks, manual feature engineering alone cannot close the input-target gap.

Why GBDT Often Outperforms NN on Industrial Tabular Data

Robust to outliers and missing values in the input features; because tree splits depend only on the ordering of feature values, no normalization is needed.

Greedy splitting combined with subsampling captures moderate nonlinearity while controlling over‑fitting.

High interpretability via feature‑importance scores, which guides iterative feature engineering.
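
The three points above can be sketched with a deliberately tiny booster. This is a minimal illustration, assuming squared-error loss, depth-1 trees (stumps), row subsampling, and a gain-based importance score; it is not how XGBoost, LightGBM, or CatBoost are implemented, and every function name and the synthetic data are hypothetical.

```python
import random

def best_stump(X, residuals, rows):
    """Greedy search over every feature and threshold for the lowest squared error."""
    best = None
    for j in range(len(X[0])):
        values = sorted({X[i][j] for i in rows})
        for lo, hi in zip(values, values[1:]):
            thr = (lo + hi) / 2
            left = [residuals[i] for i in rows if X[i][j] <= thr]
            right = [residuals[i] for i in rows if X[i][j] > thr]
            if not left or not right:
                continue
            lm, rm = sum(left) / len(left), sum(right) / len(right)
            sse = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
            if best is None or sse < best[0]:
                best = (sse, j, thr, lm, rm)
    return best

def fit_gbdt(X, y, n_rounds=20, learning_rate=0.3, subsample=0.5, seed=0):
    rng = random.Random(seed)
    n = len(X)
    base = sum(y) / n
    pred = [base] * n
    stumps = []
    importance = [0.0] * len(X[0])  # accumulated squared-error reduction per feature
    for _ in range(n_rounds):
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        rows = rng.sample(range(n), int(subsample * n))  # row subsampling
        mean_r = sum(residuals[i] for i in rows) / len(rows)
        sse_before = sum((residuals[i] - mean_r) ** 2 for i in rows)
        sse_after, j, thr, lm, rm = best_stump(X, residuals, rows)
        importance[j] += sse_before - sse_after
        left_step, right_step = learning_rate * lm, learning_rate * rm
        stumps.append((j, thr, left_step, right_step))
        pred = [p + (left_step if x[j] <= thr else right_step) for p, x in zip(pred, X)]
    return base, stumps, importance

def predict(model, x):
    base, stumps, _ = model
    return base + sum(l if x[j] <= thr else r for j, thr, l, r in stumps)

def mse(y_true, y_pred):
    return sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true)

# Synthetic data: feature 0 has a step effect, feature 1 a weak linear effect,
# feature 2 is pure noise. All of this is assumed for the example.
data_rng = random.Random(42)
X = [[data_rng.uniform(-2, 2) for _ in range(3)] for _ in range(200)]
y = [3.0 * (x[0] > 0.5) + 0.5 * x[1] + data_rng.gauss(0, 0.1) for x in X]

model = fit_gbdt(X, y)
importance = model[2]
baseline_mse = mse(y, [sum(y) / len(y)] * len(y))
final_mse = mse(y, [predict(model, x) for x in X])
print("importance per feature:", [round(v, 1) for v in importance])
print("baseline MSE:", round(baseline_mse, 3), "boosted MSE:", round(final_mse, 3))
```

The importance list is the kind of signal that feeds the next round of feature engineering: columns that never win a split are candidates for removal or transformation.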

Strengths and Weaknesses

GBDT

Robust to anomalies and missing values.

Handles moderate nonlinearity efficiently.

Provides clear feature importance for engineering.

Limited capacity for very high‑dimensional dense data (e.g., raw images, text).
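
One way to see why tree splits tolerate unscaled features and extreme values is that a split depends only on the ordering of a feature. The sketch below, written under that assumption with a hypothetical helper `best_left_rows`, checks that the best single split sends the same rows left after rescaling, after a monotonic transform, and after inflating the largest value into an outlier. It does not cover missing-value handling, which real libraries implement separately.

```python
import math
import random

def best_left_rows(xs, ys):
    """Return the set of row indices sent left by the best single squared-error split on xs."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    best_sse, best_k = None, None
    for k in range(1, len(order)):
        if xs[order[k - 1]] == xs[order[k]]:
            continue  # cannot split between equal values
        left = [ys[i] for i in order[:k]]
        right = [ys[i] for i in order[k:]]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((v - left_mean) ** 2 for v in left)
               + sum((v - right_mean) ** 2 for v in right))
        if best_sse is None or sse < best_sse:
            best_sse, best_k = sse, k
    return frozenset(order[:best_k])

rng = random.Random(1)
xs = [rng.uniform(0, 10) for _ in range(200)]
ys = [(1.0 if x > 6 else 0.0) + rng.gauss(0, 0.1) for x in xs]

scaled = [1000 * x for x in xs]          # change of units
transformed = [math.exp(x) for x in xs]  # monotonic, highly skewed transform
with_outlier = list(xs)
with_outlier[xs.index(max(xs))] = 1e9    # largest value becomes an extreme outlier

reference = best_left_rows(xs, ys)
same_after_scaling = best_left_rows(scaled, ys) == reference
same_after_transform = best_left_rows(transformed, ys) == reference
same_with_outlier = best_left_rows(with_outlier, ys) == reference
print(same_after_scaling, same_after_transform, same_with_outlier)
```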

Neural Networks

Fully automatic feature learning.

Very high model capacity; performance scales with data volume.

Sensitive to outliers; requires careful preprocessing.

Low interpretability; debugging often comes down to trial and error over architectures and hyperparameters, much like a Monte Carlo search.
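
To make the preprocessing point concrete, the sketch below shows how a single extreme value distorts plain standardization, which matters for models trained by gradient descent on scaled inputs. The column values are made up, and median/IQR scaling is shown only as one common remedy, not as a recommendation from the original discussion.

```python
import statistics

# Hypothetical numeric column with one extreme outlier.
values = [9, 10, 10, 11, 11, 12, 12, 13, 14, 10_000]
typical = values[:-1]

mean = statistics.mean(values)
std = statistics.pstdev(values)
median = statistics.median(values)
q1, _, q3 = statistics.quantiles(values, n=4)
iqr = q3 - q1

def z_score(v):
    """Standardization: the outlier inflates both the mean and the standard deviation."""
    return (v - mean) / std

def robust_scale(v):
    """Median/IQR scaling: the outlier barely moves either statistic."""
    return (v - median) / iqr

# How far apart the typical values end up after each scaling.
z_spread = z_score(max(typical)) - z_score(min(typical))
robust_spread = robust_scale(max(typical)) - robust_scale(min(typical))
print(f"spread of typical values, z-score:    {z_spread:.4f}")
print(f"spread of typical values, median/IQR: {robust_spread:.4f}")
```

Under z-scoring the typical values collapse into a sliver of the input range, so a network sees almost no variation among them; the median/IQR version keeps them distinguishable.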

Typical Kaggle Dataset Characteristics (pre‑2019)

Industrial‑scale tabular data with many categorical and numeric columns.

Data are often dirty: outliers, missing values, legacy artifacts.

Each column has clear business meaning, enabling strong interpretability.

Model Selection Guidelines

Consider four factors when choosing a model:

Data volume (small vs. large).

Degree of nonlinearity between features and target.

Interpretability of individual columns.

Potential ceiling of feature‑engineering effort.

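GBDT implementations such as XGBoost, LightGBM, and CatBoost excel when columns are interpretable and manual feature engineering can close most of the input-target gap; NNs excel when data are abundant and the target is highly nonlinear in the raw inputs.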
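
The four factors can be written down as a simple checklist. The function below is a hypothetical sketch: the name `suggest_model`, the boolean simplification of each factor, and the one-million-row default for "large" data are all assumptions made for illustration, and a real decision would rest on validation results.

```python
def suggest_model(n_rows, highly_nonlinear, columns_interpretable,
                  features_can_close_gap, large_data_rows=1_000_000):
    """Map the four factors to a starting point. Thresholds are placeholders, not tuned values."""
    undecided = "NN or GBDT (compare on validation data)"
    if highly_nonlinear and not features_can_close_gap:
        # Manual features cannot reach the target; learned representations are needed,
        # but they only pay off with enough data.
        return "NN" if n_rows >= large_data_rows else undecided
    if columns_interpretable and features_can_close_gap:
        # Columns carry business meaning and feature engineering has headroom.
        return "GBDT"
    return undecided

# Hypothetical profiles, loosely modeled on the scenarios discussed in this article.
fraud_like = suggest_model(n_rows=500_000, highly_nonlinear=False,
                           columns_interpretable=True, features_can_close_gap=True)
speech_like = suggest_model(n_rows=5_000_000, highly_nonlinear=True,
                            columns_interpretable=False, features_can_close_gap=False)
small_sequence = suggest_model(n_rows=20_000, highly_nonlinear=True,
                               columns_interpretable=False, features_can_close_gap=False)
print(fraud_like, "|", speech_like, "|", small_sequence)
```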

Empirical Evidence

Competitions where NN Wins

Google Brain – Ventilator Pressure Prediction: Medical tabular data; a Transformer model combined with engineered features achieved top-ranking scores.

# Example (TensorFlow) – Transformer model
import tensorflow as tf
# The layer list is elided in the source; [...] is a placeholder, not a runnable architecture.
model = tf.keras.Sequential([...])
# training code omitted for brevity

Riiid Answer Correctness Prediction: Student interaction logs; Transformer + feature engineering outperformed other approaches.

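# Example (PyTorch) – SAKT variant
import torch
# SAKT (self-attentive knowledge tracing) must be defined as a torch.nn.Module;
# the source omits both its definition and its constructor arguments.
model = SAKT(...)
# training loop omitted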

Competitions where GBDT Wins

IEEE‑CIS Fraud Detection (Kaggle)

Elo Merchant Category Recommendation

Home Credit Default Risk

In these tabular challenges, XGBoost, LightGBM, and CatBoost consistently ranked at the top due to their robustness and interpretability.

Conclusion

Model choice should be driven by data characteristics rather than a default preference for deep learning. Leveraging the strengths of GBDT for dirty, interpretable tabular data and the strengths of NN for large, dense, highly nonlinear data yields the best competitive performance.
