Kaggle Jane Street Market Prediction Competition Summary and Model Insights
This article summarizes the author's participation in the Kaggle Jane Street Market Prediction competition, detailing the anonymous feature dataset, utility‑score metric, data preprocessing, the combined AE‑MLP and XGBoost modeling approach, threshold tuning, experimental findings, and references for further study.
The author, Wang Mingjie, a research assistant at Beijing Normal University (Zhuhai), shares his experience and results from the Kaggle Jane Street Market Prediction competition, aiming to provide algorithmic ideas for fellow competitors.
The competition uses anonymized market data from major exchanges. Each row contains 130 anonymous feature columns (feature_0…feature_129) plus weight and resp columns whose product represents the trade's return; participants predict a binary action (trade or pass) for each opportunity.
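The data layout above can be sketched with a tiny synthetic frame (all values invented; the real train.csv has ~2.4M rows). The binary action label is commonly derived as resp > 0:

```python
import numpy as np
import pandas as pd

# Tiny synthetic stand-in for the competition's train.csv (hypothetical values).
rng = np.random.default_rng(0)
feature_cols = [f"feature_{i}" for i in range(130)]
train = pd.DataFrame(rng.normal(size=(5, 130)), columns=feature_cols)
train["weight"] = [0.0, 1.2, 0.7, 3.1, 0.5]
train["resp"] = [0.01, -0.02, 0.03, 0.00, -0.01]

# A trade's realized contribution is weight * resp; the training target
# "action" is typically defined as 1 when resp > 0, else 0.
train["trade_return"] = train["weight"] * train["resp"]
train["action"] = (train["resp"] > 0).astype(int)
print(train[["weight", "resp", "trade_return", "action"]])
```

Note that rows with weight 0 never contribute to the score, which is why some competitors drop or down-weight them during training.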
The evaluation metric is the Utility Score, which rewards both high cumulative daily returns (p_i) and stability (low maximum drawdown), penalizing large negative daily returns.
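A sketch of the Utility Score following the competition's published definition (daily P&L p_i, a Sharpe-like term t annualized to 250 trading days, and t clipped to [0, 6] before scaling by total P&L):

```python
import numpy as np

def utility_score(dates, weights, resps, actions):
    """Utility Score as described in the competition rules (sketch)."""
    # Daily P&L: p_i = sum over day i of weight * resp * action.
    returns = weights * resps * actions
    unique_days = np.unique(dates)
    p = np.array([returns[dates == d].sum() for d in unique_days])
    # Sharpe-like stability term, annualized to ~250 trading days.
    t = p.sum() / np.sqrt((p ** 2).sum()) * np.sqrt(250 / len(unique_days))
    # Utility clips t into [0, 6] and scales it by total P&L.
    return min(max(t, 0.0), 6.0) * p.sum()
```

Because t is clipped at 0, a strategy whose total P&L is negative scores exactly zero utility, so the metric pushes toward consistent, low-variance daily returns rather than a few large wins.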
To avoid over‑fitting on the public leaderboard, the team employed a 5‑fold purged group time‑series cross‑validation, removed the first 85 days from training, forward‑filled missing values, and transformed all resp targets into a multi‑label classification problem.
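The preprocessing and validation steps above can be sketched as follows; the split function is a simplified stand-in for the PurgedGroupTimeSeriesSplit class popularized in Kaggle notebooks, and the gap size is an illustrative assumption:

```python
import numpy as np
import pandas as pd

# Tiny synthetic stand-in for train.csv (values are hypothetical).
n = 200
rng = np.random.default_rng(0)
train = pd.DataFrame({"date": np.repeat(np.arange(100), 2)})
for c in ["resp", "resp_1", "resp_2", "resp_3", "resp_4"]:
    train[c] = rng.normal(size=n)
train["feature_0"] = np.where(rng.random(n) < 0.1, np.nan, rng.normal(size=n))

# Drop the first 85 days and forward-fill missing feature values.
train = train[train["date"] >= 85].reset_index(drop=True)
train["feature_0"] = train["feature_0"].ffill().fillna(0)

# Multi-label targets: one binary label per resp horizon.
resp_cols = ["resp", "resp_1", "resp_2", "resp_3", "resp_4"]
y = (train[resp_cols] > 0).astype(int).values

def purged_group_time_series_split(groups, n_splits=5, gap=2):
    """Simplified sketch of purged group time-series CV: each fold trains
    on earlier days and validates on later days, with a `gap` of days
    purged in between to limit leakage."""
    days = np.unique(groups)
    fold = len(days) // (n_splits + 1)
    for k in range(n_splits):
        train_days = days[: (k + 1) * fold - gap]
        val_days = days[(k + 1) * fold : (k + 2) * fold]
        yield (np.where(np.isin(groups, train_days))[0],
               np.where(np.isin(groups, val_days))[0])

groups = np.repeat(np.arange(60), 2)   # illustrative day indices
splits = list(purged_group_time_series_split(groups, n_splits=5, gap=2))
```

Grouping by day (rather than by row) keeps all trades from the same day on one side of each split, which matters because intra-day rows are strongly correlated.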
The final model combines an auto‑encoder‑enhanced MLP with an XGBoost (100‑round) model. Each component was trained with three different random seeds and averaged to reduce variance; the MLP uses Swish activation, batch normalization, dropout, Gaussian noise for data augmentation, and hyperparameter optimization via Hyperopt.
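The AE-MLP component can be sketched in Keras as below; the layer sizes, dropout rates, and noise level are illustrative assumptions, not the author's Hyperopt-tuned values:

```python
import tensorflow as tf
from tensorflow.keras import layers, Model

def build_ae_mlp(n_features=130, n_labels=5, encoding=96, noise=0.1):
    """Autoencoder-enhanced MLP: a reconstruction head regularizes a
    shared encoding that also feeds the classification head."""
    inp = layers.Input(shape=(n_features,))
    x = layers.BatchNormalization()(inp)
    x = layers.GaussianNoise(noise)(x)                    # augmentation
    encoded = layers.Dense(encoding, activation="swish")(x)
    decoded = layers.Dense(n_features, name="decoder")(
        layers.Dropout(0.2)(encoded))                     # AE head
    h = layers.Concatenate()([x, encoded])                # raw + encoded
    for units in (128, 64):
        h = layers.BatchNormalization()(h)
        h = layers.Dropout(0.2)(h)
        h = layers.Dense(units, activation="swish")(h)
    out = layers.Dense(n_labels, activation="sigmoid", name="action")(h)
    model = Model(inp, [decoded, out])
    model.compile(optimizer="adam",
                  loss={"decoder": "mse", "action": "binary_crossentropy"})
    return model
```

In the ensemble described above, this model and its XGBoost counterpart would each be trained with three different random seeds and their predicted probabilities averaged.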
Key experimental observations include: using resp_3 as the target yields higher and more stable returns; a decision threshold between 0.51 and 0.52 improves profitability; and extensive feature engineering on the anonymous features provides limited offline gains but can hurt online performance.
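As a toy illustration of the decision rule (all probability values invented), the seed-averaged outputs of the two models are blended and the tuned threshold applied:

```python
import numpy as np

# Hypothetical per-seed probabilities, shape (seeds, samples).
mlp_probs = np.array([[0.45, 0.50, 0.58],
                      [0.47, 0.52, 0.54]])
xgb_probs = np.array([[0.48, 0.53, 0.55],
                      [0.50, 0.51, 0.56]])

# Average across seeds, then blend the two models equally (an assumed
# 50/50 weighting) and apply the tuned threshold.
p = 0.5 * mlp_probs.mean(axis=0) + 0.5 * xgb_probs.mean(axis=0)
action = (p > 0.51).astype(int)
print(action)  # prints [0 1 1]
```

Raising the threshold from the naive 0.5 to 0.51–0.52 trades a few marginal opportunities for fewer losing trades, which helps both total P&L and the drawdown-sensitive utility term.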
The resulting ensemble (AE‑MLP + XGBoost) achieved a top‑1 ranking as of July 2021, with the final submission using a 0.51 threshold for action decisions.
References for deeper exploration are provided, including links to Kaggle discussions, related notebooks, and the original auto‑encoder MLP approach.
Sohu Tech Products
A knowledge-sharing platform for Sohu's technology products. As a leading Chinese internet brand with media, video, search, and gaming services and over 700 million users, Sohu continuously drives tech innovation and practice. We’ll share practical insights and tech news here.