2026 Big Data Challenge Announces Monthly Star Winners and Shares Winning Teams’ Insights
The 2026 China University Computer Competition – Big Data Challenge reveals the Monthly Star award winners, each receiving 800 RMB, and presents detailed experience reports from the top teams covering feature engineering, model selection, training validation, and ensemble strategies for stock prediction.
During the Phase A online round of the 2026 China University Computer Competition – Big Data Challenge, the organizers announced the Monthly Star award winners, listing the student teams that earned the prize and noting that each winning team receives an 800 RMB cash award.
The two best‑performing teams shared their practical experience. Their feature engineering combined basic time‑series attributes, such as moving averages, returns, and volatility over 3‑, 5‑, 10‑, 20‑, and 40‑day windows, with cross‑sectional features such as daily rank, excess return versus the market mean, market sentiment, and overall volatility. No external data beyond the provided training and test sets was used.
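As a rough illustration, here is a minimal pandas sketch of how such rolling and cross‑sectional features could be computed. The column names (stock_id, date, close, ret) are assumptions about the data layout, not the teams' actual schema.

```python
import pandas as pd

def build_features(df: pd.DataFrame) -> pd.DataFrame:
    """Add rolling time-series and daily cross-sectional features.

    Assumed columns: stock_id, date, close, ret (daily return).
    """
    df = df.sort_values(["stock_id", "date"]).copy()
    g = df.groupby("stock_id")

    # Rolling time-series features over the window lengths mentioned in the write-up
    for w in (3, 5, 10, 20, 40):
        df[f"ma_{w}"] = g["close"].transform(lambda s: s.rolling(w).mean())
        df[f"ret_{w}"] = g["close"].transform(lambda s: s.pct_change(w))
        df[f"vol_{w}"] = g["ret"].transform(lambda s: s.rolling(w).std())

    # Cross-sectional features computed within each trading day
    day = df.groupby("date")
    df["daily_rank"] = day["ret"].rank(pct=True)                 # percentile rank of today's return
    df["excess_ret"] = df["ret"] - day["ret"].transform("mean")  # return versus the market mean
    df["mkt_vol"] = day["ret"].transform("std")                  # proxy for overall market volatility
    return df
```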
For modeling, they built baselines with LightGBM, HistGradientBoosting, and Random Forest, noting that LightGBM Ranker aligns well with the ranking objective. Model fusion was central: multiple base models were trained, and their out‑of‑fold predictions were fed into a second‑level model to reduce variance. Two auxiliary models predicted the daily top‑1 stock and short‑term explosive candidates, and return prediction and up‑down probability were modeled separately.
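The stacking step they describe could look roughly like the sketch below, which collects out‑of‑fold predictions from the three named base models and fits a simple second‑level blender. Treating X and y as numpy arrays and using Ridge as the level‑2 model are illustrative assumptions, not the teams' documented setup.

```python
import numpy as np
from lightgbm import LGBMRegressor
from sklearn.ensemble import HistGradientBoostingRegressor, RandomForestRegressor
from sklearn.linear_model import Ridge
from sklearn.model_selection import TimeSeriesSplit

def stack_oof(X, y, n_splits=4, seed=20260416):
    """Train base models, collect out-of-fold predictions, fit a level-2 blender."""
    base_models = {
        "lgbm": LGBMRegressor(n_estimators=300, random_state=seed),
        "hgb": HistGradientBoostingRegressor(max_iter=300, random_state=seed),
        "rf": RandomForestRegressor(n_estimators=300, random_state=seed),
    }
    oof = np.zeros((len(X), len(base_models)))   # one column of OOF predictions per base model
    covered = np.zeros(len(X), dtype=bool)       # rows that received an out-of-fold prediction
    cv = TimeSeriesSplit(n_splits=n_splits)

    for j, model in enumerate(base_models.values()):
        for train_idx, valid_idx in cv.split(X):
            model.fit(X[train_idx], y[train_idx])
            oof[valid_idx, j] = model.predict(X[valid_idx])
            covered[valid_idx] = True

    # The level-2 model learns how to blend the base predictions, reducing variance
    blender = Ridge(alpha=1.0)
    blender.fit(oof[covered], y[covered])
    return base_models, blender
```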
Training and validation followed a time‑series cross‑validation scheme: four folds with a five‑day gap between folds to avoid data leakage, each validation set covering 20 days and each training set spanning at least 120 days. The random seed was fixed at 20260416 for reproducibility. Hyper‑parameter tuning was minimal, mainly adjusting the number of trees while keeping other settings at their defaults.
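A minimal sketch of a purged split matching that description (20‑day validation windows, a five‑day gap before each, and at least 120 days of training history) might look like this; the date handling and exact fold placement are assumptions rather than the team's actual code.

```python
import numpy as np

SEED = 20260416  # fixed seed reported by the team

def time_series_folds(dates, n_folds=4, val_days=20, gap_days=5, min_train_days=120):
    """Yield (train_idx, valid_idx) index arrays over an array of per-row trading dates.

    Validation blocks are taken from the end of the history, each val_days long,
    separated from the training window by gap_days to limit leakage.
    """
    unique_days = np.sort(np.unique(dates))
    folds = []
    for k in range(n_folds):
        # k-th validation block, counted backwards from the last available day
        val_end = len(unique_days) - k * val_days
        val_start = val_end - val_days
        train_end = val_start - gap_days
        if train_end < min_train_days:
            break  # not enough history left for this fold
        val_mask = np.isin(dates, unique_days[val_start:val_end])
        train_mask = np.isin(dates, unique_days[:train_end])
        folds.append((np.where(train_mask)[0], np.where(val_mask)[0]))
    return folds[::-1]  # return folds in chronological order
```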
The ensemble strategy employed a multi‑stage stock selection pipeline: generate a candidate pool, perform refined ranking, and apply a risk penalty. A custom label, target_precision_gate, required a positive return, a top‑25% daily rank, stable short‑ and mid‑term performance, and a maximum drawdown not exceeding 3%. Position sizing was not forced to be full; exposure was reduced or moved to cash when market conditions were unfavorable. LightGBM used roughly 200–300 trees and HistGradientBoosting about 300 boosting rounds.
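The gated label could be built along these lines; the forward‑return and drawdown column names are hypothetical stand‑ins for whatever the team computed from the provided data, not their actual code.

```python
import pandas as pd

def target_precision_gate(df: pd.DataFrame) -> pd.Series:
    """Binary label: 1 only when every gate described in the write-up passes.

    Assumed columns: fwd_ret_1 (next-day return), fwd_ret_5 and fwd_ret_20
    (short/mid-term forward returns), daily_rank (percentile rank of fwd_ret_1
    within its day), fwd_max_drawdown (max drawdown over the holding window).
    """
    gates = (
        (df["fwd_ret_1"] > 0)                 # positive return
        & (df["daily_rank"] >= 0.75)          # top 25% rank within the day
        & (df["fwd_ret_5"] > 0)               # stable short-term performance
        & (df["fwd_ret_20"] > 0)              # stable mid-term performance
        & (df["fwd_max_drawdown"] >= -0.03)   # drawdown no worse than 3%
    )
    return gates.astype(int)
```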
Another winning team (milky‑frog) emphasized data understanding: they highlighted the strong temporal and cross‑sectional nature of stock data, careful handling of missing and abnormal values, and the importance of a time‑series split for validation rather than random splitting. Their feature engineering favored stable, interpretable features over overly complex ones, and they preferred robust models to deep learning architectures, noting pitfalls such as over‑optimizing for a specific period that caused severe drawdowns elsewhere.
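A sketch of what such cleaning might look like, assuming a long panel with stock_id and date columns; the forward‑fill and per‑day clipping choices are illustrative, not the team's documented procedure.

```python
import pandas as pd

def clean_panel(df: pd.DataFrame, cols, clip_q=0.01) -> pd.DataFrame:
    """Illustrative cleaning pass: per-stock forward fill of missing values and
    per-day clipping of extreme values at the 1st/99th percentiles.
    Column names and the clip quantile are assumptions for illustration.
    """
    df = df.sort_values(["stock_id", "date"]).copy()

    # Fill gaps with each stock's own last observation, never across stocks
    df[cols] = df.groupby("stock_id")[cols].ffill()

    # Winsorize within each trading day so one abnormal print cannot dominate
    lo = df.groupby("date")[cols].transform(lambda s: s.quantile(clip_q))
    hi = df.groupby("date")[cols].transform(lambda s: s.quantile(1 - clip_q))
    df[cols] = df[cols].clip(lower=lo, upper=hi)
    return df
```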
The competition has attracted over 1,500 teams and more than 2,300 participants. Participants can download the dataset for local development and submit result files and model code through the competition platform. The registration deadline is 12:00 on July 15, and official channels include the competition website, a contact email address, and several QQ groups.
