From Data Chaos to Predictive Insight: My Solo Journey in the 2025 Big Data Competition
An individual participant recounts the 2025 China University Computer Competition Big Data Challenge: cleaning data, engineering features, and building models on historical prices for 300 stocks, along with the challenges of competing solo, the lessons learned, and future directions in financial AI.
The 2025 China University Computer Competition featured a Big Data Challenge centered on financial data. Participants were required to use historical price data for the constituents of the CSI 300 (Shanghai-Shenzhen 300) index to build machine learning models that predict the next trading day's largest and smallest price movements among ten selected stocks.
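To make the prediction target concrete, here is a minimal sketch of how next-day price movements could be turned into labels and how the extreme movers could be read off a model's scores. The function names, the wide price-table layout, and the parameter `k` are illustrative assumptions; the article does not describe the competition's exact data or submission format.

```python
import pandas as pd

def next_day_returns(close: pd.DataFrame) -> pd.DataFrame:
    """close: rows are trading days in ascending order, columns are stock codes (assumed layout)."""
    daily_return = close / close.shift(1) - 1
    # Shift back one row so the return realized on day t+1 sits on row t,
    # where it can serve as the label for features known at the close of day t.
    return daily_return.shift(-1)

def extreme_movers(scores: pd.Series, k: int = 1):
    """Return (k largest, k smallest) stock codes by predicted movement."""
    ranked = scores.sort_values(ascending=False)
    return list(ranked.index[:k]), list(ranked.index[-k:])

close = pd.DataFrame({"A": [10.0, 11.0, 12.1], "B": [20.0, 19.0, 19.0]})
labels = next_day_returns(close)
top, bottom = extreme_movers(pd.Series({"A": 0.03, "B": -0.02, "C": 0.01}))
```

The last row of `labels` is empty because the following day's price is not yet known, so that row can only be used for prediction, not training.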
As a first-time competitor, I entered as the team "抹香鲸cmr2", led by Zhou Xiheng from China University of Petroleum (East China), and placed seventh nationally. During preparation, I designed a data processing pipeline from scratch, learning how to clean high-frequency financial time-series data and perform extensive feature engineering.
The key features I constructed included price change rates, volume variations, volatility measures, and momentum indicators, which together formed a multi-dimensional feature set. I experimented with various supervised and unsupervised learning models, and during tuning I focused on model stability and robustness. I tried to avoid over-fitting while improving the models' ability to identify extreme price swings.
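The four feature families named above can be sketched in a few lines of pandas. This is only an illustration under stated assumptions: the column names `close` and `volume`, the rolling window length, and the specific formulas are hypothetical choices, not the ones actually used in the competition.

```python
import pandas as pd

def build_features(bars: pd.DataFrame, window: int = 5) -> pd.DataFrame:
    """bars: one stock's daily data in ascending date order,
    with assumed columns 'close' and 'volume'."""
    features = pd.DataFrame(index=bars.index)
    daily_return = bars["close"] / bars["close"].shift(1) - 1
    # Price change rate: one-day simple return.
    features["ret_1d"] = daily_return
    # Volume variation: one-day relative change in traded volume.
    features["vol_chg_1d"] = bars["volume"] / bars["volume"].shift(1) - 1
    # Volatility: rolling standard deviation of daily returns.
    features["volatility"] = daily_return.rolling(window).std()
    # Momentum: cumulative return over the same window.
    features["momentum"] = bars["close"] / bars["close"].shift(window) - 1
    return features

bars = pd.DataFrame({
    "close": [10.0, 11.0, 12.1, 13.31, 14.641, 16.1051],
    "volume": [100.0, 200.0, 200.0, 100.0, 100.0, 100.0],
})
feats = build_features(bars, window=3)
```

Every column uses only data up to and including the current row, so pairing these features with the next day's movement does not leak future information. The first rows are empty until each window has enough history.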
Competing solo added isolation and difficulty. Every step, from data exploration to model deployment, required independent decision-making and repeated validation. Without teammates to discuss ideas with, I relied heavily on academic literature, open-source projects, and community forums to broaden my approach. The ranking feedback from each submission served as a crucial signal for further model refinement.
The experience highlighted the inherent complexity and uncertainty of financial forecasting. Relying solely on historical numerical data imposes natural limits, yet this uncertainty drives continuous reflection on model boundaries and improvement strategies. I learned to extract effective signals from limited information and deepened my understanding of model evaluation and risk control.
Overall, the competition significantly enhanced my data processing, modeling practice, and problem‑abstraction abilities. It also sparked a lasting interest in quantitative analysis and the application of artificial intelligence in finance. I plan to pursue more robust and interpretable prediction methods and look forward to engaging with higher‑level competitive platforms and peers.
Data Party THU
Official platform of Tsinghua Big Data Research Center, sharing the team's latest research, teaching updates, and big data news.
