
How Big Data Transforms Petrochemical Price Forecasting: A Student Project Review

This report details a university big-data project that built a full pipeline for predicting petrochemical product prices across multiple horizons: collecting raw market data, mining text, selecting variables with XGBoost, LightGBM, and Lasso regression, training RNN, LSTM, and GRU models, evaluating errors, and delivering an interactive demo.

Data Party THU

With price volatility intensifying and competition fierce in the petrochemical market, traditional forecasting methods are no longer sufficient. The project therefore applies big-data techniques to support production planning and cost reduction.

The project, building on several years of prior work, set higher goals for target products, data sources, forecasting horizons, and demo usability. It retained the overall research workflow of data processing, variable selection, model training, and prediction, while introducing new methods to automate the end‑to‑end process.

Structured data were filtered from the enterprise’s raw database, resulting in three major categories—capacity, fundamentals, and reference prices—comprising 18 sub‑categories and over 1,100 variables.

The use of unstructured data was a key innovation: text from Sinopec engineering briefings was collected, tokenized, and scored with TF-IDF, and the top 800 terms were kept as text variables. Combined with the structured variables, they yielded four categories with more than 1,900 variables across 3,500+ dates and over 6 million raw records.
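The term-ranking step can be sketched as follows. This is a minimal illustration, not the project's code: it assumes the briefings are already tokenized, and the ranking rule (summed TF-IDF with a smoothed IDF) and the function name `top_terms` are assumptions, since the article does not say how the top 800 terms were scored.

```python
import math
from collections import Counter


def top_terms(docs, k):
    """Rank terms by their summed TF-IDF score over all documents.

    docs: list of token lists (briefings that are already tokenized).
    k: number of terms to keep (the project kept 800).
    """
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(term for doc in docs for term in set(doc))
    scores = Counter()
    for doc in docs:
        if not doc:
            continue  # skip empty briefings
        counts = Counter(doc)
        for term, count in counts.items():
            tf = count / len(doc)
            # Smoothed IDF (an assumption): common terms keep a small weight.
            idf = math.log((1 + n_docs) / (1 + doc_freq[term])) + 1.0
            scores[term] += tf * idf
    # Highest score first; ties broken alphabetically for determinism.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [term for term, _ in ranked[:k]]


# Toy corpus standing in for the tokenized briefings.
docs = [
    ["ethylene", "plant", "shutdown", "ethylene"],
    ["plant", "maintenance", "schedule"],
    ["ethylene", "price", "plant"],
]
selected = top_terms(docs, k=2)
```

With real data, each selected term would become one text variable per date, holding that term's score in the briefings for that date.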

Four target product price series showed diverse, non‑periodic trends, indicating high forecasting difficulty.

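Six forecasting horizons were defined, covering the short, medium, and long term. For each horizon the data underwent lag processing: the target variable was shifted forward by the horizon length for model training, so that the features on a given date were paired with a later price, and the predictions were shifted back afterwards to simulate forecasts of the future.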
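A minimal sketch of this lag processing, assuming the horizon is counted in rows of a regularly spaced series. The helper names `make_supervised` and `forecast_index` are hypothetical, and the article does not give the actual horizon lengths.

```python
def make_supervised(features, target, horizon):
    """Pair the features at step t with the target at step t + horizon."""
    if horizon < 1:
        raise ValueError("horizon must be at least 1")
    if len(features) != len(target):
        raise ValueError("features and target must have the same length")
    n_pairs = max(len(target) - horizon, 0)
    x = features[:n_pairs]   # the last `horizon` rows have no future target yet
    y = target[horizon:]     # target shifted forward by `horizon` steps
    return x, y


def forecast_index(feature_index, horizon):
    """Map a feature row back to the index of the date being forecast."""
    return feature_index + horizon


features = [[1.0], [2.0], [3.0], [4.0], [5.0]]
prices = [10.0, 11.0, 12.0, 13.0, 14.0]
x, y = make_supervised(features, prices, horizon=2)
```

The rows dropped from the end of `x` are the ones a deployed model would use to forecast dates beyond the last observation.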

During variable selection, only the structured variables were filtered, keeping the top 20% by importance, while all text variables were retained. Running this selection with XGBoost, LightGBM, and Lasso regressors across the different scenarios produced 72 distinct variable sets.
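The selection rule can be sketched as below, assuming each regressor has already produced one importance score per structured variable (for Lasso, the absolute coefficient would be a natural choice, though the article does not say). The function name, the variable names, and the choice to round the 20% cutoff up are all illustrative assumptions.

```python
def select_variables(importances, text_vars, keep_percent=20):
    """Keep the top share of structured variables plus every text variable.

    importances: dict mapping structured variable name -> importance score.
    text_vars: names of text-derived variables, which are always retained.
    """
    # Highest importance first; ties broken by name for determinism.
    ranked = sorted(importances, key=lambda name: (-importances[name], name))
    # Ceiling division in integers, so 20% of 11 variables keeps 3.
    n_keep = -(-len(ranked) * keep_percent // 100)
    return ranked[:n_keep] + list(text_vars)


# Hypothetical importance scores from one fitted regressor.
importances = {
    "capacity_a": 0.30, "capacity_b": 0.05, "inventory": 0.20,
    "ref_price_x": 0.15, "ref_price_y": 0.10, "import_volume": 0.08,
    "export_volume": 0.04, "operating_rate": 0.03, "spread": 0.03,
    "freight": 0.02,
}
text_vars = ["tfidf_shutdown", "tfidf_maintenance"]
variable_set = select_variables(importances, text_vars)
```

Repeating this for each regressor and each scenario gives one variable set per combination.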

Three neural-network models (RNN, LSTM, and GRU) were then trained on each variable set. Data were split chronologically, with the first 80% used for training and the remaining 20% for testing, and performance was measured by average relative error.
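The split and the error metric can be sketched as follows. The article does not define "average relative error" precisely; this sketch assumes the mean of |predicted − actual| / |actual| over the test set, and both function names are hypothetical.

```python
def chronological_split(rows, train_percent=80):
    """Split time-ordered rows without shuffling: earliest rows train, latest test."""
    cut = len(rows) * train_percent // 100
    return rows[:cut], rows[cut:]


def average_relative_error(actual, predicted):
    """Mean of |predicted - actual| / |actual| (assumed definition)."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs(p - a) / abs(a) for a, p in zip(actual, predicted))
    return total / len(actual)


rows = list(range(10))  # stand-in for ten time-ordered samples
train, test = chronological_split(rows)
error = average_relative_error([100.0, 200.0], [110.0, 180.0])
```

Splitting by time rather than at random keeps the test period strictly after the training period, so the model is never evaluated on dates earlier than ones it has seen.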

The 72 variable sets combined with the three neural models created 216 prediction scenarios. Hyperparameters were kept uniform across scenarios; training each scenario took 5–10 minutes, while prediction required about 30 seconds. Training and prediction were separated into distinct steps for a better user experience.
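The scenario counts can be reproduced with a simple grid. The breakdown of the 72 variable sets into 4 products × 6 horizons × 3 selection models is an assumption: it matches the numbers in the article, but the article does not state it. The product and horizon labels are placeholders.

```python
from itertools import product

# Placeholder labels; the article does not list all products or horizon lengths.
products = ["product_1", "product_2", "product_3", "product_4"]
horizons = ["h1", "h2", "h3", "h4", "h5", "h6"]
selectors = ["XGBoost", "LightGBM", "Lasso"]
networks = ["RNN", "LSTM", "GRU"]

# One variable set per (product, horizon, selector) combination.
variable_sets = list(product(products, horizons, selectors))

# One prediction scenario per (variable set, network) combination.
scenarios = [(*vs, net) for vs in variable_sets for net in networks]
```

Enumerating scenarios this way makes it straightforward to train each one as a separate batch job and to look up a stored model by its key when a user requests a prediction.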

Testing results showed overall strong explanatory power, with modest error growth over longer horizons. Low‑density polyethylene and isotactic polypropylene (end‑of‑chain products) yielded lower errors, while butadiene (mid‑chain) performed worse due to complex market factors. Among regression models, XGBoost performed slightly better; among neural models, LSTM achieved the best average performance, GRU excelled on polypropylene, and RNN exhibited unstable spikes and is not recommended.

A complete demo was built, featuring automated data updates and a separated front end and back end. Back-end processing is run quarterly by enterprise staff, while users can trigger predictions on demand. Documentation was provided to guide further modifications.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.

Written by

Data Party THU

Official platform of Tsinghua Big Data Research Center, sharing the team's latest research, teaching updates, and big data news.
