Mastering Time Series Forecasting: From Moving Averages to Transformers
Time series forecasting is essential across weather, finance, and commerce. This article covers problem definitions, evaluation metrics, traditional methods, machine-learning approaches, deep-learning models such as the Temporal Fusion Transformer (TFT), and emerging AutoML tools, offering practical insights and best practices.
Time Series Problem Definition and Classification
Time series are data points ordered by time. Common tasks include classification, clustering, anomaly detection, and especially prediction. This article focuses on forecasting.
Forecasting is widely used in weather, traffic, finance, sales, medicine, and system load. A McKinsey study of AI use cases ranks time-series data as the second most valuable data type.
Google’s internal time‑series forecasting use cases illustrate the breadth of applications.
Evaluating Time Series Forecasts
Regression metrics such as MAE and MSE are sensitive to the scale of the target, which makes them hard to compare across series. Scale-independent metrics like SMAPE and WMAPE normalize the error by the magnitude of the actual values, enabling comparison and aggregation across many series of different scales.
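Both metrics are straightforward to compute by hand; a minimal sketch (toy values are illustrative only):

```python
import numpy as np

def smape(y_true, y_pred):
    # Symmetric MAPE: error divided by the mean magnitude of actual and
    # predicted values, averaged over all points.
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    denom = (np.abs(y_true) + np.abs(y_pred)) / 2
    return np.mean(np.abs(y_true - y_pred) / denom)

def wmape(y_true, y_pred):
    # Weighted MAPE: total absolute error divided by total actuals, so
    # large-volume points dominate the score.
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return np.abs(y_true - y_pred).sum() / np.abs(y_true).sum()

y_true = [100, 200, 300]
y_pred = [110, 190, 330]
print(round(smape(y_true, y_pred), 4))  # 0.0806
print(round(wmape(y_true, y_pred), 4))  # 0.0833
```

Because both scores are ratios of errors to actual magnitudes, a single threshold (say, WMAPE below 10%) can be applied uniformly across products or stores.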
Time Series Forecasting Validation
Cross‑validation for time series must respect temporal order: split the data chronologically (e.g., train on Jan‑Jun, validate on Jul; then train on Feb‑Jul, validate on Aug, etc.). This mirrors real‑world deployment.
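This rolling-origin scheme is available out of the box in scikit-learn; a minimal sketch on a hypothetical 24-month series:

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

# 24 monthly observations (toy data).
y = np.arange(24)

# Each split trains only on the past and validates on the block
# immediately after it, mimicking real-world deployment.
tscv = TimeSeriesSplit(n_splits=4, test_size=3)
for train_idx, val_idx in tscv.split(y):
    print(f"train up to t={train_idx[-1]}, validate on t={val_idx[0]}..{val_idx[-1]}")
```

Unlike ordinary k-fold shuffling, every validation index here is strictly later than every training index, so no future information leaks into the model.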
Traditional Time Series Methods
Moving Average (MA) is a strong baseline: the simple MA averages the last n observations, and weighted and exponential moving averages extend this idea. In pandas, the rolling and ewm methods implement them; SQL window functions can also be used.
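For example, on a hypothetical daily series (the data below is illustrative only):

```python
import pandas as pd

s = pd.Series([10, 12, 13, 12, 15, 16, 18, 17],
              index=pd.date_range("2023-01-01", periods=8, freq="D"))

sma = s.rolling(window=3).mean()          # simple moving average of the last 3 days
ema = s.ewm(span=3, adjust=False).mean()  # exponential moving average, more weight on recent days

# A naive one-step forecast: tomorrow's value = today's moving average.
print(sma.iloc[-1], ema.iloc[-1])
```

The window size trades off smoothness against responsiveness: a larger window dampens noise but lags behind turning points.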
ARIMA combines autoregressive (AR) and moving‑average (MA) components. Auto‑ARIMA tools (e.g., pmdarima) automate order selection. Variants include SARIMA, ARIMAX, ARCH, and GARCH. These models require fitting each series individually, which can be costly at scale.
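The autoregressive core of these models can be sketched without a dedicated library: an AR(p) model regresses each value on its p predecessors. The toy simulation and least-squares fit below are illustrative only:

```python
import numpy as np

def fit_ar(y, p):
    """Fit AR(p) coefficients (plus an intercept) by ordinary least squares."""
    y = np.asarray(y, float)
    # Design matrix: column k holds lag k+1, i.e. y[t-1], ..., y[t-p] per row.
    X = np.column_stack([y[p - k - 1 : len(y) - k - 1] for k in range(p)])
    X = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(X, y[p:], rcond=None)
    return coef  # [intercept, phi_1, ..., phi_p]

# Simulate an AR(1) process y_t = 0.8 * y_{t-1} + noise and recover phi_1.
rng = np.random.default_rng(0)
y = [0.0]
for _ in range(500):
    y.append(0.8 * y[-1] + rng.normal())
coef = fit_ar(y, p=1)
print(coef)  # phi_1 should land near 0.8
```

Full ARIMA additionally differences the series (the "I" component) and models the residuals with MA terms; libraries like pmdarima search over these orders automatically.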
Prophet (Facebook) uses an additive model (trend + seasonality + holidays) and provides probabilistic forecasts. It is user‑friendly but still needs per‑series fitting.
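The additive idea behind Prophet (trend plus seasonal components) can be sketched with plain regression; the toy series below is illustrative only:

```python
import numpy as np
import pandas as pd

# Toy daily series: linear trend + a repeating 7-day pattern + noise.
rng = np.random.default_rng(1)
idx = pd.date_range("2023-01-01", periods=140, freq="D")
trend = 0.5 * np.arange(140)
weekly = np.tile([0, 0, 0, 0, 0, 5, 8], 20)
y = pd.Series(trend + weekly + rng.normal(0, 0.5, 140), index=idx)

# Additive decomposition in the spirit of Prophet: fit a linear trend,
# then estimate seasonality as the mean residual per day of week.
t = np.arange(len(y))
slope, intercept = np.polyfit(t, y.values, 1)
resid = y.values - (slope * t + intercept)
season = pd.Series(resid).groupby(y.index.dayofweek).mean()
print(round(slope, 2), season.round(1).to_dict())
```

Prophet itself fits piecewise trends and Fourier-based seasonality with a Bayesian backend, which is how it produces uncertainty intervals rather than point forecasts.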
Machine Learning Methods
Most winning Kaggle forecasting solutions rely on gradient‑boosted trees (e.g., LightGBM, XGBoost). The workflow transforms a time series into a tabular regression problem using a sliding window: historical values become features (lag features) and the target is the future value.
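The sliding-window transformation is a few lines of pandas; the sales numbers below are hypothetical:

```python
import pandas as pd

# Hypothetical daily sales series.
sales = pd.Series([20, 22, 25, 24, 28, 30, 29, 33],
                  index=pd.date_range("2023-01-01", periods=8, freq="D"))

# Sliding window: the last 3 observations become lag features,
# and the current value becomes the regression target.
df = pd.DataFrame({
    "lag_3": sales.shift(3),
    "lag_2": sales.shift(2),
    "lag_1": sales.shift(1),
    "target": sales,
}).dropna()
print(df)
```

The resulting table can be fed directly to LightGBM or XGBoost as an ordinary regression problem, which is what makes GBDT pipelines scale to thousands of series at once.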
Key parameters include the historical window size, the prediction horizon (gap), and the prediction window length. Feature engineering distinguishes static categorical features (e.g., product ID) from dynamic features (lag values, date‑derived attributes). Advanced tools like tsfresh automate feature extraction.
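A minimal sketch of the static/dynamic distinction on a hypothetical two-product panel:

```python
import pandas as pd

panel = pd.DataFrame({
    "product_id": ["A"] * 4 + ["B"] * 4,  # static categorical feature
    "date": pd.to_datetime(["2023-01-06", "2023-01-07",
                            "2023-01-08", "2023-01-09"] * 2),
    "sales": [5, 9, 8, 4, 11, 15, 14, 10],
})

# Dynamic features: lags computed within each series,
# plus attributes derived from the date itself.
panel["lag_1"] = panel.groupby("product_id")["sales"].shift(1)
panel["dayofweek"] = panel["date"].dt.dayofweek
panel["is_weekend"] = panel["dayofweek"].isin([5, 6]).astype(int)
print(panel)
```

Note the groupby before shift: computing lags per series prevents product B's first row from "seeing" product A's last value.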
Deep Learning Methods
RNN/GRU/LSTM models can predict multiple steps but often require careful tuning. Seq2Seq architectures with attention were used in the Web Traffic Forecasting competition. WaveNet applies dilated causal convolutions but performed worse than RNNs in our tests.
DeepAR (Amazon) outputs probabilistic forecasts but is unstable compared to tree models. Temporal Fusion Transformers (TFT) combine attention with feature selection networks and can match or exceed GBDT performance, though training cost remains high.
<code>from pytorch_forecasting import TimeSeriesDataSet

training = TimeSeriesDataSet(
    data[lambda x: x.date <= training_cutoff],
    time_idx=...,  # column name of time of observation
    target=...,  # column name of target to predict
    group_ids=[...],  # column name(s) for timeseries IDs
    max_encoder_length=max_encoder_length,  # how much history to use
    max_prediction_length=max_prediction_length,  # how far to predict into future
    static_categoricals=[...],
    static_reals=[...],
    time_varying_known_categoricals=[...],
    time_varying_known_reals=[...],
    time_varying_unknown_categoricals=[...],
    time_varying_unknown_reals=[...],
)
</code>

AutoML for Time Series
Libraries such as Auto_TS and AutoTS automate model selection, hyper‑parameter tuning, and validation for time‑series tasks. However, the most effective pipelines still rely on strong feature engineering (e.g., via tsfresh) combined with GBDT models.
Future directions include better handling of concept drift, prior shift, and covariate shift, as well as research on data augmentation and pre‑training for time‑series data.
GuanYuan Data Tech Team