Time‑o1: Overcoming Time‑Series Forecasting Bottlenecks with a Novel Loss Function

The paper identifies two fundamental issues in time‑series forecasting—label autocorrelation bias and task‑scale explosion caused by the standard TMSE loss—and proposes Time‑o1, a PCA‑based orthogonal label transformation that eliminates bias, reduces optimization complexity, and yields consistent performance gains across multiple models and datasets.


Problem Analysis

In modern time‑series forecasting, research has focused on designing ever more complex network architectures, such as Transformers and linear models, while the loss function remains almost universally the time‑domain mean squared error (TMSE). The paper (NeurIPS 2025) identifies two critical limitations created by this oversight:

1.1 Challenge 1 – Label Autocorrelation Bias

Time‑series labels are highly autocorrelated, but TMSE treats each prediction step as independent, leading to a biased loss. The authors formalize this bias in Theorem 1 (Autocorrelation Bias), showing that the gap between TMSE and the negative log‑likelihood vanishes only when the label steps are uncorrelated.
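A small synthetic NumPy demo (illustrative, not from the paper) makes the bias concrete: adjacent forecast steps of label windows drawn from an AR(1) process are strongly correlated, even though TMSE scores them as independent targets.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate an AR(1) series: x_t = 0.9 * x_{t-1} + noise
n = 5000
x = np.zeros(n)
for t in range(1, n):
    x[t] = 0.9 * x[t - 1] + rng.standard_normal()

# Slice the series into label windows of horizon T = 4
T = 4
windows = np.stack([x[i:i + T] for i in range(n - T)])  # shape (n - T, T)

# Correlation between the first and second forecast step across windows
corr = np.corrcoef(windows[:, 0], windows[:, 1])[0, 1]
print(f"corr(step 1, step 2) = {corr:.2f}")  # near the AR coefficient 0.9
```

TMSE sums per-step squared errors as if these columns were independent, which is exactly the gap Theorem 1 quantifies.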

1.2 Challenge 2 – Task‑Scale Explosion

TMSE models each forecast step as a separate task, so the total number of tasks grows linearly with the prediction horizon T. When T is large, gradient conflicts among tasks hinder convergence, especially in long‑horizon scenarios such as manufacturing scheduling or traffic flow prediction.

Time‑o1: Loss Design in a Transformed Domain

2.1 Core Idea

Time‑o1 applies principal component analysis (PCA) to the label sequence, converting it into orthogonal principal components ordered by variance. By aligning model predictions with the most informative components, the method simultaneously removes label autocorrelation (solving Challenge 1) and reduces the effective number of optimization tasks (solving Challenge 2) while preserving the efficiency of direct‑forecast (DF) pipelines.

2.2 Implementation Flow

1. Standardization – normalize the label sequence to zero mean and unit variance so that PCA operates on a well‑scaled series.

2. Projection – compute the projection matrix via singular value decomposition (SVD) and retain the top‑k right singular vectors, forming the optimal projection.

3. Spatial transformation – project both ground‑truth and predicted sequences into the principal‑component space.

4. Transformed‑domain loss – compute the MSE between projected predictions and projected labels.

5. Target fusion – blend the transformed‑domain loss with the original‑space MSE using a weighted sum to balance information from both domains.

The entire projection can be obtained in a single SVD, making the approach computationally cheap and model‑agnostic.
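The five steps above can be sketched in a few lines of NumPy. This is a minimal, model‑agnostic sketch, not the paper's reference implementation; the function name, the retained component count `k`, and the fusion weight `alpha` are illustrative assumptions.

```python
import numpy as np

def time_o1_loss(pred, label, k, alpha=0.5):
    """Sketch of a Time-o1-style loss. pred, label: (batch, T) windows."""
    # 1. Standardize labels so PCA sees a zero-mean, unit-variance series;
    #    apply the same statistics to the predictions.
    mu, sigma = label.mean(axis=0), label.std(axis=0) + 1e-8
    label_std = (label - mu) / sigma
    pred_std = (pred - mu) / sigma

    # 2. SVD of the standardized labels; rows of Vt are the principal
    #    directions, so the top-k rows give the projection matrix.
    _, _, Vt = np.linalg.svd(label_std, full_matrices=False)
    P = Vt[:k].T  # shape (T, k)

    # 3. Spatial transformation: project both sequences into PC space.
    z_label = label_std @ P
    z_pred = pred_std @ P

    # 4. Transformed-domain loss: MSE on the decorrelated components.
    loss_pc = np.mean((z_pred - z_label) ** 2)

    # 5. Target fusion: weighted blend with the original-space MSE.
    loss_time = np.mean((pred - label) ** 2)
    return alpha * loss_pc + (1 - alpha) * loss_time

# Toy usage on synthetic windows (batch of 64, horizon 16)
rng = np.random.default_rng(1)
label = rng.standard_normal((64, 16))
pred = label + 0.1 * rng.standard_normal((64, 16))
loss = time_o1_loss(pred, label, k=4)
```

In a real training loop the same computation would be expressed in the framework's autodiff tensors (e.g. PyTorch) so that gradients flow through the projection.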

2.3 Theoretical Guarantees

The authors prove that PCA yields mutually orthogonal components, eliminating autocorrelation bias, and that component variances decay monotonically. Consequently, retaining only the top‑k components reduces the number of effective tasks, directly addressing Challenge 2.
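Both guarantees can be checked numerically. The sketch below (synthetic AR(1) labels; an illustration, not the paper's proof) projects label windows onto their principal components and verifies that off‑diagonal covariances vanish and component variances decay.

```python
import numpy as np

rng = np.random.default_rng(0)

# Autocorrelated label windows from an AR(1) process, horizon T = 8
n, T = 4000, 8
x = np.zeros(n + T)
for t in range(1, n + T):
    x[t] = 0.9 * x[t - 1] + rng.standard_normal()
Y = np.stack([x[i:i + T] for i in range(n)])
Y = Y - Y.mean(axis=0)  # center before PCA

# Principal directions from the SVD of the centered labels
_, _, Vt = np.linalg.svd(Y, full_matrices=False)
Z = Y @ Vt.T  # projected components

# Guarantee 1: components are mutually uncorrelated
C = np.cov(Z, rowvar=False)
off_diag = np.abs(C - np.diag(np.diag(C))).max()

# Guarantee 2: component variances decay monotonically
variances = np.diag(C)
decaying = bool(np.all(np.diff(variances) <= 1e-9))
print(off_diag, decaying)
```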

Empirical Evaluation

Experiments on benchmark datasets (e.g., ETTh1) show that Time‑o1 reduces FredFormer’s MSE by 0.016 and yields similar improvements on other models (iTransformer, FreTS, DLinear). Visualizations demonstrate that DF models trained with TMSE capture general trends but fail on high‑variance peaks, whereas Time‑o1 accurately fits these spikes.

Comparisons with alternative alignment losses (Dilate, Soft‑DTW, DPTA) reveal only marginal gains for those methods because they neither decorrelate labels nor reduce task count. Ablation studies confirm that both label orthogonalization and task‑reduction contribute positively, with their combination achieving the best results.

Beyond PCA, the framework supports other statistical transforms (SVD, RPCA, FA); all improve over the baseline DF, but PCA consistently delivers the highest performance due to its simultaneous decorrelation and dimensionality reduction.

Conclusion

Time‑o1 introduces a label‑side feature‑engineering perspective to time‑series forecasting by redefining the loss in an orthogonal transformed space. This resolves the two longstanding challenges of autocorrelation bias and task‑scale explosion, delivering performance gains comparable to or exceeding those from architectural innovations, and works across a wide range of forecasting models.

Figure: Label autocorrelation vs. PCA orthogonalization
Figure: Variance distribution of original labels vs. principal components
Figure: Prediction visualization: DF (TMSE) vs. Time‑o1
Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.

Tags: time series forecasting, PCA, loss function, multitask learning, NeurIPS 2025, Time‑o1
Written by

Machine Learning Algorithms & Natural Language Processing

Focused on frontier AI technologies, empowering AI researchers' progress.
