A Survey of Time Series Forecasting Augmentation: Frequency Domain, Decomposition, and Patch Methods

The article reviews why classic classification augmentations fail for forecasting, outlines a taxonomy of effective time‑series augmentation techniques—including frequency‑domain, decomposition, and patch‑based methods—details the Temporal Patch Shuffle (TPS) pipeline, and presents extensive experiments showing TPS achieves state‑of‑the‑art improvements across long‑term, short‑term, and classification tasks.


Why Classification‑Oriented Augmentation Fails in Forecasting

Techniques such as jittering, scaling, window warping, permutation, and rotation were designed for classification, where the label is discrete and invariant to such transforms. Applied to forecasting, they disrupt input‑target continuity, breaking the relationship between the look‑back window and the prediction horizon and causing performance drops.

Data‑Label Consistency: A Necessary Condition

For a look‑back window x and target y, the training sample is the concatenated sequence s = x ∥ y. Augmentation must be applied to s before splitting, ensuring the augmented input x̃ and target ỹ remain temporally aligned.
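The consistency rule can be sketched in a few lines (a minimal NumPy sketch; the function and parameter names are illustrative, not from the paper):

```python
import numpy as np

def augment_with_consistency(x, y, augment):
    """Apply `augment` to the concatenated sequence s = x || y,
    then split back so input and target stay temporally aligned."""
    s = np.concatenate([x, y])            # s = x || y
    s_aug = augment(s)                    # any augmentation on the full sequence
    return s_aug[:len(x)], s_aug[len(x):] # (x_tilde, y_tilde)

# example: jitter applied jointly to input and target
x = np.arange(96, dtype=float)            # look-back window
y = np.arange(96, 120, dtype=float)       # prediction horizon
rng = np.random.default_rng(0)
x_t, y_t = augment_with_consistency(x, y, lambda s: s + 0.01 * rng.standard_normal(len(s)))
```

The key point is that the augmentation never sees x and y separately, so any transform that shifts or warps time affects both sides identically.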

Taxonomy of Forecast Augmentation Methods

Frequency‑based: RobustTAD, FreqMask, FreqMix, WaveMask, WaveMix, Dominant Shuffle

Decomposition‑based: STAug

Other: wDBA, MBB, Upsample

Patch‑based: TPS

RobustTAD

Applies discrete Fourier transform to the concatenated sequence, perturbs selected frequency bands (amplitude or phase) with a Gaussian‑controlled intensity, then inverse‑transforms back to the time domain.
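A minimal NumPy sketch of this idea, assuming a fixed frequency band and a single Gaussian noise scale `sigma` (the paper's actual band selection and intensity schedule may differ):

```python
import numpy as np

def robusttad_like(s, band=(5, 20), sigma=0.1, seed=0):
    """Perturb amplitude and phase of one frequency band with
    Gaussian noise, then inverse-transform back to the time domain."""
    rng = np.random.default_rng(seed)
    S = np.fft.rfft(s)
    amp, phase = np.abs(S), np.angle(S)
    lo, hi = band
    amp[lo:hi] *= 1.0 + sigma * rng.standard_normal(hi - lo)  # amplitude jitter
    phase[lo:hi] += sigma * rng.standard_normal(hi - lo)      # phase jitter
    return np.fft.irfft(amp * np.exp(1j * phase), n=len(s))
```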

FreqMask and FreqMix

Both start with s = x ∥ y and compute its real FFT S = rFFT(s). FreqMask zeros out selected frequencies using a binary mask M (S̃ = M ⊙ S), while FreqMix blends spectra from two sequences: S̃ = M ⊙ S₁ + (1−M) ⊙ S₂. The inverse FFT yields the augmented signal.
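Both operations can be sketched with NumPy's real FFT (the mask rate and RNG handling are illustrative choices, not the papers' exact settings):

```python
import numpy as np

def freq_mask(s, rate=0.2, seed=0):
    """FreqMask: zero out a random subset of rFFT bins (S~ = M * S)."""
    rng = np.random.default_rng(seed)
    S = np.fft.rfft(s)
    M = rng.random(S.shape) >= rate        # binary keep-mask
    return np.fft.irfft(M * S, n=len(s))

def freq_mix(s1, s2, rate=0.2, seed=0):
    """FreqMix: blend two spectra, S~ = M * S1 + (1 - M) * S2."""
    rng = np.random.default_rng(seed)
    S1, S2 = np.fft.rfft(s1), np.fft.rfft(s2)
    M = rng.random(S1.shape) >= rate       # per-bin choice between the two spectra
    return np.fft.irfft(np.where(M, S1, S2), n=len(s1))
```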

WaveMask and WaveMix

Use the discrete wavelet transform (DWT) to decompose s into multi‑level coefficients W^{(l)}. WaveMask applies a mask per level (W̃^{(l)} = M^{(l)} ⊙ W^{(l)}), while WaveMix mixes coefficients from two sequences (W̃^{(l)} = M^{(l)} ⊙ W₁^{(l)} + (1−M^{(l)}) ⊙ W₂^{(l)}). The inverse DWT reconstructs the augmented series.

Dominant Shuffle

Selects the top‑k dominant frequencies from the FFT of s and shuffles only those components before inverse transforming, avoiding wholesale spectral distortion.
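A hedged NumPy sketch, assuming "dominant" means the k largest‑magnitude non‑DC bins and that the shuffle permutes those complex components among themselves:

```python
import numpy as np

def dominant_shuffle(s, k=4, seed=0):
    """Shuffle only the top-k dominant (largest-magnitude) rFFT
    components among themselves; all other bins are untouched."""
    rng = np.random.default_rng(seed)
    S = np.fft.rfft(s)
    # rank non-DC bins by magnitude and take the k strongest
    idx = np.argsort(np.abs(S[1:]))[::-1][:k] + 1
    S[idx] = S[rng.permutation(idx)]       # permute dominant components only
    return np.fft.irfft(S, n=len(s))
```

Because only k bins move, the overall spectral envelope is preserved, which is the method's stated advantage over wholesale masking or mixing.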

STAug

Applies Empirical Mode Decomposition (EMD) to two sequences, obtains intrinsic mode functions (IMFs), then recombines them using mixup‑style interpolation weights sampled from a uniform distribution. High memory consumption limits its scalability.

Other Non‑Frequency Methods

wDBA aligns series via dynamic time warping (DTW) and averages them; MBB decomposes a series with STL and bootstraps blocks of the residual component; Upsample linearly stretches a contiguous segment back to the original length, providing a strong non‑frequency baseline.
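Upsample in particular is easy to sketch (the segment-length ratio and random placement below are illustrative assumptions):

```python
import numpy as np

def upsample_segment(s, rate=0.5, seed=0):
    """Upsample: pick a contiguous segment of length rate*len(s) and
    linearly stretch it back to the original length."""
    rng = np.random.default_rng(seed)
    n = len(s)
    seg_len = max(2, int(rate * n))
    start = rng.integers(0, n - seg_len + 1)
    seg = s[start:start + seg_len]
    # linear interpolation stretches the segment to n points
    return np.interp(np.linspace(0, seg_len - 1, n), np.arange(seg_len), seg)
```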

From Image Patches to Temporal Patches

Patch‑based augmentation works in vision because spatial redundancy tolerates local shuffling. In time series, naive non‑overlapping patches create hard boundaries and break input‑target alignment, so patch operations must be re‑designed for the temporal domain.

Temporal Patch Shuffle (TPS)

The TPS pipeline concatenates look‑back and horizon, extracts overlapping patches (length p, stride s), scores each patch by variance, selects the lowest‑variance fraction α for random shuffling, then reconstructs the series by averaging overlapping regions and finally splits back into augmented input and target.

Algorithm Details

Concatenate look‑back window and horizon to enforce data‑label consistency.

Extract overlapping patches using length p and stride s.

Compute variance of each patch (across channels) in normalized space; low variance patches are deemed safe to shuffle.

Randomly permute the selected α proportion of patches while leaving others unchanged.

Reconstruct the series by placing each patch back (averaging overlaps) to smooth discontinuities.

Split the reconstructed series back into augmented input and target.
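The six steps above can be sketched end to end for the univariate case (patch length p, stride, and the shuffle ratio α are hyperparameters; this is an illustrative reconstruction, not the authors' code):

```python
import numpy as np

def temporal_patch_shuffle(x, y, p=16, stride=8, alpha=0.5, seed=0):
    """Sketch of TPS: concatenate, extract overlapping patches, shuffle
    the lowest-variance alpha fraction, reconstruct by overlap
    averaging, and split back into input and target."""
    rng = np.random.default_rng(seed)
    s = np.concatenate([x, y])                       # step 1: data-label consistency
    starts = np.arange(0, len(s) - p + 1, stride)    # step 2: overlapping patches
    patches = np.stack([s[i:i + p] for i in starts])

    # steps 3-4: rank patches by variance; shuffle the low-variance fraction
    order = np.argsort(patches.var(axis=1))
    chosen = order[:int(alpha * len(starts))]
    patches[chosen] = patches[rng.permutation(chosen)]

    # step 5: overlap-averaged reconstruction smooths patch boundaries
    acc, cnt = np.zeros(len(s)), np.zeros(len(s))
    for i, patch in zip(starts, patches):
        acc[i:i + p] += patch
        cnt[i:i + p] += 1
    s_rec = s.copy()                                 # uncovered tail keeps originals
    covered = cnt > 0
    s_rec[covered] = acc[covered] / cnt[covered]

    return s_rec[:len(x)], s_rec[len(x):]            # step 6: split back
```

For multivariate series the variance in step 3 would be pooled across channels, as the article notes; the sketch above handles a single channel for clarity.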

Ablation Findings

Joint augmentation of input and target is decisive; augmenting only the input causes the largest performance drop.

Overlapping patches are crucial; non‑overlapping patches degrade results noticeably.

Variance‑based ranking yields modest gains; its benefit disappears when all patches are shuffled (α = 1.0).

Operating directly in the time domain outperforms FFT‑based variants.

Higher shuffle ratios (0.7–1.0) consistently deliver stronger performance.

Long‑Term Forecasting Results

Evaluated on nine long‑term datasets with five backbones (TSMixer, DLinear, PatchTST, TiDE, LightTS). TPS achieved the best average MSE on every backbone, improving the strongest competitor by 2.08%–10.51% (10.51% on LightTS).

Short‑Term Traffic Forecasting

On four PeMS traffic datasets (03, 04, 07, 08) using PatchTST, TPS again delivered the strongest augmentation gains, with MSE improvements ranging from 0.00% to 7.14%.

Extension to Time‑Series Classification

For classification, TPS skips concatenation and shuffles at the sample level. It achieves the highest average accuracy among compared augmentations on 30 univariate UCR datasets (≈+0.50% accuracy) and 10 multivariate UEA datasets (≈+1.10%).

Conclusion

TPS’s uniqueness stems from avoiding costly decomposition, refraining from indiscriminate spectral perturbation, and preserving input‑target alignment. By applying controlled randomness—overlapping patches, variance‑aware shuffling, and strict data‑label consistency—it consistently improves forecasting and classification across diverse models and datasets, achieving SOTA‑level augmentation performance.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.

Tags: machine learning, data augmentation, forecasting, time series, frequency domain, temporal patch shuffle, wavelet
Written by

DeepHub IMBA

A public account sharing practical AI insights: internet + machine learning + big data + architecture = IMBA.
