Time Series Forecasting Augmentation: Frequency, Decomposition, and Patch Techniques
This article reviews why classic classification augmentations fail for forecasting, introduces the essential data‑label consistency requirement, and systematically categorizes effective time‑series augmentation methods—including frequency‑domain (RobustTAD, FreqMask, FreqMix), decomposition (STAug), and patch‑based approaches (WaveMask, WaveMix, Dominant Shuffle, Temporal Patch Shuffle)—backed by extensive experiments on long‑term, short‑term, and classification tasks.
Data augmentation is indispensable in modern machine learning. In computer vision it is crucial for training good models, and in time‑series classification a mature set of techniques (jittering, scaling, window slicing, time warping, permutation, rotation) exists. However, forecasting differs because the target is a continuous signal that follows the input, making many classification‑oriented augmentations unsuitable.
Why Classification Augmentations Fail in Forecasting
Techniques such as jittering, scaling, window warping, and permutation preserve class labels but disrupt the relationship between the look‑back window x and the prediction horizon y. Altering the input without a corresponding, consistent change to the target breaks the input‑target alignment, causing performance to drop below the non‑augmented baseline.
Data‑Label Consistency: A Necessary Condition
Let the concatenated sequence be s = x \parallel y. Augmentation should be applied to s as a whole, then split back into (\tilde{x}, \tilde{y}) = Split(\mathcal{A}(s)). This preserves the natural continuity between input and target, which is essential for forecasting.
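The concatenate-augment-split recipe can be sketched in a few lines. This is a minimal illustration, not the paper's code; the jitter augmentation and its noise level are hypothetical placeholders for any whole-sequence augmentation \mathcal{A}.

```python
import numpy as np

def augment_consistent(x, y, aug):
    """Apply an augmentation to the concatenated sequence s = x || y,
    then split back so the input and target stay aligned."""
    s = np.concatenate([x, y])              # s = x || y
    s_aug = aug(s)                          # A(s): any whole-sequence augmentation
    return s_aug[:len(x)], s_aug[len(x):]   # (x~, y~) = Split(A(s))

# Example with a hypothetical jitter augmentation (noise level is illustrative)
x = np.arange(96, dtype=float)              # look-back window
y = np.arange(96, 120, dtype=float)         # prediction horizon
x_aug, y_aug = augment_consistent(
    x, y, lambda s: s + 0.01 * np.random.randn(len(s)))
```

Because the noise is applied to s as a whole, the boundary between \tilde{x} and \tilde{y} remains continuous.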
Classification of Forecast Augmentation Methods
Frequency‑based: RobustTAD, FreqMask, FreqMix, WaveMask, WaveMix, Dominant Shuffle
Decomposition‑based: STAug
Other: wDBA, MBB, Upsample
Patch‑based: Temporal Patch Shuffle (TPS)
RobustTAD
RobustTAD performs a discrete Fourier transform on the concatenated sequence, perturbs selected frequency bands (in either amplitude or phase), and applies the inverse transform. The fraction of the spectrum perturbed controls the augmentation strength: the amplitude variant replaces magnitudes with samples from a Gaussian distribution, while the phase variant adds a small offset. Originally designed for anomaly detection, the amplitude variant is also used for multivariate forecasting.
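A hedged sketch of the amplitude variant follows. The band width, the Gaussian's centering on the band's mean magnitude, and the `sigma` parameter are assumptions for illustration; the original method's exact parameterization may differ.

```python
import numpy as np

def robusttad_amplitude(s, band_ratio=0.1, sigma=1.0, rng=None):
    """RobustTAD-style amplitude perturbation (sketch): replace magnitudes
    in a randomly chosen frequency band with Gaussian samples, keep the
    phases, then inverse-transform back to the time domain."""
    rng = np.random.default_rng() if rng is None else rng
    S = np.fft.rfft(s)
    n = len(S)
    band = max(1, int(band_ratio * n))          # band width as a spectrum proportion
    start = rng.integers(0, n - band + 1)       # random band location
    mag, phase = np.abs(S), np.angle(S)
    # Assumption: Gaussian centered on the band's mean magnitude
    mag[start:start + band] = np.abs(
        rng.normal(mag[start:start + band].mean(), sigma, band))
    return np.fft.irfft(mag * np.exp(1j * phase), n=len(s))
```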
FreqMask and FreqMix
Both start with s = x \parallel y and compute S = rFFT(s). FreqMask applies a binary mask M to zero out selected frequencies: \tilde{S} = M \odot S, \tilde{s} = irFFT(\tilde{S}). FreqMix mixes two sequences in the frequency domain: \tilde{S} = M \odot S_1 + (1-M) \odot S_2, \tilde{s} = irFFT(\tilde{S}). These operations are simple yet effective, forcing models to be robust to missing frequency components.
WaveMask and WaveMix (Time‑Frequency Localization)
Fourier transforms lose temporal location; the Short-Time Fourier Transform (STFT) uses a fixed window, while wavelets provide multi-resolution analysis. WaveMask and WaveMix first apply a discrete wavelet transform, W = WaveDec(s) = \{W^{(1)}, \ldots, W^{(L+1)}\}. For each level l, WaveMask masks coefficients, \tilde{W}^{(l)} = M^{(l)} \odot W^{(l)}, while WaveMix mixes coefficients from two sequences, \tilde{W}^{(l)} = M^{(l)} \odot W_1^{(l)} + (1 - M^{(l)}) \odot W_2^{(l)}. Reconstructing with the inverse DWT yields the augmented sequence.
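To keep the sketch dependency-free, the example below hand-rolls a single-level Haar transform; the actual methods use a multi-level DWT (e.g., via PyWavelets), so treat this as an illustration of the masking idea only.

```python
import numpy as np

SQRT2 = np.sqrt(2.0)

def haar_dwt(s):
    """Single-level Haar DWT (even-length input): approximation + detail."""
    a = (s[0::2] + s[1::2]) / SQRT2
    d = (s[0::2] - s[1::2]) / SQRT2
    return a, d

def haar_idwt(a, d):
    """Inverse single-level Haar DWT."""
    s = np.empty(2 * len(a))
    s[0::2] = (a + d) / SQRT2
    s[1::2] = (a - d) / SQRT2
    return s

def wave_mask(s, mask_ratio=0.2, rng=None):
    """WaveMask sketch: mask wavelet coefficients per level, reconstruct."""
    rng = np.random.default_rng() if rng is None else rng
    a, d = haar_dwt(s)
    Ma = rng.random(len(a)) >= mask_ratio       # one binary mask per level
    Md = rng.random(len(d)) >= mask_ratio
    return haar_idwt(Ma * a, Md * d)
```

WaveMix follows the same pattern with two sequences, blending coefficients level by level instead of masking them.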
Dominant Shuffle
Dominant Shuffle selects the top-k dominant frequencies \Omega_k from the FFT of s, shuffles only those components, and leaves the rest untouched: \tilde{S}_{\Omega_k} = Shuffle(S_{\Omega_k}), then \tilde{s} = IFFT(\tilde{S}). This avoids overly aggressive perturbation of the whole spectrum. In the TPS benchmark, however, Dominant Shuffle is not the strongest method overall.
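A compact sketch of this idea, assuming "dominant" means largest-magnitude rFFT bins (the DC component is excluded here as an assumption):

```python
import numpy as np

def dominant_shuffle(s, k=3, rng=None):
    """Dominant Shuffle sketch: permute only the top-k largest-magnitude
    rFFT bins; all other frequency components are left untouched."""
    rng = np.random.default_rng() if rng is None else rng
    S = np.fft.rfft(s)
    idx = np.argsort(np.abs(S[1:]))[-k:] + 1    # top-k bins, skipping DC
    S[idx] = S[rng.permutation(idx)]            # shuffle only those components
    return np.fft.irfft(S, n=len(s))
```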
STAug (Decomposition‑Based)
STAug applies Empirical Mode Decomposition (EMD) to two sequences, obtaining intrinsic mode functions (IMFs). It then recombines the IMFs using mixup‑style interpolation weights sampled from a uniform distribution, producing a new sequence that blends temporal features. The method suffers from high memory consumption; in the TPS experiments it could not be evaluated on the ECL and Traffic datasets due to GPU memory limits.
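One plausible reading of the recombination step can be sketched as follows. The IMFs are assumed to be precomputed (e.g., via EMD from the PyEMD package) and of matching shape; the per-IMF weighting scheme shown is an illustrative simplification of STAug's mixup-style interpolation.

```python
import numpy as np

def staug_recombine(imfs1, imfs2, rng=None):
    """STAug-style recombination sketch: given two sequences' IMF lists,
    reweight each IMF pair with weights drawn from Uniform(0, 1) and sum.
    Assumes both lists contain the same number of equal-length IMFs."""
    rng = np.random.default_rng() if rng is None else rng
    w = rng.uniform(0, 1, size=len(imfs1))      # one mixup weight per IMF pair
    return sum(w_i * a + (1 - w_i) * b
               for w_i, a, b in zip(w, imfs1, imfs2))
```

The memory cost noted above comes from the EMD step itself, which materializes a full set of IMFs per sequence.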
Other Non‑Frequency Methods
wDBA aligns sequences with Dynamic Time Warping (DTW) and averages them, producing high‑quality synthetic samples at a large computational cost. MBB decomposes a series into trend, seasonality, and residual via STL, then bootstraps residual blocks. Upsample extracts a continuous segment and linearly interpolates it back to the original length, acting as a local magnifier; it consistently provides a strong non‑frequency baseline.
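Of these, Upsample is simple enough to sketch directly; the segment ratio below is an illustrative default, not the paper's setting.

```python
import numpy as np

def upsample_augment(s, seg_ratio=0.5, rng=None):
    """Upsample sketch: cut a random contiguous segment and linearly
    interpolate it back to the original length (a 'local magnifier')."""
    rng = np.random.default_rng() if rng is None else rng
    n = len(s)
    seg_len = max(2, int(seg_ratio * n))
    start = rng.integers(0, n - seg_len + 1)
    seg = s[start:start + seg_len]
    # Stretch the segment back to length n via linear interpolation
    return np.interp(np.linspace(0, seg_len - 1, n), np.arange(seg_len), seg)
```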
From Image Patch to Time‑Series Patch
Patch‑based augmentation is well‑established in vision (e.g., PatchShuffle, PatchMix) because images have spatial redundancy. Time series lack such redundancy; shuffling non‑overlapping patches creates hard boundaries and breaks input‑target alignment. Therefore, patch ideas must be re‑thought for the temporal domain.
Temporal Patch Shuffle (TPS)
The TPS pipeline works as follows:
Concatenate the look‑back window and prediction horizon into a continuous sequence s = x \parallel y to enforce data‑label consistency.
Temporal Patching: extract overlapping patches of length p with stride r < p. The overlap ensures smooth transitions during reconstruction.
Variance Scoring: compute the variance of each patch across all channels (after normalisation). Low-variance patches contain fewer structural details and are safer to perturb.
Selective Shuffle: shuffle a proportion \alpha of the lowest-variance patches; the remaining patches stay in place.
Reconstruction: place each patch back (shuffled or not) and average overlapping regions to smooth discontinuities.
Split the reconstructed sequence back into the augmented input \tilde{x} and target \tilde{y}.
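The steps above can be sketched end-to-end for the univariate case. The patch length, stride, and shuffle ratio below are illustrative defaults, not the paper's tuned hyperparameters.

```python
import numpy as np

def temporal_patch_shuffle(x, y, patch_len=16, stride=8, alpha=0.5, rng=None):
    """TPS pipeline sketch (univariate): concatenate, patch with overlap,
    variance-score, shuffle the low-variance patches, overlap-average,
    and split back into (x~, y~)."""
    rng = np.random.default_rng() if rng is None else rng
    s = np.concatenate([x, y])                  # 1. data-label consistency
    starts = np.arange(0, len(s) - patch_len + 1, stride)
    patches = np.stack([s[t:t + patch_len] for t in starts])  # 2. overlapping patches
    order = np.argsort(patches.var(axis=1))     # 3. variance scoring (low first)
    low = order[:int(alpha * len(order))]       #    lowest-variance subset
    patches[low] = patches[rng.permutation(low)]  # 4. selective shuffle
    out = np.zeros(len(s))
    cnt = np.zeros(len(s))
    for p, t in zip(patches, starts):           # 5. overlap-average reconstruction
        out[t:t + patch_len] += p
        cnt[t:t + patch_len] += 1
    out[cnt > 0] /= cnt[cnt > 0]
    out[cnt == 0] = s[cnt == 0]                 #    tail not covered by any patch
    return out[:len(x)], out[len(x):]           # 6. split back into (x~, y~)
```

In the multivariate case, the variance in step 3 would be pooled across channels, as described above.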
Ablation Study
Key findings:
Data‑label consistency is decisive; augmenting only the input while keeping the target unchanged causes the largest performance drop.
Overlapping patches are crucial; replacing them with non‑overlapping patches degrades results noticeably.
Variance‑aware ordering provides a modest gain, especially when only a subset of patches is shuffled.
Operating directly in the time domain outperforms frequency‑domain variants of the same patch operation.
Higher shuffle ratios (≈0.7–1.0) generally yield stronger improvements.
Overall, the study emphasizes that forecasting augmentation must inject *controlled* randomness that respects the signal’s temporal structure.
Long‑Term Forecasting Results
TPS was evaluated on nine long‑term datasets using five recent backbones (TSMixer, DLinear, PatchTST, TiDE, LightTS). Across all backbones, TPS achieved the best average MSE, improving the second‑best baseline by 2.08%–10.51% (the largest 10.51% gain on LightTS).
Short‑Term Traffic Forecasting
Using PatchTST as the backbone on four traffic datasets (PeMS‑03,‑04,‑07,‑08), TPS again delivered the strongest overall enhancement, with MSE improvements ranging from 0.00% to 7.14% and never degrading performance.
Extension to Time‑Series Classification
For classification, TPS removes the concatenation step and shuffles patches at the sample level. On 30 univariate UCR datasets (MiniRocket) and 10 multivariate UEA datasets (MultiRocket), TPS achieved the highest average accuracy, improving the best competitor by 0.50% (UCR) and 1.10% (UEA), and ranking in the top‑2 on a majority of datasets.
Conclusion
TPS’s advantage stems from three factors: it avoids costly decomposition steps, it does not indiscriminately disturb the entire spectrum, and it preserves input‑target alignment through data‑label consistency. By applying controlled, variance‑aware shuffling with overlapping patches and averaging, TPS consistently outperforms other augmentations across long‑term forecasting, short‑term traffic prediction, and time‑series classification, establishing a new state‑of‑the‑art across tasks and model families.
Data Party THU
Official platform of Tsinghua Big Data Research Center, sharing the team's latest research, teaching updates, and big data news.