Paper Reading: Multi‑Cycle Learning Framework (MLF) for Financial Time‑Series Forecasting
The paper introduces MLF, a multi-cycle learning framework built around three novel modules (inter-cycle redundancy filtering, IRF; learnable weighted integration, LWI; and multi-cycle adaptive patch, MAP) plus a patch-squeeze component. MLF achieves higher accuracy and efficiency on financial time-series tasks such as fund-sales prediction, outperforms strong single- and multi-cycle baselines, and has been successfully deployed in Alipay's fund inventory system.
Background
Time‑series forecasting (TSF) is critical in finance; for example, accurate fund‑sales forecasts on Alipay help ensure product supply and support risk management. Financial series are influenced by short‑term public sentiment and longer‑term policy and market trends, requiring multi‑cycle inputs for reliable predictions. Existing TSF models typically use a single cycle or lack dedicated designs for multi‑cycle features.
Problem Definition
Given a multivariate historical series X_{h}=[x_{1},…,x_{n}] with c variables, the goal is to predict the next m steps X_{f}=[x_{n+1},…,x_{n+m}] for all variables. The multi-cycle input X_{h}^{*}=[X_{h}^{1},…,X_{h}^{s}] consists of s periods of varying lengths n_{1},…,n_{s}, where a larger index s denotes a longer cycle.
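To make the setup concrete, here is a minimal shape sketch in PyTorch; the variable counts, lengths, and the suffix-window construction of the cycles are illustrative assumptions, not values from the paper.

```python
import torch

# Hypothetical sizes for illustration only (not from the paper).
c, n, m = 7, 512, 96                  # variables, history length, forecast horizon
cycle_lens = [128, 256, 512]          # n_1 < n_2 < n_3: shorter to longer cycles (assumed suffixes)

X_h = torch.randn(c, n)                           # history X_h = [x_1, ..., x_n]
X_h_star = [X_h[:, -n_s:] for n_s in cycle_lens]  # multi-cycle input X_h^* = [X_h^1, ..., X_h^s]
# target to predict: X_f = [x_{n+1}, ..., x_{n+m}], shape (c, m)
```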
Method
The proposed MLF framework consists of four key components:
3.1 Multi‑Cycle Adaptive Patch (MAP)
Each period is split into patches. With a fixed patch length, longer periods yield more patches than shorter ones, causing an imbalance across cycles. MAP adaptively adjusts the patch length L^{s} and stride K^{s} so that every period contains the same number of patches N. With a fixed α=2, the relationship between L^{s}, K^{s}, and N is derived (see Figure 1).
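The paper's exact formula isn't reproduced here, but under the standard patching relation N = (n_s − L^s)/K^s + 1 with L^s = α·K^s, one plausible reading is K^s = n_s/(N + α − 1). A hedged sketch, with function name and rounding behavior as our assumptions:

```python
def adaptive_patch_params(n_s: int, N: int, alpha: int = 2):
    """Pick stride K^s and patch length L^s = alpha * K^s so a period of
    length n_s yields N patches, via N = (n_s - L^s) / K^s + 1.
    How non-divisible lengths are padded/rounded is an assumed detail."""
    K = n_s // (N + alpha - 1)   # solve n_s / K - alpha + 1 = N for K
    L = alpha * K
    return L, K

# With lengths divisible by N + 1, every period gets exactly N = 64 patches:
for n_s in (130, 650, 1300):
    L, K = adaptive_patch_params(n_s, N=64)
    print(n_s, L, K, (n_s - L) // K + 1)   # patch count -> 64 in each case
```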
3.2 Patch‑Squeeze Module
Self‑supervised studies show redundancy within a cycle: a small subset of patches can capture the global pattern. The patch‑squeeze module uses a lightweight encoder (PatchEnc, a linear layer) and a decoder (two MLPs) to reconstruct patches \hat{x}_{p}^{s} and produce compressed embeddings \hat{x}_{d}. A squeeze factor r reduces the original patch count N^{s} to N^{s}/r.
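A minimal sketch of the idea, assuming the squeeze acts along the patch axis; the class name and dimension layout are ours:

```python
import torch.nn as nn

class PatchSqueeze(nn.Module):
    """Sketch only. A linear encoder compresses N patch embeddings to N/r;
    a two-layer MLP decoder reconstructs the originals so the compressed
    embeddings are trained to retain the cycle's global pattern."""
    def __init__(self, n_patches: int, r: int):
        super().__init__()
        self.enc = nn.Linear(n_patches, n_patches // r)   # PatchEnc: squeeze along the patch axis
        self.dec = nn.Sequential(                          # decoder: two MLP layers
            nn.Linear(n_patches // r, n_patches),
            nn.GELU(),
            nn.Linear(n_patches, n_patches),
        )

    def forward(self, x_p):          # x_p: [batch, d_model, N]
        x_d = self.enc(x_p)          # compressed embeddings: [batch, d_model, N/r]
        x_p_hat = self.dec(x_d)      # reconstructed patches, trained with a
        return x_d, x_p_hat          # reconstruction loss, e.g. mse_loss(x_p_hat, x_p)
```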
3.3 Multi‑Cycle Self‑Attention
Compressed patches \hat{x}_{d} are fed into a Transformer encoder, generating query, key, and value matrices for each head. Scaled dot‑product attention (with scaling factor \sqrt{d_{k}}) produces attention outputs, followed by batch normalization and a residual feed‑forward network.
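A sketch of one such encoder block in PyTorch; nn.MultiheadAttention applies the \sqrt{d_{k}} scaling internally, and BatchNorm is applied over the channel dimension as described. The block name, width, and head count are placeholders:

```python
import torch.nn as nn

class AttentionBlock(nn.Module):
    """Sketch: multi-head scaled dot-product attention, BatchNorm (rather
    than LayerNorm), and a residual feed-forward network."""
    def __init__(self, d_model: int = 128, n_heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.bn1 = nn.BatchNorm1d(d_model)
        self.ffn = nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                                 nn.Linear(4 * d_model, d_model))
        self.bn2 = nn.BatchNorm1d(d_model)

    def forward(self, x):                # x: [batch, tokens, d_model], the compressed patches
        a, _ = self.attn(x, x, x)        # Q, K, V all derived from x
        x = self.bn1((x + a).transpose(1, 2)).transpose(1, 2)   # residual + BatchNorm over channels
        f = self.ffn(x)
        return self.bn2((x + f).transpose(1, 2)).transpose(1, 2)
```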
3.4 Inter‑Cycle Redundancy Filtering (IRF)
Redundant information across cycles can cause the self‑attention to over‑focus on repeated parts. IRF splits the encoder output z_{e} into per‑cycle representations z_{e}^{s}, then applies a single‑period processing (SPP) module with two linear branches: one models the current cycle, the other estimates the redundant information shared with longer cycles (s' > s). The estimated redundancy is subtracted from each z_{e}^{s}, yielding filtered representations.
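One possible reading of this step, sketched below; the mean-pooled stand-in for "redundancy from longer cycles", the per-cycle linear heads, and the short-to-long ordering of the inputs are all our assumptions:

```python
import torch
import torch.nn as nn

class IRF(nn.Module):
    """Sketch of inter-cycle redundancy filtering. For each cycle s, one
    linear branch forms the cycle's own representation and another estimates
    information already carried by longer cycles (s' > s); the estimate is
    subtracted so attention does not over-focus on repeated content."""
    def __init__(self, d_model: int, n_cycles: int):
        super().__init__()
        self.own = nn.ModuleList(nn.Linear(d_model, d_model) for _ in range(n_cycles))
        self.red = nn.ModuleList(nn.Linear(d_model, d_model) for _ in range(n_cycles))

    def forward(self, z_per_cycle):      # list of [batch, tokens, d_model], shortest cycle first
        filtered = []
        for s, z_s in enumerate(z_per_cycle):
            longer = z_per_cycle[s + 1:]
            if longer:
                # crude stand-in for the longer cycles' shared content: their mean token
                ctx = torch.stack([z.mean(dim=1) for z in longer], dim=0).mean(dim=0)
                redundancy = self.red[s](ctx).unsqueeze(1)    # [batch, 1, d_model]
            else:
                redundancy = 0.0                              # longest cycle: nothing to subtract
            filtered.append(self.own[s](z_s) - redundancy)
        return filtered
```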
3.5 Learnable Weighted Integration (LWI)
After obtaining predictions for each cycle, LWI aggregates them using a lightweight CNN that extracts temporal features from the longest period. Two MLPs generate query and key vectors; a sigmoid‑activated attention computes per‑cycle weights, which are used to produce a weighted average of the multi‑cycle forecasts.
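A hedged sketch of the integration step; the CNN shape, the normalization of the sigmoid scores into a weighted average, and all dimensions are assumptions:

```python
import torch
import torch.nn as nn

class LWI(nn.Module):
    """Sketch of learnable weighted integration. A small CNN summarizes the
    longest period; two MLPs map that summary and each cycle's forecast to
    query/key vectors, and sigmoid attention scores weight the per-cycle
    forecasts before averaging."""
    def __init__(self, horizon: int, d: int = 64):
        super().__init__()
        self.cnn = nn.Sequential(nn.Conv1d(1, d, kernel_size=3, padding=1),
                                 nn.GELU(), nn.AdaptiveAvgPool1d(1))
        self.q = nn.Sequential(nn.Linear(d, d), nn.GELU(), nn.Linear(d, d))
        self.k = nn.Sequential(nn.Linear(horizon, d), nn.GELU(), nn.Linear(d, d))

    def forward(self, preds, longest_period):
        # preds: [batch, s, horizon] per-cycle forecasts
        # longest_period: [batch, 1, n_s] raw series of the longest cycle
        q = self.q(self.cnn(longest_period).squeeze(-1))      # query from CNN summary: [batch, d]
        k = self.k(preds)                                     # key per cycle: [batch, s, d]
        w = torch.sigmoid((k @ q.unsqueeze(-1)).squeeze(-1))  # per-cycle weights: [batch, s]
        w = w / w.sum(dim=1, keepdim=True)                    # normalize to a weighted average
        return (w.unsqueeze(-1) * preds).sum(dim=1)           # final forecast: [batch, horizon]
```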
Experiments
4.1 Experimental Setup
Datasets: A proprietary fund‑sales dataset from Alipay (2015‑01 to 2023‑01, split 7:1:2) and public benchmarks (ETTh1/h2, ETTm1/m2, Weather, Exchange, Illness, Electricity).
Metrics: MSE, MAE, and WMAPE (the latter emphasizes large transaction values in the fund dataset).
Baselines: Single‑cycle models (PatchTST, NHits, Scaleformer, PathFormer) and multi‑cycle models (FiLM, Patch‑Concat, Patch‑Ensemble).
Implementation: Short‑term tasks use cycle lengths {5, 10, 30, 60, 120, 150} with prediction horizons {1, 5, 8, 10}; long‑term tasks use {128, 256, 512, 768, 1024, 2048} with horizons {96, 192, 336, 720}. Each period uses 64 patches; squeeze factors r ∈ {2, 4, 8}.
4.2 Forecasting Accuracy
MLF outperforms all single‑cycle baselines on both short‑ and long‑term tasks, demonstrating the benefit of leveraging multiple cycles. Compared with advanced multi‑cycle methods (FiLM, Patch‑Concat, Patch‑Ensemble), MLF achieves higher accuracy, confirming that its custom multi‑cycle design better exploits inter‑cycle information.
4.3 Efficiency
Using squeeze factors 8 (MLF‑8) and 4 (MLF‑4), MLF attains faster inference despite longer input sequences. On the fund dataset, MLF is 3.6× faster than Scaleformer and 11.8× faster than PathFormer; efficiency gains are even larger on bigger public datasets while maintaining competitive accuracy.
4.4 Deployment Results
MLF was deployed in Alipay’s Fund Inventory Management System (FIMS) at the end of February 2024. Over five consecutive weeks, it achieved significant improvements in WMAPE and total GMV compared with the previously deployed PatchTST model (exact improvement rates shown in Figures 6‑7).
4.5 Ablation Study
Removing IRF increases MSE, and removing both the multi‑cycle self‑attention (MA) and IRF further degrades performance, confirming that filtering inter‑cycle redundancy helps the attention focus. Excluding LWI or MA also raises MSE, highlighting their importance for weighting cycles and capturing dependencies. Omitting MAP leads to higher MSE, showing that allocating an equal number of patches to every cycle is crucial. The patch‑squeeze module remains effective across squeeze factors 2, 4, and 8; removing its reconstruction loss harms accuracy.