How Shapelet-Based Patterns Predict Financial Market Direction
The article presents a two‑stage framework—SIMPC for invariant multivariate pattern clustering and JISC‑Net for shape‑subclass detection—that achieves accurate and interpretable financial market direction forecasts, outperforming strong baselines on Bitcoin and S&P 500 datasets across most metric‑dataset combinations.
Background
Financial market direction forecasting must balance accuracy and interpretability. Three major challenges are identified: (1) high noise and weak signals in volatile price series, (2) pattern scale variations where repeated motifs undergo amplitude scaling and time warping, and (3) deep‑learning models’ black‑box nature that hinders explanation of predictions.
Problem Definition
The goal is to discover, without supervision, repeatable patterns in noisy multivariate time series that are invariant to scaling and warping, and to use those patterns to predict short‑term price movement (up or down). The solution consists of two stages:
Pattern extraction: identify invariant pattern clusters.
Pattern detection: classify the initial segment of a pattern to forecast the subsequent direction.
Method
SIMPC – Selective Invariant Multivariate Pattern Clustering
SIMPC extracts robust repeatable patterns from multivariate series, addressing limitations of earlier single‑variable methods (e.g., SISC) such as random initialization and noise sensitivity.
Kernel regression preprocessing : Nadaraya‑Watson kernel regression smooths the input series X using a Gaussian kernel K with bandwidth h.
Domain‑adaptive initialization : Classic chart patterns (head‑and‑shoulders, double bottom, etc.) are used as initial centroids. Historical data segments matching these rules are normalized and averaged with DTW‑based DBA to form multivariate prototypes T_i.
Improved K‑means++ initialization : m domain prototypes are kept as fixed seeds; the remaining P‑m centroids are sampled proportionally to the minimum DTW distance between candidate segments and existing centroids.
Iterative clustering and centroid update : A sliding window extracts variable‑length subsequences and assigns each to the nearest centroid by DTW distance. Small clusters (fewer than kappa members) are discarded, the remaining centroids are updated with DBA, and similar centroids (DTW distance ≤ delta) are merged to obtain the final pattern set C={C_1,…,C_{P'}}.
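The smoothing and seeding steps above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (nw_smooth, dtw, seeded_init) are hypothetical, the smoother handles one channel at a time (a multivariate series would be smoothed per channel), the DTW uses a Euclidean point cost with no window constraint, and the sampling weight is the minimum DTW distance itself, as the text describes, rather than the squared distance used in standard K‑means++.

```python
import math
import random

def nw_smooth(x, h):
    """Nadaraya-Watson estimate of a 1-D series at its own time indices,
    using a Gaussian kernel with bandwidth h (in time steps)."""
    n = len(x)
    out = []
    for t in range(n):
        w = [math.exp(-0.5 * ((t - s) / h) ** 2) for s in range(n)]
        out.append(sum(wi * xi for wi, xi in zip(w, x)) / sum(w))
    return out

def dtw(a, b):
    """Unconstrained DTW cost between two sequences of equal-dimension points."""
    inf = float("inf")
    n, m = len(a), len(b)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(a[i - 1], b[j - 1])  # Euclidean point cost (assumption)
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]

def seeded_init(prototypes, candidates, P, rng):
    """Keep the m prototypes as fixed seeds, then draw the remaining P - m
    centroids with probability proportional to the minimum DTW distance."""
    centroids = list(prototypes)
    pool = list(candidates)
    while len(centroids) < P and pool:
        weights = [min(dtw(c, z) for z in centroids) for c in pool]
        if sum(weights) == 0:  # every candidate duplicates an existing centroid
            break
        idx = rng.choices(range(len(pool)), weights=weights, k=1)[0]
        centroids.append(pool.pop(idx))
    return centroids

# Toy data: one prototype and three candidate segments (single-channel points).
rng = random.Random(0)
protos = [[(0.0,), (1.0,), (0.0,)]]
cands = [
    [(0.0,), (1.0,), (1.0,), (0.0,)],  # time-warped copy of the prototype
    [(1.0,), (0.0,), (1.0,)],
    [(0.0,), (0.0,), (2.0,)],
]
centroids = seeded_init(protos, cands, P=3, rng=rng)
```

In this toy run the time‑warped copy of the prototype has DTW distance zero to an existing centroid, so it receives zero sampling weight and is never selected as a new centroid.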
JISC‑Net – Joint Invariant Shape Sub‑Classification Network
JISC‑Net leverages shape‑subclassification, DTW, and a multi‑length dilated causal CNN (Mdc‑CNN) to achieve robustness to time warping.
Mdc‑CNN encoder and DTW triplet loss : Only the initial fraction gamma (a value between 0 and 1) of each pattern subsequence is fed to the encoder f_{\theta}, which maps it to an embedding z. Training samples are generated by sliding windows with scales alpha (e.g., 0.2, 0.4, 0.6). A triplet loss based on DTW distance separates the embeddings of an anchor A, a positive S^+, and a negative S^-. Numerical safeguards include epsilon=10^{-6} and a soft margin m=0.2.
Shape‑sub discovery : Embeddings are clustered with Euclidean K‑means; high‑purity candidates (label consistency > 1/|P'|) are retained and scored by utility.
SVM classifier : The minimum DTW distance between an input subsequence and each discovered shape‑sub serves as a feature for a linear SVM that classifies pattern types.
Two‑stage filtering : During training, a Kolmogorov‑Smirnov test removes statistically insignificant pattern labels (p‑value > 0.05). At inference, only predictions with confidence in the top x% are kept, discarding low‑confidence samples.
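The inference‑time confidence filter can be illustrated with a short sketch. The source does not say how confidence is measured, so this example assumes it is the absolute value of the linear SVM's decision score; the function name keep_top_confidence and the toy scores are hypothetical.

```python
import math

def keep_top_confidence(scores, x_percent):
    """Return the indices of predictions whose confidence lies in the top
    x percent. Confidence is taken as |decision score| (an assumption)."""
    k = math.ceil(len(scores) * x_percent / 100)
    ranked = sorted(range(len(scores)), key=lambda i: abs(scores[i]), reverse=True)
    return sorted(ranked[:k])  # restore chronological order

# Toy decision scores: the sign gives the predicted direction,
# the magnitude is used as the confidence.
scores = [0.1, -2.0, 0.5, 1.5, -0.3]
kept_80 = keep_top_confidence(scores, 80)    # drops the weakest prediction
kept_100 = keep_top_confidence(scores, 100)  # keeps everything
```

A lower x keeps fewer, higher‑confidence trades, which is the return versus trade‑count trade‑off that the filter is meant to control.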
Experiments
Experimental Setup
Datasets: Bitcoin (BTC/USD) and S&P 500 stocks (AAPL, BRK.B, XOM) with features including closing price, volume, and RSI.
Data splits: training (BTC 2014‑2021; stocks 2008‑2018), validation (BTC 2022‑2023; stocks 2019‑2021), test (BTC 2024‑2025; stocks 2021‑2025).
Hyper‑parameters: pattern length L_{min}=18, L_{max}=22; SIMPC extracts P=8 patterns (6 classic prototypes); JISC‑Net uses initial segment gamma=0.8; number of shape‑subs g=10.
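To make the length and gamma settings concrete, the sketch below enumerates sliding windows of length 18 to 22 and keeps the first 80% of each one. It is illustrative only: the function names, the stride of 1, and rounding the prefix length to the nearest whole step are assumptions that the source does not specify.

```python
def variable_length_windows(n, l_min=18, l_max=22, stride=1):
    """Yield (start, length) for every window of length l_min..l_max
    that fits inside a series of n points."""
    for length in range(l_min, l_max + 1):
        for start in range(0, n - length + 1, stride):
            yield start, length

def initial_segment(window, gamma=0.8):
    """Return the first gamma fraction of a window, rounded to the
    nearest whole number of steps (rounding rule is an assumption)."""
    keep = max(1, round(gamma * len(window)))
    return window[:keep]

# Toy series of 30 points standing in for one feature channel.
series = list(range(30))
windows = [series[s:s + length] for s, length in variable_length_windows(len(series))]
prefixes = [initial_segment(w) for w in windows]
```

With these settings a 20‑step pattern contributes its first 16 steps to the encoder, leaving the last 4 steps as the movement to be forecast.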
Results
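Impact of multivariate input : The DTW distance of multivariate (CVR) pattern centroids (average 1.743, min 0.898) is higher than that of univariate pattern centroids (average 1.642, min 0.217). The gap is largest in the minimum distance, which indicates that even the most similar multivariate patterns remain well separated and therefore easier to discriminate.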
Baseline comparison : Across 12 indicator‑dataset combinations, the unfiltered model (Ours‑T@100) achieves the best rank in 8 cases and second best in 3. It yields the highest average return (AR) on all three assets and maintains positive total return (TRwf), outperforming LightGBM, Transformer, TimesNet and other baselines.
Two‑stage filtering effect : The KS test removes 5.6%–41.5% of low‑discriminability labels. Confidence filtering (e.g., T@80) balances return and trade count; for XOM, AR reaches 0.889 and TRwf 1.219.
Interpretability verification : Case studies show successful predictions align closely with extracted patterns (e.g., price drops accompanied by volume spikes). Failures often stem from over‑fitting of the initial segment or DTW mis‑alignment, suggesting improvements via probabilistic prediction and shape‑aware alignment.
