Recent Advances in Multivariate Time Series Forecasting: Paper Summaries (Sep 27 – Oct 10 2025)

This article summarizes twelve newly released AI papers on multivariate time‑series forecasting and anomaly detection, detailing each work's motivation, proposed methodology, and key innovations — covering CRIB, TS‑JEPA, DSAT‑HD, DIMIGNN, ASTGI, IndexNet, TsLLM, Moon, TimeSeriesScientist, MLLM4TS, Augur, and MoGU — and reports their experimental validation on real‑world datasets.

Bighead's Algorithm Notes

Revisiting Multivariate Time Series Forecasting with Missing Values

Paper link: http://arxiv.org/pdf/2509.23494v1

Code link: https://github.com/Muyiiiii/CRIB

Missing values are pervasive in real‑world multivariate time‑series data. Existing pipelines first impute missing entries and then forecast, but unsupervised imputation can distort the underlying distribution and degrade prediction accuracy. The authors conduct systematic empirical studies that confirm this degradation. To avoid imputation, they propose a Consistency‑Regularized Information Bottleneck (CRIB) that predicts directly from partially observed series. CRIB combines a univariate attention mechanism with a consistency regularizer to learn representations that filter noise from missing entries while preserving predictive signals. Experiments on four real‑world datasets show that CRIB maintains high forecasting accuracy even at high missing rates.
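The imputation‑free idea can be illustrated with a toy sketch (my own simplification, not the paper's architecture): a forecaster is penalized whenever two independently re‑masked views of the same partially observed window disagree, which is the spirit of a consistency regularizer. All names here (`random_mask`, `forecast`, `consistency_loss`) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_mask(x, p, rng):
    """Zero out entries with probability p, simulating extra missingness."""
    return x * (rng.random(x.shape) >= p)

def forecast(x, W):
    """Stand-in linear forecaster mapping an observed window to one step ahead."""
    return x @ W

def consistency_loss(x, W, p=0.2, rng=rng):
    """Mean squared disagreement between forecasts from two independently
    masked views; minimizing it encourages representations that are
    insensitive to which entries happen to be missing."""
    y1 = forecast(random_mask(x, p, rng), W)
    y2 = forecast(random_mask(x, p, rng), W)
    return float(np.mean((y1 - y2) ** 2))

x = rng.normal(size=(8, 16))        # 8 windows, 16 observations each
W = rng.normal(size=(16, 1)) * 0.1  # toy forecaster weights
loss = consistency_loss(x, W)
```

In training this term would be added to the usual forecasting loss; here it only demonstrates the invariance objective.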

Joint Embeddings Go Temporal

Paper link: http://arxiv.org/pdf/2509.25449v1

Time‑Series JEPA (TS‑JEPA) adapts the Joint Embedding Predictive Architecture (JEPA) for time‑series representation learning. Instead of reconstructing masked inputs, TS‑JEPA learns latent‑space embeddings through a self‑supervised objective. The authors evaluate TS‑JEPA on classification and forecasting benchmarks and report performance that matches or exceeds current state‑of‑the‑art baselines, demonstrating a balanced trade‑off across multiple downstream tasks.
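A minimal sketch of the latent‑prediction objective, under my own simplifying assumptions (linear‑tanh encoders, plain MSE): the model predicts the embedding of the future segment from the past segment, so the loss lives in latent space rather than reconstructing raw values. `encode` and `jepa_loss` are hypothetical names, not the paper's API.

```python
import numpy as np

rng = np.random.default_rng(1)

def encode(x, W):
    """Toy encoder: linear projection with a tanh nonlinearity."""
    return np.tanh(x @ W)

def jepa_loss(series, W_ctx, W_tgt, W_pred, split=24):
    """Predict the *embedding* of the future segment from the past segment.
    In practice W_tgt would be an EMA copy of W_ctx with stopped gradients."""
    ctx, tgt = series[:, :split], series[:, split:]
    z_ctx = encode(ctx, W_ctx)     # context embedding
    z_tgt = encode(tgt, W_tgt)     # target embedding
    z_hat = z_ctx @ W_pred         # latent predictor
    return float(np.mean((z_hat - z_tgt) ** 2))

series = rng.normal(size=(4, 48))          # 4 series of length 48
d = 8
W_ctx = rng.normal(size=(24, d)) * 0.1
W_tgt = rng.normal(size=(24, d)) * 0.1
W_pred = rng.normal(size=(d, d)) * 0.1
loss = jepa_loss(series, W_ctx, W_tgt, W_pred)
```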

DSAT‑HD: Dual‑Stream Adaptive Transformer with Hybrid Decomposition for Multivariate Time Series Forecasting

Paper link: http://arxiv.org/pdf/2509.24800v1

Existing Transformers for time‑series forecasting model only a limited number of series or operate at a fixed scale, which hampers capture of diverse temporal patterns. DSAT‑HD introduces three innovations: (1) a hybrid decomposition that combines EMA, Fourier decomposition, and RevIN normalization, followed by a Top‑k noise gate to balance seasonal and trend components; (2) a multi‑scale adaptive path routing that routes features to four parallel Transformer layers and merges them with a sparse combiner, enabling local CNN‑style attention and global Transformer interactions; (3) a dual‑stream residual learning framework where a CNN branch processes seasonal parts and an MLP branch processes trend parts, coordinated by a variance‑balancing loss. Extensive experiments on nine datasets show DSAT‑HD outperforms prior methods and achieves state‑of‑the‑art results on several sets, with strong transfer‑learning generalization.
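The EMA half of the hybrid decomposition can be sketched in a few lines (a simplification: the paper additionally uses Fourier decomposition, RevIN, and a Top‑k noise gate, none of which are shown here). The key invariant is that trend and seasonal parts sum back to the original signal.

```python
import numpy as np

def ema_trend(x, alpha=0.3):
    """Exponential moving average, used as the trend component."""
    t = np.empty_like(x, dtype=float)
    t[0] = x[0]
    for i in range(1, len(x)):
        t[i] = alpha * x[i] + (1 - alpha) * t[i - 1]
    return t

def decompose(x, alpha=0.3):
    """Split a series into trend (EMA) and seasonal (residual) parts
    that sum back to the original signal."""
    x = np.asarray(x, dtype=float)
    trend = ema_trend(x, alpha)
    seasonal = x - trend
    return trend, seasonal

t_part, s_part = decompose([10.0, 12.0, 11.0, 15.0, 14.0])
```

In DSAT‑HD the two parts would then feed the CNN (seasonal) and MLP (trend) streams of the dual‑stream framework.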

Graph Neural Networks with Diversity‑aware Neighbor Selection and Dynamic Multi‑scale Fusion for Multivariate Time Series Forecasting

Paper link: http://arxiv.org/pdf/2509.23671v1

Typical GNN‑based MTS models aggregate neighbor information without considering diversity, leading to redundant representations, and they rely on a single temporal scale. The proposed DIMIGNN adds (1) a Diversity‑aware Neighbor Selection Mechanism (DNSM) that selects neighbors with high information similarity while preserving structural diversity, and (2) a Dynamic Multi‑scale Fusion Module (DMFM) that adaptively weights predictions from multiple temporal scales. Experiments on several real‑world datasets demonstrate consistent gains over previous GNN approaches.
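The diversity‑aware selection idea resembles greedy maximal‑marginal‑relevance: each step scores a candidate neighbor by its similarity to the target node minus its redundancy with neighbors already chosen. This is my own illustrative reading, not the paper's exact DNSM scoring; `select_neighbors` and `lam` are hypothetical.

```python
import numpy as np

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))

def select_neighbors(target, candidates, k=2, lam=0.5):
    """Greedy diversity-aware pick: reward similarity to the target,
    penalize similarity to already-chosen neighbors, so near-duplicate
    neighbors are skipped in favor of informative diverse ones."""
    chosen, remaining = [], list(range(len(candidates)))
    while remaining and len(chosen) < k:
        def score(i):
            rel = cosine(target, candidates[i])
            red = max((cosine(candidates[i], candidates[j]) for j in chosen),
                      default=0.0)
            return lam * rel - (1 - lam) * red
        best = max(remaining, key=score)
        chosen.append(best)
        remaining.remove(best)
    return chosen

# Candidates 0 and 1 are near-duplicates; a diversity-aware pick takes
# the best of the pair plus the dissimilar candidate 2.
cands = np.array([[1.0, 0.9], [1.0, 0.95], [0.9, 1.0]])
picked = select_neighbors(np.array([1.0, 1.0]), cands, k=2)
```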

ASTGI: Adaptive Spatio‑Temporal Graph Interactions for Irregular Multivariate Time Series Forecasting

Paper link: http://arxiv.org/pdf/2509.23313v1

Irregular multivariate time series (IMTS) present two challenges: (1) representing asynchronous observations without distortion, and (2) capturing complex dynamic dependencies. ASTGI first encodes each raw observation as a learnable spatio‑temporal point. It then constructs adaptive causal graphs for each point via nearest‑neighbor search. A time‑aware attention module propagates messages on these graphs, weighting interactions by relative spatio‑temporal distance. Finally, a query‑point aggregation module predicts by regressing on the neighborhood of each query point. Benchmarks on multiple datasets show ASTGI surpasses a range of state‑of‑the‑art baselines.
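The point‑based aggregation step can be sketched as follows (a simplification assuming points laid out as `[timestamp, feature...]` vectors; the learnable embeddings and attention parameters of the real model are replaced with a fixed distance‑decay softmax, and `aggregate` is a hypothetical name):

```python
import numpy as np

def st_distance(p, q):
    """Distance between spatio-temporal points laid out as
    [timestamp, feature...]: time is treated as just another axis."""
    return float(np.sqrt(np.sum((p - q) ** 2)))

def aggregate(query, points, values, k=3):
    """Predict at a query point by softmax-weighting the values of its k
    nearest neighbors, with weights decaying in spatio-temporal distance."""
    d = np.array([st_distance(query, p) for p in points])
    idx = np.argsort(d)[:k]
    w = np.exp(-d[idx])
    w /= w.sum()
    return float(w @ values[idx])

points = np.array([[0.0, 1.0], [1.0, 2.0], [2.0, 3.0], [10.0, 9.0]])
values = np.array([1.0, 2.0, 3.0, 9.0])
pred = aggregate(np.array([1.5, 2.5]), points, values)
```

Because every raw observation becomes its own point, asynchronous channels never need to be aligned onto a common grid.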

IndexNet: Timestamp and Variable‑Aware Modeling for Time Series Forecasting

Paper link: http://arxiv.org/pdf/2509.23813v2

IndexNet augments a lightweight MLP backbone with an Index Embedding (IE) module composed of Timestamp Embedding (TE) and Channel Embedding (CE). TE converts raw timestamps into vectors that are injected into the input sequence, improving capture of long‑term periodic patterns. CE assigns each variable a distinct trainable identity, allowing the model to differentiate heterogeneous variables even when their raw values are similar. Experiments on twelve real‑world datasets show IndexNet achieves performance comparable to mainstream baselines while offering plug‑and‑play flexibility and improved interpretability.
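The IE module reduces to two lookup tables plus broadcasting, sketched below under my own assumptions (hour‑of‑day and day‑of‑week as the timestamp features; table names and shapes are illustrative, not the paper's):

```python
import numpy as np

rng = np.random.default_rng(3)
d_model, n_vars = 8, 4

# Trainable lookup tables (randomly initialized here): hour-of-day and
# day-of-week for timestamps, plus one identity vector per variable.
hour_emb = rng.normal(size=(24, d_model)) * 0.02
dow_emb = rng.normal(size=(7, d_model)) * 0.02
chan_emb = rng.normal(size=(n_vars, d_model)) * 0.02

def index_embedding(hours, dows, var_id):
    """Timestamp embedding (shared across variables) plus a channel
    identity, added to the value embedding fed to the MLP backbone."""
    te = hour_emb[hours] + dow_emb[dows]   # (seq_len, d_model)
    ce = chan_emb[var_id]                  # (d_model,), broadcast over time
    return te + ce

emb = index_embedding(np.arange(6) % 24, np.zeros(6, dtype=int), var_id=2)
```

The channel table is what lets the model tell apart two variables whose raw values look alike, since each carries its own learned offset.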

Augmenting LLMs for General Time Series Understanding and Prediction

Paper link: http://arxiv.org/pdf/2510.01111v1

TsLLM extends a large language model with a patch‑based encoder‑decoder that provides explicit time‑series perception. The model is pre‑trained on over two million paired time‑series and textual examples. It supports (1) context‑aware forecasting, (2) time‑series question answering, (3) pattern explanation, (4) classification with natural‑language output, and (5) report generation. Although not optimized for traditional numeric benchmarks, TsLLM excels on tasks that require joint numerical reasoning and natural‑language interaction, establishing a new paradigm that bridges numeric computation and language understanding.

Moon: A Modality Conversion‑based Efficient Multivariate Time Series Anomaly Detection

Paper link: http://arxiv.org/pdf/2510.01970v1

Moon addresses three limitations of existing MTS anomaly detectors: reliance on error thresholds in unsupervised methods, under‑estimation of anomalies in semi‑supervised approaches, and the high computational cost of supervised classifiers. It introduces a Multivariate Markov Transition Field (MV‑MTF) that converts numeric series into images, preserving inter‑variable and temporal relationships. A multimodal CNN with shared‑parameter feature fusion combines the original numeric modality with the image modality, improving training efficiency. A SHAP‑based explainer identifies the variables that contribute most to each anomaly. Experiments on six real‑world datasets report a 93 % reduction in inference time, a 4 % increase in detection accuracy, and a 10.8 % improvement in explanation quality compared with six state‑of‑the‑art methods.
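A univariate Markov Transition Field — the building block behind the multivariate variant — can be computed in a few lines: quantile‑bin the series, estimate the bin‑to‑bin transition matrix, then paint pixel (i, j) with the probability of moving from the bin of x[i] to the bin of x[j]. This sketch omits Moon's multivariate extension and is my own minimal rendering of the standard MTF construction.

```python
import numpy as np

def markov_transition_field(x, n_bins=4):
    """Convert a 1-D series into an n x n image of transition probabilities
    between quantile bins, preserving temporal structure as texture."""
    edges = np.quantile(x, np.linspace(0, 1, n_bins + 1)[1:-1])
    q = np.digitize(x, edges)                      # bin index per timestep
    trans = np.zeros((n_bins, n_bins))
    for a, b in zip(q[:-1], q[1:]):                # count observed transitions
        trans[a, b] += 1
    trans /= np.maximum(trans.sum(axis=1, keepdims=True), 1)  # row-normalize
    return trans[q[:, None], q[None, :]]           # (len(x), len(x)) image

img = markov_transition_field(np.sin(np.linspace(0, 6, 32)))
```

The resulting image can then be fed to an ordinary CNN, which is what makes the modality‑conversion approach cheap at inference time.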

TimeSeriesScientist: A General‑Purpose AI Agent for Time Series Analysis

Paper link: http://arxiv.org/pdf/2510.01538v2

TimeSeriesScientist (TSci) is an LLM‑driven agent framework comprising four specialized agents: Curator (LLM‑guided diagnostics with external statistical tools for targeted preprocessing), Planner (self‑planning to narrow the model‑selection hypothesis space), Forecaster (model fitting, validation, and adaptive ensemble selection), and Reporter (generation of transparent natural‑language reports). Empirical evaluation on eight benchmark datasets shows TSci reduces forecast error by an average of 10.4 % relative to statistical baselines and by 38.2 % relative to LLM‑based baselines, while producing detailed, reproducible reports.

MLLM4TS: Leveraging Vision and Multimodal Language Models for General Time‑Series Analysis

Paper link: http://arxiv.org/pdf/2510.07513v1

MLLM4TS renders each channel of a multivariate series as a colored line in a stacked image, capturing inter‑channel spatial dependencies. A time‑aware visual patch alignment module aligns visual patches with their corresponding temporal segments. The multimodal LLM fuses fine‑grained numeric details with global visual context, enabling strong performance on forecasting, classification, and anomaly‑detection tasks across standard benchmarks. The results highlight the benefit of integrating a visual modality with pretrained language models for robust, generalizable time‑series analysis.
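The stacked‑image rendering can be mimicked without any plotting library: draw each channel's normalized polyline into its own horizontal band of one array, using a distinct pixel value per channel as a stand‑in for color. This is an illustrative sketch, not the paper's rendering pipeline; `stack_channels_as_image` is a hypothetical name.

```python
import numpy as np

def stack_channels_as_image(series, band_height=16):
    """Render each channel in its own horizontal band of a single image
    (pixel value = channel index + 1 stands in for a distinct color), so a
    vision encoder sees all channels in one spatial layout."""
    n_ch, n_t = series.shape
    img = np.zeros((n_ch * band_height, n_t))
    for c in range(n_ch):
        s = series[c]
        s = (s - s.min()) / (s.max() - s.min() + 1e-8)   # normalize to [0, 1]
        rows = ((1 - s) * (band_height - 1)).astype(int)  # high value = top
        img[c * band_height + rows, np.arange(n_t)] = c + 1
    return img

img = stack_channels_as_image(np.vstack([np.sin(np.linspace(0, 6, 50)),
                                         np.cos(np.linspace(0, 6, 50))]))
```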

Augur: Modeling Covariate Causal Associations in Time Series via Large Language Models

Paper link: http://arxiv.org/pdf/2510.07858v1

Augur introduces a fully LLM‑driven forecasting pipeline that discovers directed causal graphs among covariates. A teacher LLM performs heuristic search and pairwise causal tests to infer a causal graph; a lightweight student agent refines the graph and selects high‑confidence edges. These causal edges are encoded as rich textual prompts that guide prediction, improving accuracy and providing transparent, traceable reasoning. Experiments on real‑world datasets against 25 baselines show competitive performance and robust zero‑shot generalization.

MoGU: Mixture‑of‑Gaussians with Uncertainty‑based Gating for Time Series Forecasting

Paper link: http://arxiv.org/pdf/2510.07459v1

Code link: https://github.com/yolish/moe_unc_tsf

MoGU extends the Mixture‑of‑Experts paradigm for regression by modeling each expert’s output as a Gaussian distribution, yielding both a mean prediction and a variance‑based uncertainty estimate. The gating mechanism selects experts based on their predicted variance rather than raw input features. Benchmarks on multiple time‑series forecasting datasets show MoGU consistently outperforms single‑expert models and traditional MoE setups, while delivering well‑calibrated uncertainties that correlate with prediction error.
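The variance‑based gate can be sketched as a softmax over negative predicted log‑variance, so more‑confident experts dominate the mixture mean. Linear Gaussian heads and the names `expert` and `mogu_predict` are my own simplifications of the repository's model.

```python
import numpy as np

rng = np.random.default_rng(7)

def expert(x, W_mu, W_logvar):
    """Each expert outputs a Gaussian head: a mean and a log-variance."""
    return x @ W_mu, x @ W_logvar

def mogu_predict(x, experts):
    """Uncertainty-based gating: weights are a softmax over negative
    predicted log-variance, so low-variance (confident) experts get
    larger weight in the combined mean."""
    mus, logvars = zip(*(expert(x, *p) for p in experts))
    mus, logvars = np.stack(mus), np.stack(logvars)
    gate = np.exp(-logvars)
    gate /= gate.sum(axis=0, keepdims=True)
    return (gate * mus).sum(axis=0), gate

experts = [(rng.normal(size=(10, 1)) * 0.1, rng.normal(size=(10, 1)) * 0.1)
           for _ in range(3)]
x = rng.normal(size=(5, 10))
mean, gate = mogu_predict(x, experts)
```

In training, each head would be fit with a Gaussian negative log‑likelihood so the predicted variances become calibrated; the gate then needs no access to raw input features.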

Tags: deep learning, Transformer, anomaly detection, time series forecasting, Large Language Model, multivariate
Written by Bighead's Algorithm Notes, focused on AI applications in the fintech sector.