Paper Reading: CoRA – A Multimodal Covariate Adaptation Framework for Time‑Series Foundation Models
CoRA freezes pretrained time‑series foundation models, extracts multimodal covariate embeddings, evaluates their causal relevance with a trainable Granger‑Causal Embedding, and injects them via a zero‑initialised condition module, achieving up to 31.1% MSE reduction across single‑ and multi‑modal forecasting tasks.
Background
Time‑series forecasting is essential in domains such as weather, supply chain, and finance. Large‑scale pretrained time‑series foundation models (TSFMs) such as TimesFM, Chronos, and Sundial achieve strong zero‑shot generalisation, but most are pretrained on single‑variable series, limiting their ability to incorporate multimodal covariates (additional series, text, images) in real‑world scenarios.
Problem Definition
The paper identifies three challenges: (1) single‑variable pretraining prevents direct use of multivariate or multimodal covariates; (2) covariate‑target dependencies are often domain‑specific, non‑causal, and noisy, requiring a data‑driven quantification of causal contribution; (3) naïve adaptation (e.g., inserting covariate modules) disrupts the pretrained embedding space, causing catastrophic forgetting and unstable training.
Method: CoRA Framework
3.1 Freeze the Base Model as a Feature Extractor
Each modality (time‑series, text, image) is processed by a dedicated pretrained encoder (TSFM, LLM, ViT). Encoders remain frozen to retain learned knowledge. For time‑series covariates the embedding from the last timestep is used; for text and image covariates the timestep‑averaged embeddings are taken. The target variable is encoded by the TSFM backbone’s last‑timestep embedding.
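The extraction rules above are simple pooling choices over the frozen encoders' outputs. A minimal numpy stand‑in, where random arrays play the role of the frozen encoders' per‑timestep embeddings (shapes are illustrative assumptions, not the paper's actual dimensions):

```python
import numpy as np

rng = np.random.default_rng(0)
T, d = 96, 64                          # lookback length, embedding width (assumed)

ts_cov_emb = rng.normal(size=(T, d))   # per-timestep TSFM embeddings of a covariate series
text_emb   = rng.normal(size=(T, d))   # per-timestep LLM embeddings of text covariates
image_emb  = rng.normal(size=(T, d))   # per-timestep ViT embeddings of image covariates
target_emb = rng.normal(size=(T, d))   # TSFM backbone embeddings of the target series

# Time-series covariates: take the last-timestep embedding.
h_ts = ts_cov_emb[-1]
# Text and image covariates: take the timestep-averaged embedding.
h_text  = text_emb.mean(axis=0)
h_image = image_emb.mean(axis=0)
# Target: the backbone's last-timestep embedding.
h_target = target_emb[-1]
```

Each covariate thus collapses to a single d‑dimensional vector regardless of modality, which is what lets the next stage treat them uniformly.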
3.2 Covariate Causal Evaluation: Granger‑Causal Embedding (GCE)
A trainable Granger‑Causal Embedding matrix W_GC aligns multimodal covariate embeddings into a unified latent space and quantifies their causal impact on the target based on Granger causality theory. The process consists of (1) aligning embeddings, and (2) concatenating all modality embeddings followed by a Softmax‑weighted aggregation.
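A sketch of the two GCE steps in numpy. The dot‑product relevance score against the target embedding is our assumption about how the Softmax weights are derived; the paper's exact scoring function may differ, and W_GC would of course be trained rather than randomly initialised:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())            # numerically stable softmax
    return e / e.sum()

rng = np.random.default_rng(0)
d = 64
# Per-modality covariate embeddings from the frozen encoders (stand-ins).
cov_embs = [rng.normal(size=d) for _ in range(3)]   # e.g. series, text, image
h_target = rng.normal(size=d)

# Trainable Granger-Causal Embedding matrix (randomly initialised here).
W_GC = rng.normal(size=(d, d)) * 0.02

# (1) Align each modality's embedding into a unified latent space.
aligned = np.stack([W_GC @ h for h in cov_embs])    # (num_cov, d)

# (2) Score each aligned covariate against the target (assumed dot product)
#     and aggregate with Softmax weights.
scores = aligned @ h_target                          # (num_cov,)
weights = softmax(scores)
H = weights @ aligned        # causally weighted covariate embedding, shape (d,)
```

The Softmax weights act as soft covariate selection: covariates whose aligned embeddings contribute little to predicting the target are down‑weighted rather than hard‑pruned.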
3.3 Zero‑Initialisation Condition Injection
A lightweight MLP maps the causally weighted covariate embedding H to a scaling factor α, a shift factor β, and a bias γ. These parameters are injected into the TSFM’s prediction head, allowing covariate information to modulate the forecast. Both the MLP and the alignment parameters are zero‑initialised so that the adapted model starts from the exact pretrained state, preventing catastrophic forgetting.
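The zero‑initialisation guarantee can be checked in a few lines. In this sketch only the MLP's output layer is zeroed (the common adaLN‑zero pattern, so gradients can still flow through the hidden layer); that detail, the α/β/γ dimensionalities, and the scalar toy head are our assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 64

H = rng.normal(size=d)            # causally weighted covariate embedding (stand-in)
h_target = rng.normal(size=d)     # backbone's last-timestep target embedding
head_W = rng.normal(size=d) * 0.1 # toy pretrained linear prediction head (scalar output)

# Lightweight MLP; the OUTPUT layer is zero-initialised so that
# alpha = beta = gamma = 0 before any training step.
W1 = rng.normal(size=(d, d)) * 0.02
b1 = np.zeros(d)
W2 = np.zeros((2 * d + 1, d))
b2 = np.zeros(2 * d + 1)

hid = np.maximum(W1 @ H + b1, 0.0)          # ReLU hidden layer
out = W2 @ hid + b2                          # all zeros at initialisation
alpha, beta, gamma = out[:d], out[d:2 * d], out[2 * d]

# Inject scale / shift / bias into the prediction head's input.
y_adapted = head_W @ (h_target * (1.0 + alpha) + beta) + gamma
y_base = head_W @ h_target                   # frozen model's original forecast
```

At initialisation `y_adapted == y_base`: the adapted model reproduces the pretrained forecast exactly, and covariate influence grows only as training moves α, β, γ away from zero.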
Experiments
4.1 Datasets and Baselines
Single‑modal covariates: ETT (transformer temperature), Weather, ECL (electric load), EPF (electric price). Multi‑modal covariates: RT‑1 (robot images), Time‑MMD (text). Baselines include adaptation methods (AdaPTS, ChronosX, UniCA), deep forecasting models (TimeXer, iTransformer, PatchTST, N‑BEATSx) and TSFMs (Sundial, TimesFM, Chronos‑Bolt, FlowState).
4.2 Main Results
Single‑modal Covariate Forecasting
On long‑term datasets (ETTh1, ETTh2) CoRA reduces MSE by 31.1 % and MAE by 19.8 % relative to the strongest baseline, TimeXer, and improves over UniCA by 18.7 %.
For short‑term EPF forecasting CoRA lowers MSE by 9.4 % relative to TimeXer and by 6.4 % relative to AdaPTS.
Multi‑modal Covariate Forecasting
With image covariates (RT‑1 subset) CoRA cuts MSE by 12.7 % and CRPS by 8.8 % versus the best supervised model.
For text covariates (Time‑MMD) CoRA reduces MSE by 3.0 % and CRPS by 3.7 % compared with UniCA.
Few‑Shot Forecasting (EPF)
When only 1 %–25 % of training samples are available, CoRA achieves 15 %–20 % lower MSE than TimeXer and 5 %–10 % lower MSE than ChronosX, demonstrating rapid adaptation under data scarcity.
Multivariate Forecasting
On multivariate datasets (ETT, Weather) CoRA lowers average MSE by 14.5 % and MAE by 12.2 % versus TimeXer, highlighting its advantage for joint target prediction.
Model Analysis
Generality
CoRA is compatible with various TSFMs (Sundial, TimesFM, Chronos‑Bolt, FlowState) and yields average MSE reductions ranging from 3.3 % to 14.2 %, confirming broad applicability.
Ablation Study
Removing covariates (w/o covariate) increases MSE by 6.5 % → covariates are essential.
Removing the adaptive layer‑norm injection (w/o adaLN) raises MSE by 12.9 % → the condition injection is effective.
Removing GCE (w/o selection) raises MSE by 8.3 % → causal weighting matters.
Removing zero‑initialisation (w/o zero‑init) raises MSE by 4.3 % → stable initialisation prevents forgetting.
Interpretability
The Pearson correlation between GCE scores and traditional Granger‑Geweke causality tests is 0.58 — a moderate positive agreement, suggesting that the learned GCE weights track classical measures of covariate causal contribution rather than arbitrary attention patterns.
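The classical measure that GCE is compared against is computable directly. A minimal numpy sketch of the Granger‑Geweke measure F_{x→y} = log(σ²_restricted / σ²_full), estimated by OLS on a fixed lag order over synthetic data (lag choice and estimation are simplifications of the full test):

```python
import numpy as np

def geweke_causality(x, y, p=2):
    """Granger-Geweke measure log(var_restricted / var_full):
    does adding x's past p lags reduce the error of predicting y?"""
    n = len(y)
    Y = y[p:]
    ones = np.ones((n - p, 1))
    lag_y = np.column_stack([y[p - k: n - k] for k in range(1, p + 1)])
    lag_x = np.column_stack([x[p - k: n - k] for k in range(1, p + 1)])

    def resid_var(X):
        beta, *_ = np.linalg.lstsq(X, Y, rcond=None)
        return (Y - X @ beta).var()

    var_r = resid_var(np.hstack([ones, lag_y]))          # y's own past only
    var_f = resid_var(np.hstack([ones, lag_y, lag_x]))   # plus x's past
    return np.log(var_r / var_f)

rng = np.random.default_rng(0)
n = 2000
x = rng.normal(size=n)
y = np.zeros(n)
for t in range(1, n):
    y[t] = 0.5 * y[t - 1] + 0.8 * x[t - 1] + 0.1 * rng.normal()
z = rng.normal(size=n)        # an unrelated covariate

print(geweke_causality(x, y), geweke_causality(z, y))
```

The causal covariate x scores far above the unrelated z; correlating such per‑covariate scores with GCE's Softmax weights (e.g. via `np.corrcoef`) is one way to reproduce the paper's interpretability check.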