Paper Reading: CoRA – A Multimodal Covariate Adaptation Framework for Time‑Series Foundation Models

CoRA freezes pretrained time‑series foundation models, extracts multimodal covariate embeddings, evaluates their causal relevance with a trainable Granger‑Causal Embedding, and injects them via a zero‑initialized condition module, achieving up to 31.1% MSE reduction across single‑ and multi‑modal forecasting tasks.


Background

Time‑series forecasting is essential in domains such as weather, supply chain, and finance. Large‑scale pretrained time‑series foundation models (TSFMs) such as TimesFM, Chronos, and Sundial achieve strong zero‑shot generalisation, but most are pretrained on single‑variable series, limiting their ability to incorporate multimodal covariates (additional series, text, images) in real‑world scenarios.

Problem Definition

The paper identifies three challenges: (1) single‑variable pretraining prevents direct use of multivariate or multimodal covariates; (2) covariate‑target dependencies are often domain‑specific, non‑causal, and noisy, requiring a data‑driven quantification of causal contribution; (3) naïve adaptation (e.g., inserting covariate modules) disrupts the pretrained embedding space, causing catastrophic forgetting and unstable training.

Method: CoRA Framework

3.1 Freeze the Base Model as a Feature Extractor

Each modality (time‑series, text, image) is processed by a dedicated pretrained encoder (TSFM, LLM, ViT). Encoders remain frozen to retain learned knowledge. For time‑series covariates the embedding from the last timestep is used; for text and image covariates the timestep‑averaged embeddings are taken. The target variable is encoded by the TSFM backbone’s last‑timestep embedding.

[Figure: CoRA architecture]
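
To make the extraction step concrete, here is a minimal PyTorch sketch, assuming generic encoder modules that return per-step embedding sequences; the helper name `extract_covariate_embeddings` and the exact tensor shapes are illustrative, not the paper's interfaces.

```python
import torch
import torch.nn as nn

@torch.no_grad()
def extract_covariate_embeddings(ts_encoder: nn.Module,
                                 text_encoder: nn.Module,
                                 image_encoder: nn.Module,
                                 ts_cov, text_cov, image_cov):
    """Frozen-encoder extraction (Sec. 3.1). Encoders are assumed to
    return (batch, steps, d) embedding sequences."""
    # Freeze all encoders so pretrained knowledge is retained.
    for enc in (ts_encoder, text_encoder, image_encoder):
        enc.requires_grad_(False)
    h_ts = ts_encoder(ts_cov)[:, -1, :]           # last-timestep embedding
    h_text = text_encoder(text_cov).mean(dim=1)   # timestep-averaged
    h_img = image_encoder(image_cov).mean(dim=1)  # timestep-averaged
    return h_ts, h_text, h_img
```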

3.2 Covariate Causal Evaluation: Granger‑Causal Embedding (GCE)

A trainable Granger‑Causal Embedding matrix W_GC aligns the multimodal covariate embeddings into a unified latent space and quantifies their causal impact on the target, grounded in Granger causality theory. The process has two steps: (1) aligning each modality's embedding into the shared space, and (2) concatenating the aligned embeddings and aggregating them with Softmax weights.

[Figure: GCE alignment]
[Figure: GCE aggregation]
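
The following sketch pictures the GCE step, assuming one trainable linear alignment per modality (playing the role of W_GC) and a learned scalar score that is Softmax‑normalised across covariates; the class name, scoring head, and dimensions are hypothetical, and the paper's exact equations may differ.

```python
import torch
import torch.nn as nn

class GrangerCausalEmbedding(nn.Module):
    """Sketch of GCE (Sec. 3.2): per-modality trainable alignment into
    a shared latent space, then Softmax-weighted aggregation."""

    def __init__(self, cov_dims: list, d_model: int):
        super().__init__()
        self.align = nn.ModuleList(nn.Linear(d, d_model) for d in cov_dims)
        self.score = nn.Linear(d_model, 1)  # causal-relevance score

    def forward(self, cov_embeds):
        # Step 1: align each modality embedding into the unified space.
        aligned = torch.stack(
            [proj(h) for proj, h in zip(self.align, cov_embeds)], dim=1
        )                                              # (B, n_cov, d_model)
        # Step 2: Softmax over covariates weights their contribution.
        w = torch.softmax(self.score(aligned), dim=1)  # (B, n_cov, 1)
        return (w * aligned).sum(dim=1)                # aggregated H
```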

3.3 Zero‑Initialisation Condition Injection

A lightweight MLP maps the causally weighted covariate embedding H to a scaling factor α, a shift factor β, and a bias γ. These parameters are injected into the TSFM’s prediction head, allowing covariate information to modulate the forecast. Both the MLP and the alignment parameters are zero‑initialised so that the adapted model starts from the exact pretrained state, preventing catastrophic forgetting.

[Figure: Zero‑init condition injection]
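
A minimal sketch of the injection, assuming an adaLN‑style modulation of the hidden state feeding the prediction head (consistent with the "w/o adaLN" ablation below); the module name and shapes are illustrative, not the paper's implementation.

```python
import torch.nn as nn

class ZeroInitInjection(nn.Module):
    """Sketch of Sec. 3.3: an MLP maps the aggregated covariate
    embedding H to scale alpha, shift beta, and gate gamma, applied
    adaLN-style to the hidden state z. Zero-initialising the output
    layer makes the module a no-op at step 0, so training starts from
    the exact pretrained forecast."""

    def __init__(self, d_model: int):
        super().__init__()
        self.norm = nn.LayerNorm(d_model, elementwise_affine=False)
        self.mlp = nn.Sequential(
            nn.Linear(d_model, d_model), nn.SiLU(),
            nn.Linear(d_model, 3 * d_model),
        )
        # Zero-init: alpha = beta = gamma = 0 at initialisation.
        nn.init.zeros_(self.mlp[-1].weight)
        nn.init.zeros_(self.mlp[-1].bias)

    def forward(self, z, H):
        # z: (batch, d_model) hidden state; H: aggregated covariates.
        alpha, beta, gamma = self.mlp(H).chunk(3, dim=-1)
        modulated = self.norm(z) * (1 + alpha) + beta  # scale and shift
        return z + gamma * modulated                   # gated residual
```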

Experiments

4.1 Datasets and Baselines

Single‑modal covariates: ETT (electricity transformer temperature), Weather, ECL (electricity load), EPF (electricity price forecasting). Multi‑modal covariates: RT‑1 (robot images), Time‑MMD (text). Baselines include adaptation methods (AdaPTS, ChronosX, UniCA), deep forecasting models (TimeXer, iTransformer, PatchTST, N‑BEATSx), and TSFMs (Sundial, TimesFM, Chronos‑Bolt, FlowState).

4.2 Main Results

Single‑modal Covariate Forecasting

On the long‑term datasets (ETTh1, ETTh2), CoRA reduces MSE by 31.1% and MAE by 19.8% relative to the strongest baseline, TimeXer, and improves over UniCA by 18.7%.

[Figure: Long‑term results]

For short‑term EPF forecasting, CoRA lowers MSE by 9.4% relative to TimeXer and by 6.4% relative to AdaPTS.

[Figure: Short‑term results]

Multi‑modal Covariate Forecasting

With image covariates (RT‑1 subset), CoRA cuts MSE by 12.7% and CRPS by 8.8% versus the best supervised model.

[Figure: Image covariate results]

For text covariates (Time‑MMD), CoRA reduces MSE by 3.0% and CRPS by 3.7% compared with UniCA.

[Figure: Text covariate results]

Few‑Shot Forecasting (EPF)

When only 1%–25% of training samples are available, CoRA achieves 15%–20% lower MSE than TimeXer and 5%–10% lower MSE than ChronosX, demonstrating rapid adaptation under data scarcity.

[Figure: Few‑shot results]

Multivariate Forecasting

On multivariate datasets (ETT, Weather), CoRA lowers average MSE by 14.5% and MAE by 12.2% versus TimeXer, highlighting its advantage for joint target prediction.

[Figure: Multivariate results]

Model Analysis

Generality

CoRA is compatible with various TSFMs (Sundial, TimesFM, Chronos‑Bolt, FlowState) and yields average MSE reductions ranging from 3.3% to 14.2%, confirming broad applicability.

[Figure: Generality results]

Ablation Study

Removing covariates (w/o covariate) increases MSE by 6.5% → covariates are essential.

Removing the adaptive layer‑norm injection (w/o adaLN) raises MSE by 12.9% → the condition injection is effective.

Removing GCE (w/o selection) raises MSE by 8.3% → causal weighting matters.

Removing zero‑initialisation (w/o zero‑init) raises MSE by 4.3% → stable initialisation prevents forgetting.

[Figure: Ablation results]

Interpretability

The Pearson correlation between GCE scores and traditional Granger‑Geweke causality tests is 0.58, indicating that GCE reliably quantifies covariate causal contributions.

[Figure: Interpretability correlation]
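
As a rough reproduction of this check, one could correlate the learned GCE weights with classical Granger F‑statistics, e.g. via statsmodels; the per‑covariate score vector `gce_scores` and the lag choice are assumptions, and the paper's exact protocol may differ.

```python
import numpy as np
from scipy.stats import pearsonr
from statsmodels.tsa.stattools import grangercausalitytests

def gce_vs_granger(target, covariates, gce_scores, maxlag=4):
    """Compare GCE's learned covariate weights against classical
    Granger causality tests (hypothetical protocol)."""
    f_stats = []
    for cov in covariates:                     # each cov: (T,) array
        pair = np.column_stack([target, cov])  # does cov -> target?
        res = grangercausalitytests(pair, maxlag=maxlag, verbose=False)
        # Take the strongest F statistic across the tested lags.
        f_stats.append(max(r[0]['ssr_ftest'][0] for r in res.values()))
    r, p = pearsonr(f_stats, gce_scores)
    return r, p
```
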
Tags: time series forecasting, foundation models, forecasting benchmarks, Granger causal embedding, multimodal covariates, zero‑init adaptation