One-Embedding-Fits-All: Selecting the Best Time-Series Forecasting Model from a Model Zoo
The paper introduces ZooCast, a framework that builds a zoo of time‑series foundation models and embeds both models and tasks into a single unified space (the One‑Embedding‑Fits‑All paradigm). This enables efficient zero‑shot model selection that outperforms individual models and full‑model ensembles on the GIFT‑Eval benchmark while remaining computationally lightweight.
Background
Time‑series forecasting is essential in finance, weather, and industry. Recent Time‑Series Foundation Models (TSFMs) improve zero‑shot prediction, but each model excels on different temporal patterns (e.g., Chronos on high‑frequency power data, VisionTS on spiky cloud data). Directly enumerating or ensembling all models is computationally prohibitive, motivating an efficient way to exploit the complementary strengths of a model zoo.
Problem Definition
The goal is to select the optimal model from a zoo in zero‑shot scenarios while keeping computation low. Challenges are: (1) characterizing model advantages without exhaustive evaluation; (2) aligning heterogeneous models and tasks into a common embedding space despite non‑stationarity and multi‑channel data; (3) robustly ranking models when similarity signals are noisy.
Method (ZooCast Framework)
Advantage Subset Characterization
From each TSFM’s pre‑training data, a small subset D of n sub‑sequences is randomly sampled. For each x_i in D, the mean‑squared‑error matrix E across all models is computed. The per‑sample error variance σ_i and relative advantage scores s_{m,i}, filtered by an adaptive threshold τ, determine each model’s advantage subset.
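The filtering step above can be sketched as follows. The exact forms of the advantage score s_{m,i} and the adaptive threshold τ are not spelled out here, so the normalized score and the mean-plus-half-std threshold below are illustrative assumptions, not the paper's definitions:

```python
import numpy as np

def advantage_subset(errors: np.ndarray) -> list[set[int]]:
    """For each model m, pick samples where m has a clear relative advantage.

    errors: (M, n) matrix of per-sample MSEs, errors[m, i] = MSE of model m
    on sub-sequence x_i. Returns one advantage subset (sample indices) per model.
    """
    M, n = errors.shape
    # Per-sample error variance across models: high variance means models
    # genuinely disagree on this sample, so an "advantage" is meaningful.
    sigma = errors.var(axis=0)                     # (n,)
    # Relative advantage score (assumed form): how much better model m is
    # than the per-sample mean error, normalized by that mean.
    mean_err = errors.mean(axis=0)                 # (n,)
    s = (mean_err - errors) / (mean_err + 1e-12)   # (M, n), larger = better
    # Adaptive threshold tau (assumed form): mean score plus half its spread.
    tau = s.mean() + 0.5 * s.std()
    subsets = []
    for m in range(M):
        keep = {i for i in range(n)
                if s[m, i] > tau and sigma[i] > sigma.mean()}
        subsets.append(keep)
    return subsets
```

Requiring both a high score and above-average disagreement keeps only samples where the model's edge is meaningful rather than noise.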
Model‑Task Joint Embedding
A joint encoder ψ is trained with a multi‑objective loss: a reconstruction loss L_{Reconstruction} (temporal fidelity), a contrastive masked‑view loss L_{Constraint} (robustness), and a transfer loss L_{Transfer} = 1 − MSE (cross‑task similarity). Each model representation r_m is the average of the ψ‑embeddings of that model’s advantage subset D; together these form the model library R_{zoo}. The task representation is built by sampling T‑length segments from each channel of a multi‑channel series X, encoding them with ψ, averaging per channel to obtain μ_c, and stacking the μ_c into a task matrix μ.
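Once ψ is trained, both representations are simple averages of its embeddings. A minimal sketch, treating `psi` as an opaque trained encoder mapping a batch of sequences to a batch of embeddings (the segment count `n_segments` is an illustrative parameter, not from the paper):

```python
import numpy as np

def model_embedding(psi, advantage_data: np.ndarray) -> np.ndarray:
    """r_m: mean of psi-embeddings over the model's advantage subset D."""
    return psi(advantage_data).mean(axis=0)

def task_embedding(psi, X: np.ndarray, T: int, n_segments: int = 8,
                   rng=None) -> np.ndarray:
    """mu: per-channel mean embeddings of sampled T-length segments,
    stacked into a (C, d) task matrix. X has shape (C, L)."""
    rng = rng or np.random.default_rng(0)
    C, L = X.shape
    rows = []
    for c in range(C):
        starts = rng.integers(0, L - T + 1, size=n_segments)
        segs = np.stack([X[c, s:s + T] for s in starts])  # (n_segments, T)
        rows.append(psi(segs).mean(axis=0))               # mu_c
    return np.stack(rows)                                 # (C, d)
```

Because both model and task vectors live in the same ψ-space, selection reduces to similarity search rather than running any forecasts.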
Error‑Correcting Consensus Ranking
A weighted cosine similarity between model and channel embeddings is computed, where each model’s weight w_m reflects the size of its advantage subset. For each channel, the top-r models form a binary matrix B; the Hamming distance h_m between B and an ideal consensus yields the final ranking r_{final}.
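The ranking step can be sketched as below. The paper's exact "ideal consensus" construction is not specified here, so the channel-wise majority vote and the total-similarity tie-break are illustrative assumptions:

```python
import numpy as np

def consensus_rank(R_zoo: np.ndarray, mu: np.ndarray, w: np.ndarray,
                   r_top: int = 3) -> np.ndarray:
    """Rank models for a task via weighted cosine similarity plus
    error-correcting consensus.

    R_zoo: (M, d) model embeddings; mu: (C, d) channel embeddings;
    w: (M,) weights (advantage-subset sizes). Returns model indices,
    best first.
    """
    Rn = R_zoo / np.linalg.norm(R_zoo, axis=1, keepdims=True)
    Mn = mu / np.linalg.norm(mu, axis=1, keepdims=True)
    sim = w[:, None] * (Rn @ Mn.T)            # (M, C) weighted cosine
    M, C = sim.shape
    # Binary matrix B: for each channel, mark its top-r models.
    B = np.zeros((M, C), dtype=int)
    for c in range(C):
        B[np.argsort(sim[:, c])[::-1][:r_top], c] = 1
    # Assumed ideal consensus: a model is "in" iff it is top-r on a
    # majority of channels; its ideal row is then all-ones (else all-zeros).
    in_consensus = (B.sum(axis=1) > C / 2).astype(int)
    ideal = in_consensus[:, None] * np.ones((1, C), dtype=int)
    h = (B != ideal).sum(axis=1)              # per-model Hamming distance
    # Smallest distance first; ties broken by total weighted similarity.
    return np.lexsort((-sim.sum(axis=1), h))
```

Models that are strong on only a few channels disagree with their own consensus row and get pushed down, which damps noisy per-channel similarity signals.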
Experiments
Experimental Setup
Evaluation uses the GIFT‑Eval zero‑shot benchmark (23 real datasets, 97 configurations covering energy, finance, healthcare, etc.). Metrics are symmetric mean absolute percentage error (sMAPE, lower is better) and average Rank (lower is better). Baselines include individual TSFMs (Chronos, Moirai, VisionTS, etc.), a full‑model ensemble (All‑13), LogME‑based selection, random selection, and latest‑model selection.
Zero‑Shot Prediction Performance
ZooCast’s Top‑3 ensemble achieves sMAPE 0.437 and Rank 3.688; Top‑5 achieves sMAPE 0.431 and Rank 3.158. The best single model has Rank 4.845. Full ensemble yields sMAPE 0.445 and Rank 5.062; LogME selection yields sMAPE 0.369 and Rank 4.969.
Dynamic Model Release Scenario
When new models (e.g., Chr.bB, Sun.B) are incrementally added, ZooCast’s Top‑3 sMAPE continuously drops from 0.47 to 0.43 and Rank remains the best, reaching as low as 2.5, demonstrating extensibility of the model zoo.
Scalability at Test Time
Performance improves steadily as integration size K grows; Top‑3 already surpasses the full ensemble and the best single model, confirming efficient scalability.
Efficiency Analysis
Pre‑computation takes 123 s (one‑time). Selection for all 97 tasks costs 1,042 s. Total time (pre‑computation + selection + prediction) is 3,123 s, far lower than the 88,024 s required for full‑model forward passes. Computational complexity reduces from O(MN) to O(Mn/U + N).
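The reported timings imply roughly a 28x end-to-end speedup over running every model in the zoo; a quick sanity check of the arithmetic:

```python
# Timings reported in the efficiency analysis (seconds).
pre_compute = 123      # one-time embedding of the model zoo
selection = 1_042      # ranking across all 97 tasks
total_zoocast = 3_123  # pre-compute + selection + prediction
full_zoo = 88_024      # forward passes through every model

# Remaining time is the selected models' prediction cost.
prediction = total_zoocast - pre_compute - selection  # 1,958 s
speedup = full_zoo / total_zoocast                    # ~28.2x
print(f"prediction: {prediction} s, speedup: {speedup:.1f}x")
```

Most of ZooCast's budget goes to actual forecasting rather than selection, which is what makes the O(Mn/U + N) complexity claim plausible in practice.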
Conclusion
The One‑Embedding‑Fits‑All paradigm enables fast, accurate zero‑shot model selection from a heterogeneous TSFM zoo, achieving state‑of‑the‑art results on a diverse benchmark while supporting incremental model addition and maintaining single‑model‑level efficiency.