How TripCast Uses Masked 2D Transformers to Revolutionize Travel Time-Series Forecasting
TripCast introduces a masked 2D transformer pre‑training framework that treats travel demand as a two‑dimensional time‑series problem, leveraging time‑patch tokenization, dual masking and RevIN normalization to achieve state‑of‑the‑art forecasting performance on massive real‑world travel data.
Introduction
TripCast is a pre‑training framework that applies masked 2‑D transformers to travel time‑series forecasting, addressing the inherent “triangular missing” pattern of tourism data.
Why 2‑D Time Series?
Travel demand depends on two orthogonal time dimensions: the event (consumption) time and the leading (booking) time. Stacking the daily sales curve of each departure date yields an H×C matrix (H event dates by C leading days) with a permanent lower‑right triangular missing region, strong local dependencies, and sparse data for new routes.
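To make the triangular‑missing pattern concrete, here is a minimal numpy sketch of such a matrix. The sizes, the `today` cutoff, and the column convention (last column = departure day) are illustrative assumptions, not values from the paper:

```python
import numpy as np

H, C = 8, 6   # H departure (event) dates x C leading (booking) days; toy sizes
today = 4     # index of "today" on the event axis (assumed for illustration)

# Row h is the booking curve of departure date h; entry (h, c) is demand
# observed (C - 1 - c) days before departure, so the last column is departure day.
matrix = np.arange(H * C, dtype=float).reshape(H, C)

# For future departure dates, the most recent leading days have not happened
# yet, producing the permanent lower-right triangular missing region.
for h in range(H):
    days_until_departure = h - today
    if days_until_departure > 0:
        matrix[h, C - min(days_until_departure, C):] = np.nan
```

Each additional day into the future hides one more trailing column, which is exactly the triangle the model must learn to fill.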
TripCast Architecture
TripCast adopts a ViT‑like transformer encoder but introduces time‑patch tokenization, a dual masking strategy (random + progressive), and RevIN normalization tailored for 2‑D series.
Tokenization
The H×C matrix is divided into non‑overlapping patches, each flattened into a token. A linear layer then projects the raw values of each patch into the latent space, where masked positions can be represented by a dedicated special token without ambiguity.
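A short sketch of this time‑patch tokenization, again with assumed toy sizes (`ph`, `pc`, `d_model` are illustrative hyperparameters, and a plain random matrix stands in for the learned projection):

```python
import numpy as np

rng = np.random.default_rng(0)

H, C = 8, 6       # event dates x leading days (toy sizes)
ph, pc = 2, 3     # patch extent along the event and leading axes (assumed)
d_model = 16      # latent dimension (assumed)

x = rng.normal(size=(H, C))

# Split the H x C matrix into non-overlapping ph x pc patches and flatten
# each patch into a vector: one token per patch.
patches = (x.reshape(H // ph, ph, C // pc, pc)
            .transpose(0, 2, 1, 3)
            .reshape(-1, ph * pc))       # (num_patches, ph * pc)

# A linear layer projects each flattened patch into the latent space.
W = rng.normal(size=(ph * pc, d_model))
b = np.zeros(d_model)
tokens = patches @ W + b                 # (num_patches, d_model)
```

With these sizes the 8×6 matrix becomes (8/2)·(6/3) = 8 tokens of dimension 16, ready for a standard transformer encoder.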
Dual Masking
During pre‑training, random and progressive masks are mixed; during inference only the lower‑right prediction region is masked.
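The two masking strategies can be sketched on a small patch grid. The grid size, mask ratio `p`, triangle depth `k`, and the 50/50 mixing rule are all illustrative assumptions; the paper's actual schedules may differ:

```python
import numpy as np

rng = np.random.default_rng(0)
R, Cp = 4, 4   # patch grid: R rows (event axis) x Cp columns (leading axis)

def random_mask(p=0.4):
    """Pre-training: mask each patch independently with probability p."""
    return rng.random((R, Cp)) < p

def progressive_mask(k=2):
    """Pre-training: mask a growing suffix of the last k patch rows,
    mimicking the lower-right triangle that is unknown at inference."""
    mask = np.zeros((R, Cp), dtype=bool)
    for r in range(R - k, R):
        trailing = r - (R - 1 - k)   # 1, 2, ..., k masked trailing columns
        mask[r, Cp - trailing:] = True
    return mask

def training_mask():
    """Mix both strategies during pre-training; at inference only the
    progressive (lower-right prediction) region is masked."""
    return random_mask() if rng.random() < 0.5 else progressive_mask()
```

Random masking teaches general in‑filling, while progressive masking matches the deployment‑time geometry, so pre‑training never diverges from the actual forecasting task.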
RevIN Normalization
RevIN mitigates distribution shift over time and is adapted for the 2‑D scenario.
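A minimal sketch of how RevIN‑style instance normalization can be adapted to the 2‑D setting, computing statistics only over the observed region and zero‑filling the missing cells; the learnable affine parameters of actual RevIN are omitted here, and the observed‑region convention is an assumption:

```python
import numpy as np

def revin_normalize(x, observed, eps=1e-5):
    """Instance-normalize a 2-D series using statistics of the observed
    region only; return the stats so predictions can be de-normalized."""
    vals = x[observed]
    mean, std = vals.mean(), vals.std() + eps
    x_norm = np.where(observed, (x - mean) / std, 0.0)
    return x_norm, mean, std

def revin_denormalize(y_norm, mean, std):
    """Reverse the normalization on the model's output."""
    return y_norm * std + mean

# Toy usage: normalize a booking matrix whose lower-right cell is unobserved.
x = np.arange(12.0).reshape(3, 4)
observed = np.ones_like(x, dtype=bool)
observed[2, 3] = False                 # future cell, not yet observed
x_norm, mean, std = revin_normalize(x, observed)
```

Because the same per‑instance statistics are applied in reverse at the output, the model sees zero‑mean inputs regardless of how demand levels drift across routes and seasons.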
Experiments
We collected over 7 billion travel‑booking records from Ctrip, covering sales and search volume. Two evaluation settings were used: in‑domain (train/val/test split on the same dataset) and out‑domain (zero‑shot forecasting on a different dataset).
Baselines included deep learning models (PatchTST, iTransformer, Linear) and pre‑trained large models (OneFitsAll). TripCast‑small consistently outperformed all baselines on MAE and WAPE in‑domain, while TripCast‑base and TripCast‑large surpassed OneFitsAll in out‑domain zero‑shot tests.
Generalizing the Paradigm
The “event axis + leading axis” formulation applies beyond travel, e.g., e‑commerce pre‑sales, media subscription renewals, and GPU cluster scheduling, suggesting TripCast can serve as a generic 2‑D time‑series model.
Conclusion
By focusing on the data characteristics rather than merely scaling model size, TripCast demonstrates that a simple transformer encoder with patch tokenization and progressive masking can achieve state‑of‑the‑art performance on massive real‑world datasets.
Ctrip Technology
The official Ctrip Technology account, for sharing, discussion, and growth.