How TripCast Uses Masked 2D Transformers to Revolutionize Travel Time-Series Forecasting

TripCast introduces a masked 2D transformer pre‑training framework that treats travel demand as a two‑dimensional time‑series problem, leveraging time‑patch tokenization, dual masking and RevIN normalization to achieve state‑of‑the‑art forecasting performance on massive real‑world travel data.

Ctrip Technology

Introduction

TripCast is a pre-training framework that applies masked 2-D transformers to travel time-series forecasting, addressing the inherent "triangular missing" pattern of tourism data.

Why 2‑D Time Series?

Travel demand depends on two orthogonal time dimensions: the event (consumption) time and the leading (booking) time. Stacking the daily sales of each departure date by how many days in advance they were booked yields an H×C matrix with three defining characteristics: a permanent lower-right triangular missing region (bookings for future dates cannot have been observed yet), strong local dependencies between neighbouring cells, and sparse data for newly launched routes.
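To make the construction concrete, here is a minimal sketch of how such an H×C matrix can be assembled. The dates, sizes, and the daily_sales stand-in are illustrative assumptions, not the paper's actual data pipeline.

```python
import numpy as np
import pandas as pd

H, C = 60, 30        # H event (departure) dates x C leading (booking-ahead) days
today = pd.Timestamp("2024-06-05")

# Rows: consecutive event dates, some already past, some still in the future.
event_dates = pd.date_range(start=today - pd.Timedelta(days=29), periods=H)

# Columns: leading time from C-1 days ahead down to 0 (the travel day itself),
# so the not-yet-observable cells gather in the lower-right corner.
leads = np.arange(C - 1, -1, -1)

# Cell (i, j) holds the sales booked leads[j] days before event_dates[i];
# its implied booking date decides whether it can have been observed yet.
booking_dates = event_dates.values[:, None] - leads[None, :] * np.timedelta64(1, "D")
observable = booking_dates <= np.datetime64(today)

def daily_sales(event_date, days_ahead):
    # Synthetic stand-in for a real booking-log lookup (illustration only).
    return 50 * np.exp(-days_ahead / 10) * (1 + 0.3 * (event_date.dayofweek >= 5))

matrix = np.full((H, C), np.nan)                      # NaN = not observed
for i, ev in enumerate(event_dates):
    for j, lead in enumerate(leads):
        if observable[i, j]:
            matrix[i, j] = daily_sales(ev, int(lead))

# The complementary region ~observable is the permanent lower-right triangle
# that TripCast is trained to reconstruct and, at inference, to forecast.
```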

TripCast Architecture

TripCast adopts a ViT‑like transformer encoder but introduces time‑patch tokenization, a dual masking strategy (random + progressive), and RevIN normalization tailored for 2‑D series.

Tokenization

The H×C matrix is divided into non-overlapping patches, and each patch is flattened into a token. A linear layer then projects the raw patch values into a latent space, where masked or missing positions can be marked with dedicated special tokens so they are never confused with genuinely observed values.
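A rough PyTorch sketch of this tokenization step follows. The patch size, embedding dimension, and the learnable mask token are assumptions about one reasonable implementation, not the paper's exact code.

```python
import torch
import torch.nn as nn

class TimePatchTokenizer(nn.Module):
    """Split an H x C series matrix into non-overlapping patches and embed them."""

    def __init__(self, patch_h=4, patch_w=4, d_model=256):
        super().__init__()
        self.patch_h, self.patch_w = patch_h, patch_w
        self.proj = nn.Linear(patch_h * patch_w, d_model)     # raw values -> latent
        self.mask_token = nn.Parameter(torch.zeros(d_model))  # distinct from real zeros

    def forward(self, x, patch_mask):
        # x:          (B, H, C) observation matrix, unobserved cells filled with 0
        # patch_mask: (B, n_patches) True where a patch is masked / to be predicted
        B, H, C = x.shape
        patches = (x.unfold(1, self.patch_h, self.patch_h)
                    .unfold(2, self.patch_w, self.patch_w))    # (B, H/ph, C/pw, ph, pw)
        patches = patches.reshape(B, -1, self.patch_h * self.patch_w)
        tokens = self.proj(patches)                            # (B, n_patches, d_model)
        # Replace masked patches with a learnable token so the encoder can tell
        # "value to predict" apart from "observed value that happens to be zero".
        tokens = torch.where(patch_mask.unsqueeze(-1),
                             self.mask_token.expand_as(tokens), tokens)
        return tokens
```

Read this way, "special tokens without ambiguity" means the encoder always knows whether a cell is something it must predict or an observed value that merely happens to be zero.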

Dual Masking

During pre-training, random and progressive masks are mixed, so the model learns from both scattered gaps and structured, forecast-like gaps; during inference, only the lower-right prediction region is masked.
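One plausible way to generate the two mask types over the patch grid from the tokenizer sketch above is shown below; the masking ratio, depth schedule, and mixing probability are assumptions, not values taken from the paper.

```python
import torch

def random_mask(n_rows, n_cols, ratio=0.3):
    """Mask individual patches uniformly at random (assumed ratio)."""
    return torch.rand(n_rows, n_cols) < ratio

def progressive_mask(n_rows, n_cols, max_depth=None):
    """Mask a lower-right staircase region, mimicking the triangular
    prediction region that appears at inference time."""
    if max_depth is None:
        max_depth = n_cols // 2
    depth = torch.randint(1, max_depth + 1, (1,)).item()   # columns to hide at most
    rows = torch.arange(n_rows).unsqueeze(1)                # (n_rows, 1)
    cols = torch.arange(n_cols).unsqueeze(0)                # (1, n_cols)
    # Later rows (more recent event dates) lose progressively more trailing columns.
    cutoff = n_cols - depth * (rows + 1) // n_rows
    return cols >= cutoff

def training_mask(n_rows, n_cols, p_random=0.5):
    """Dual masking: mix the two strategies during pre-training."""
    if torch.rand(1).item() < p_random:
        return random_mask(n_rows, n_cols)
    return progressive_mask(n_rows, n_cols)
```

At inference, the mask is not sampled at all: it is simply set to the known lower-right prediction region.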

RevIN Normalization

RevIN (Reversible Instance Normalization) normalizes each input instance before encoding and reverses the transform on the outputs, mitigating distribution shift over time; TripCast adapts it to the 2-D scenario.
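A brief sketch of one way to adapt RevIN-style instance normalization to a 2-D series follows; computing statistics over the observed cells of both time axes is an assumption about the adaptation, not a detail confirmed by the paper.

```python
import torch
import torch.nn as nn

class RevIN2D(nn.Module):
    """Reversible instance normalization for an H x C series matrix (sketch)."""

    def __init__(self, eps=1e-5, affine=True):
        super().__init__()
        self.eps = eps
        self.affine = affine
        if affine:
            self.gamma = nn.Parameter(torch.ones(1))
            self.beta = nn.Parameter(torch.zeros(1))

    def normalize(self, x, observed):
        # x: (B, H, C) raw values; observed: (B, H, C) 0/1 mask of observed cells.
        n = observed.sum(dim=(1, 2), keepdim=True).clamp(min=1)
        self.mean = (x * observed).sum(dim=(1, 2), keepdim=True) / n
        var = ((x - self.mean) ** 2 * observed).sum(dim=(1, 2), keepdim=True) / n
        self.std = (var + self.eps).sqrt()
        x_norm = (x - self.mean) / self.std
        if self.affine:
            x_norm = x_norm * self.gamma + self.beta
        return x_norm * observed              # keep unobserved cells at zero

    def denormalize(self, y):
        # Invert the transform so predictions come back in raw booking units.
        if self.affine:
            y = (y - self.beta) / (self.gamma + self.eps)
        return y * self.std + self.mean
```

Because the transform is inverted on the outputs, losses and evaluation metrics such as MAE and WAPE can be computed directly in the original units.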

Experiments

We collected over 7 billion travel‑booking records from Ctrip, covering sales and search volume. Two evaluation settings were used: in‑domain (train/val/test split on the same dataset) and out‑domain (zero‑shot forecasting on a different dataset).

Baselines included deep learning models (PatchTST, iTransformer, Linear) and pre‑trained large models (OneFitsAll). TripCast‑small consistently outperformed all baselines on MAE and WAPE in‑domain, while TripCast‑base and TripCast‑large surpassed OneFitsAll in out‑domain zero‑shot tests.

Generalizing the Paradigm

The “event axis + leading axis” formulation applies beyond travel, e.g., e‑commerce pre‑sales, media subscription renewals, and GPU cluster scheduling, suggesting TripCast can serve as a generic 2‑D time‑series model.

Conclusion

By focusing on the data characteristics rather than merely scaling model size, TripCast demonstrates that a simple transformer encoder with patch tokenization and progressive masking can achieve state‑of‑the‑art performance on massive real‑world datasets.

Tags: Artificial Intelligence, time series forecasting, pretraining, travel data, 2D transformer, masked transformer