How a Billion-Parameter Time Series Model Beats GPT4TS: The PCTLM Breakthrough

This article introduces PCTLM, a pioneering billion‑parameter pure time‑series large model that outperforms existing solutions like GPT4TS across multiple benchmarks, detailing its massive high‑quality dataset, novel patch‑based architecture, and a tailored RLHF framework (TPO) that enhances zero‑shot forecasting accuracy.


1. Introduction

Time series forecasting is a core technology for supply‑chain management, energy scheduling, and financial risk control. Traditional methods (ARIMA, Prophet) and early deep‑learning models (LSTM, TCN) struggle to capture complex long‑range dependencies and multi‑variable coupling, and they generalize poorly to new domains in zero‑shot settings.

Recent attempts to adapt large language models (e.g., GPT4TS, TimesFM) to time‑series prediction have not yet achieved breakthrough results due to low‑quality datasets and the lack of effective RLHF pipelines for pure time‑series models.

2. Technical Solution

The JD supply‑chain algorithm team built the industry’s first billion‑parameter pure time‑series model, achieving state‑of‑the‑art results on several public datasets.

2.1 Dataset Construction

A 1.5‑billion‑sample high‑quality dataset was assembled by mixing JD sales data, public datasets (Monash, TSLib), and synthetic data. The construction pipeline includes time‑slice splitting, data ratio balancing, and synthetic data generation.

Base Data

JD dataset: ~1.2 billion samples covering multiple product categories over three years, aggregated across dimensions.

Public datasets: ~20 million samples from Monash and TSLib, expanded via random time‑point slicing.

Synthetic data: ~400 million samples generated from model‑based predictions and from custom trend/seasonality/noise components (a minimal sketch follows).
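
The component‑based synthesis might look like the following minimal sketch. The component forms, parameter ranges, and the non‑negativity clamp are illustrative assumptions, not the team's exact recipe.

```python
# Hypothetical trend/seasonality/noise synthesis for one series.
import numpy as np

def synthesize_series(length: int, rng: np.random.Generator) -> np.ndarray:
    t = np.arange(length, dtype=np.float64)
    trend = rng.uniform(-0.05, 0.05) * t + rng.uniform(0.0, 10.0)   # linear trend (assumed form)
    period = rng.choice([7, 30, 365])                               # weekly / monthly / yearly cycle
    seasonality = rng.uniform(0.5, 3.0) * np.sin(2 * np.pi * t / period)
    noise = rng.normal(0.0, rng.uniform(0.1, 1.0), size=length)     # Gaussian noise
    return np.maximum(trend + seasonality + noise, 0.0)             # clamp: sales are non-negative

rng = np.random.default_rng(0)
series = synthesize_series(365, rng)
```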

Data Cleaning

Labeling: Each series is annotated with length, average sales, zero‑sale ratio, etc.

Quality filtering: Short or highly noisy series are removed.

Deduplication: Series are clustered within random groups; only the top‑N series per cluster are kept (see the sketch after this list).

Diversity sorting: Batches are reordered to maximize feature diversity.

Data ratio: Final mix – 76 % JD data, 20 % synthetic, 4 % public; 30 % aggregated series, 70 % raw dimensions.
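
A minimal sketch of the cluster‑and‑keep‑top‑N deduplication step, assuming k‑means over simple per‑series summary features; the feature choice, cluster count, and the length‑based ranking are illustrative assumptions.

```python
# Hypothetical cluster-based deduplication within one random group of series.
import numpy as np
from sklearn.cluster import KMeans

def deduplicate_group(group: list, n_clusters: int = 8, top_n: int = 2) -> list:
    # Simple per-series features: mean, std, zero-sale ratio (assumed choice).
    feats = np.array([[s.mean(), s.std(), float((s == 0).mean())] for s in group])
    k = min(n_clusters, len(group))
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(feats)
    kept = []
    for c in range(k):
        idx = np.where(labels == c)[0]
        # Rank cluster members by length and keep the top-N as representatives.
        order = idx[np.argsort([-len(group[i]) for i in idx])]
        kept.extend(order[:top_n])
    return [group[i] for i in sorted(kept)]
```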

2.2 Model Design – PCTLM

PCTLM (Patch Convolutional Timeseries Large Model) treats the input as overlapping patches, projects each patch via a convolution‑based encoder, and captures cross‑patch interactions with a grouped attention mechanism that incorporates time‑position encodings.
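
As an illustration of the patching step, here is a minimal convolutional patch encoder. The patch length, stride, and model dimension are placeholder values, not PCTLM's actual configuration.

```python
# Sketch: overlapping patches projected by a 1-D convolution.
import torch
from torch import nn

class ConvPatchEmbedding(nn.Module):
    def __init__(self, patch_len: int = 16, stride: int = 8, d_model: int = 512):
        super().__init__()
        # With stride < patch_len, consecutive patches overlap.
        self.encoder = nn.Conv1d(1, d_model, kernel_size=patch_len, stride=stride)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, series_length) -> (batch, n_patches, d_model)
        z = self.encoder(x.unsqueeze(1))   # (batch, d_model, n_patches)
        return z.transpose(1, 2)           # one token per patch for the Transformer

emb = ConvPatchEmbedding()
tokens = emb(torch.randn(4, 96))           # -> shape (4, 11, 512)
```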

The core Transformer consists solely of an encoder, using RoPE for positional encoding and Grouped Query Attention (GQA) to reduce computational cost.
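
Grouped Query Attention shares each key/value head across a group of query heads, shrinking the K/V projections and cache. A minimal sketch follows; the head counts are illustrative, and the RoPE rotation of queries and keys is omitted for brevity.

```python
# Sketch of grouped-query attention (GQA) for an encoder-only Transformer.
import torch
import torch.nn.functional as F
from torch import nn

class GroupedQueryAttention(nn.Module):
    def __init__(self, d_model: int = 512, n_q_heads: int = 8, n_kv_heads: int = 2):
        super().__init__()
        self.n_q, self.n_kv = n_q_heads, n_kv_heads
        self.d_head = d_model // n_q_heads
        self.wq = nn.Linear(d_model, n_q_heads * self.d_head, bias=False)
        self.wk = nn.Linear(d_model, n_kv_heads * self.d_head, bias=False)  # fewer K heads
        self.wv = nn.Linear(d_model, n_kv_heads * self.d_head, bias=False)  # fewer V heads
        self.wo = nn.Linear(n_q_heads * self.d_head, d_model, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, s, _ = x.shape
        q = self.wq(x).view(b, s, self.n_q, self.d_head).transpose(1, 2)
        k = self.wk(x).view(b, s, self.n_kv, self.d_head).transpose(1, 2)
        v = self.wv(x).view(b, s, self.n_kv, self.d_head).transpose(1, 2)
        # Each K/V head serves n_q / n_kv query heads.
        k = k.repeat_interleave(self.n_q // self.n_kv, dim=1)
        v = v.repeat_interleave(self.n_q // self.n_kv, dim=1)
        out = F.scaled_dot_product_attention(q, k, v)  # no causal mask: encoder-only
        return self.wo(out.transpose(1, 2).reshape(b, s, -1))
```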

2.3 Training Scheme – RLHF

Standard RLHF frameworks (PPO, RLOO) are unsuitable for pure time‑series models because their outputs are deterministic point forecasts that carry no probability estimates. We propose TPO (Timeseries Policy Optimization), an RLHF pipeline tailored to time series.

Input augmentation: Preference pairs (good vs. bad predictions) are added to the RLHF dataset.

Probability component: Predictions are modeled as Gaussian N(μ, 1), which gives each forecast a likelihood and makes KL‑divergence computation possible (see the sketch after this list).

Advantage function: Inspired by REINFORCE, the advantage is the difference between the model’s reward and a baseline, encouraging predictions closer to the “good” reference.

Time‑series loss: In addition to the RL objective, a weighted MSE term is retained to preserve forecasting accuracy and prevent over‑fitting during fine‑tuning.
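
Putting the pieces together, a minimal sketch of a TPO‑style objective is shown below. It assumes unit‑variance Gaussian heads, a per‑sample scalar reward from some reward model, and placeholder loss weights; none of these specifics come from the paper. Note that for unit‑variance Gaussians, KL(N(μ₁, 1) ‖ N(μ₂, 1)) reduces to (μ₁ − μ₂)²/2.

```python
# Hypothetical TPO-style loss: REINFORCE-like RL term + KL anchor + weighted MSE.
import torch
import torch.nn.functional as F

def tpo_loss(pred, good_ref, target, ref_pred, reward, beta=0.1, lam=1.0):
    # pred, good_ref, target, ref_pred: (batch, horizon); reward: (batch,)
    # Log-likelihood (up to a constant) of the "good" reference under N(pred, 1).
    logp_good = -0.5 * (pred - good_ref).pow(2).sum(dim=-1)
    # REINFORCE-style advantage: per-sample reward minus a batch-mean baseline.
    advantage = (reward - reward.mean()).detach()
    rl_term = -(advantage * logp_good).mean()    # pushes predictions toward "good" refs
    # KL between unit-variance Gaussians = squared mean distance / 2.
    kl = 0.5 * (pred - ref_pred).pow(2).mean()
    # Retained weighted MSE preserves forecasting accuracy during fine-tuning.
    mse = F.mse_loss(pred, target)
    return rl_term + beta * kl + lam * mse
```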

3. Model Performance

On public benchmarks, the PCTLM model fine‑tuned with SFT + TPO outperforms GPT4TS and five leading full‑shot time‑series deep‑learning methods (PatchTST, Autoformer, iTransformer, DLinear, Informer). Results are reported in MAE, where lower values indicate better performance.

4. Conclusion

We present a complete training pipeline (PCTLM + SFT + TPO) for time‑series large models. PCTLM is the first billion‑parameter pure time‑series model, achieving superior zero‑shot performance compared to GPT4TS and traditional supervised predictors. The proposed RLHF approach (TPO) outperforms existing RLHF methods in both effectiveness and stability. The model has been deployed in JD’s supply‑chain system, serving 20,000 SKUs with automated replenishment and delivering a significant boost in forecast accuracy.

For more details, see the pre‑print: https://arxiv.org/abs/2501.15942

Written by JD Cloud Developers

JD Cloud Developers is JD Technology Group's platform for technical sharing and communication among AI, cloud computing, IoT, and related developers. It publishes JD product technical information, industry content, and tech event news. Embrace technology and partner with developers to envision the future.
