How JD’s TimeHF Billion‑Scale Time‑Series Model Boosts Forecast Accuracy with RLHF

JD’s supply‑chain algorithm team introduces TimeHF, a billion‑scale time‑series large model that leverages RLHF to improve demand forecasting accuracy by over 10%, detailing dataset creation, the PCTLM architecture, a custom RL framework (TPO), and superior benchmark results.

JD Retail Technology

Introduction

Time‑series forecasting is a core technology for supply‑chain management, energy scheduling, and financial risk control. Traditional methods such as ARIMA, Prophet, LSTM, and TCN struggle with complex patterns, long‑range dependencies, and zero‑shot generalization across diverse product categories.

Challenges in Existing Approaches

Insufficient pattern capture: Linear assumptions and local modeling miss long‑term and multivariate interactions.

Weak zero‑shot generalization: Supervised models need retraining for each new scenario.

Poor dataset quality: Public time‑series datasets are limited in size, diversity, and scalability, preventing large‑scale models from demonstrating scaling laws.

Lack of effective RLHF for time series: Existing RLHF frameworks for LLMs do not suit continuous‑value forecasting models.

Dataset Construction

The team assembled a 1.5‑billion‑sample high‑quality dataset by mixing JD’s internal sales data, public datasets (Monash, TSLib), and synthetic data. The pipeline includes data cleaning, labeling (length, average sales, zero‑sale ratio), quality filtering, deduplication, diversity ranking, and controlled data‑type ratios (≈76% JD data, 20% synthetic, 4% public).
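To make the labeling step concrete, here is a minimal Python sketch of how a single series might be labeled and screened. The statistics (length, average sales, zero‑sale ratio) are the ones named above; the function names and threshold values are illustrative assumptions, not details from the paper.

```python
import numpy as np

# Hypothetical labeling-and-filtering step. The statistics (length, average
# sales, zero-sale ratio) come from the article; the thresholds and function
# names are illustrative assumptions.
def label_series(series: np.ndarray) -> dict:
    """Attach the quality labels used for filtering to one sales series."""
    return {
        "length": len(series),
        "avg_sales": float(series.mean()),
        "zero_ratio": float((series == 0).mean()),
    }

def passes_quality_filter(labels: dict,
                          min_length: int = 128,
                          max_zero_ratio: float = 0.9) -> bool:
    """Keep series that are long enough and not dominated by zero sales."""
    return labels["length"] >= min_length and labels["zero_ratio"] <= max_zero_ratio

# Example: screen a corpus of raw series before mixing the data sources.
corpus = [np.random.poisson(2.0, size=365).astype(float) for _ in range(1000)]
kept = [s for s in corpus if passes_quality_filter(label_series(s))]
```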

Model Design – PCTLM

PCTLM (Patch Convolutional Time‑Series Large Model) partitions the input series into overlapping patches, projects each patch into a vector, and processes the patch sequence with a convolution‑based encoder that captures cross‑patch information. A grouped attention mechanism with rotary time‑position encoding (RoPE) reduces computational cost while preserving temporal relationships.

[Figure: PCTLM architecture diagram]
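A minimal PyTorch sketch of two of these ideas follows: overlapping patch embedding via a strided 1‑D convolution, and grouped(-query) attention with a simplified rotate-half RoPE. All sizes (model width, patch length, head counts) are illustrative assumptions, not the paper's settings, and the full convolutional encoder stack is omitted.

```python
import torch
import torch.nn as nn

class PatchEmbed(nn.Module):
    """Split a univariate series into overlapping patches and project each
    patch to a vector. A Conv1d with stride < kernel_size produces the
    overlapping patches and the projection in one step."""
    def __init__(self, d_model: int = 256, patch_len: int = 16, stride: int = 8):
        super().__init__()
        self.proj = nn.Conv1d(1, d_model, kernel_size=patch_len, stride=stride)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len) -> (batch, num_patches, d_model)
        return self.proj(x.unsqueeze(1)).transpose(1, 2)

def rope(x: torch.Tensor) -> torch.Tensor:
    """Simplified rotary position embedding (rotate-half form) over the
    patch index; broadcasts over any leading batch/head dimensions."""
    n, d = x.shape[-2], x.shape[-1]
    half = d // 2
    pos = torch.arange(n, dtype=x.dtype, device=x.device).unsqueeze(1)      # (n, 1)
    freq = 10000.0 ** (-torch.arange(half, dtype=x.dtype, device=x.device) / half)
    ang = pos * freq                                                        # (n, half)
    cos, sin = ang.cos(), ang.sin()
    x1, x2 = x[..., :half], x[..., half:]
    return torch.cat([x1 * cos - x2 * sin, x1 * sin + x2 * cos], dim=-1)

class GroupedAttention(nn.Module):
    """Grouped-query attention: several query heads share each key/value
    head, cutting compute and memory versus full multi-head attention."""
    def __init__(self, d_model: int = 256, n_heads: int = 8, n_kv_heads: int = 2):
        super().__init__()
        self.n_heads, self.n_kv, self.hd = n_heads, n_kv_heads, d_model // n_heads
        self.q = nn.Linear(d_model, d_model)
        self.kv = nn.Linear(d_model, 2 * n_kv_heads * self.hd)
        self.out = nn.Linear(d_model, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, n, _ = x.shape
        q = self.q(x).view(b, n, self.n_heads, self.hd).transpose(1, 2)
        k, v = self.kv(x).view(b, n, 2, self.n_kv, self.hd).permute(2, 0, 3, 1, 4)
        q, k = rope(q), rope(k)  # encode each patch's temporal position
        rep = self.n_heads // self.n_kv
        k, v = k.repeat_interleave(rep, dim=1), v.repeat_interleave(rep, dim=1)
        att = torch.softmax(q @ k.transpose(-2, -1) / self.hd ** 0.5, dim=-1)
        return self.out((att @ v).transpose(1, 2).reshape(b, n, -1))

# Example: a 96-step series becomes 11 overlapping patch tokens.
tokens = PatchEmbed()(torch.randn(4, 96))
out = GroupedAttention()(tokens)   # (4, 11, 256)
```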

Training Scheme – RLHF for Time Series

Standard RLHF frameworks (PPO, RLOO) cannot be applied directly because time‑series models output deterministic point forecasts rather than probability distributions. The team therefore created TPO (Timeseries Policy Optimization), an RLHF pipeline that (a loss sketch follows the list):

Adds paired “good‑vs‑bad” predictions to the RLHF dataset.

Introduces a probability‑output component that models predictions as Gaussian N(μ,1), enabling KL‑divergence computation.

Uses a REINFORCE‑style advantage function based on baseline reward differences, avoiding TD‑error.

Combines a pre‑training loss with MSE to retain forecasting performance while preventing over‑fitting during fine‑tuning.
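Putting these pieces together, here is a minimal PyTorch sketch of a TPO-style loss. It is an illustration under stated assumptions, not the paper's exact formulation: a placeholder negative-MAE reward stands in for the preference-based reward derived from the paired good-vs-bad predictions, the batch-mean reward serves as the baseline, and kl_coef / mse_coef are hypothetical weights. Note that with unit variance the KL term is closed-form: KL(N(μ1, 1) ‖ N(μ2, 1)) = (μ1 − μ2)² / 2.

```python
import torch

def tpo_step(mu_policy: torch.Tensor,
             mu_ref: torch.Tensor,
             y_true: torch.Tensor,
             kl_coef: float = 0.1,
             mse_coef: float = 1.0) -> torch.Tensor:
    """Illustrative TPO-style objective (placeholder reward and weights,
    not the paper's exact values).

    mu_policy: forecasts from the model being fine-tuned, (batch, horizon)
    mu_ref:    forecasts from the frozen pre-trained reference model
    y_true:    ground-truth future values
    """
    # Treat the forecast as the mean of a Gaussian policy N(mu, 1) and
    # sample an "action" from it, which yields a usable log-probability.
    sample = mu_policy + torch.randn_like(mu_policy)
    log_prob = -0.5 * ((sample.detach() - mu_policy) ** 2).sum(dim=-1)

    # REINFORCE-style advantage from a baseline reward difference (no TD
    # error). Placeholder reward: negative MAE of the sampled forecast;
    # baseline: the batch-mean reward.
    reward = -(sample.detach() - y_true).abs().mean(dim=-1)
    advantage = reward - reward.mean()
    pg_loss = -(advantage * log_prob).mean()

    # With unit variance, KL(N(mu1,1) || N(mu2,1)) = (mu1 - mu2)^2 / 2,
    # keeping the fine-tuned policy close to the reference model.
    kl = 0.5 * ((mu_policy - mu_ref.detach()) ** 2).sum(dim=-1).mean()

    # Plain MSE term retains forecasting accuracy during fine-tuning.
    mse = ((mu_policy - y_true) ** 2).mean()

    return pg_loss + kl_coef * kl + mse_coef * mse
```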

Experimental Results

On public benchmarks, the PCTLM model fine‑tuned with SFT + TPO outperformed GPT‑4TS and five strong baselines (PatchTST, Autoformer, iTransformer, DLinear, Informer) in MAE, achieving state‑of‑the‑art performance across most datasets.

[Table: benchmark results]

Conclusion

The authors present a complete pipeline—PCTLM + SFT + TPO—for training billion‑scale time‑series models. The PCTLM architecture is the first pure time‑series model exceeding one billion parameters, delivering zero‑shot performance superior to GPT‑4TS and traditional supervised models. The custom RLHF framework TPO further improves forecasting accuracy and has already been deployed in JD’s supply‑chain system, automating replenishment for 20,000 SKUs with a notable boost in prediction accuracy.

For more technical details, see the paper “TimeHF: Billion‑Scale Time Series Models Guided by Human Feedback” (https://arxiv.org/abs/2501.15942).

Tags: AI, Large Language Models, Supply Chain, Time Series Forecasting, RLHF, PCTLM
Written by JD Retail Technology

Official platform of JD Retail Technology, delivering insightful R&D news and a deep look into the lives and work of technologists.
