
TAR: Multi‑Scale Trajectory Model Fixes Granularity Mismatch, Raising CTR >12%

The paper introduces the Trajectory Auto‑Regressive (TAR) model, which targets the granularity mismatch between fine‑grained decision steps and coarse‑grained feedback in online advertising. TAR combines multi‑scale trajectory generation, multi‑scale VQ‑VAE latent compression, and a state‑action fusion architecture. It reports over 12% CTR lift, smoother budget pacing, and much faster inference than the diffusion‑based DiffBid baseline.

Alimama Tech

May 28, 2026

Introduction

Online advertising can be viewed as a large‑scale real‑time resource allocation problem. Both bidding‑based and contract‑based promotion tasks can be unified as a sequential decision‑making problem where an agent observes the current state (remaining budget, market competition, inventory, etc.) and outputs control actions (bid multiplier λ or exposure probability α) to maximize long‑term performance.

Recent reinforcement‑learning (RL) approaches model this as a Markov decision process (MDP), but the tasks exhibit intrinsic non‑Markovian characteristics: rewards are sparse and feedback is delayed, creating a structural “granularity mismatch” between fine‑grained decision intervals and coarse‑grained observable feedback.

Core Problem: Granularity Mismatch

The authors formalize the mismatch by defining a decision interval Δt and a reliable feedback interval τ, where typically Δt ≪ τ. This leads to sparse reward signals (most fine‑grained steps have zero conversions) and delayed feedback (pre‑loading causes exposure to occur many steps after allocation).

Aggregating feedback over coarse‑grained windows naturally densifies rewards and absorbs delay dispersion, but operating solely at coarse granularity sacrifices control precision. The solution is multi‑scale modeling that leverages the stability of coarse signals while retaining fine‑grained decision flexibility.
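To make the sparsity argument concrete, the sketch below simulates one day of fine‑grained decision steps with rare, randomly delayed conversions, then sums them over coarse windows. The conversion rate, the delay range, and the helper names `aggregate_feedback` and `zero_fraction` are illustrative assumptions, not values from the paper. Only the 288 steps/day figure matches DelaySim, which corresponds to 5‑minute steps.

```python
import numpy as np

def aggregate_feedback(fine_rewards: np.ndarray, window: int) -> np.ndarray:
    """Sum per-step rewards (one per decision interval Δt) over coarse windows
    of `window` steps, i.e. roughly τ / Δt steps per window."""
    n_windows = len(fine_rewards) // window
    trimmed = fine_rewards[: n_windows * window]
    return trimmed.reshape(n_windows, window).sum(axis=1)

def zero_fraction(x: np.ndarray) -> float:
    """Share of entries that carry no reward signal at all."""
    return float(np.mean(x == 0))

rng = np.random.default_rng(0)
steps = 288                 # 288 decision steps per day, as in DelaySim (5-minute steps)
p_conv = 0.02               # hypothetical per-step conversion probability
true_rewards = rng.binomial(1, p_conv, size=steps).astype(float)

# Hypothetical delay: each conversion is observed 0-11 steps after it happens.
delays = rng.integers(0, 12, size=steps)
observed = np.zeros(steps)
for t in np.flatnonzero(true_rewards):
    if t + delays[t] < steps:           # conversions landing after the day ends are lost
        observed[t + delays[t]] += true_rewards[t]

coarse = aggregate_feedback(observed, window=24)   # 24 steps = 2-hour windows
print(f"zero fraction, fine steps:     {zero_fraction(observed):.2f}")
print(f"zero fraction, coarse windows: {zero_fraction(coarse):.2f}")
```

Each 24‑step window can contain at most 24 non‑zero steps, so the share of all‑zero windows can never exceed the share of zero fine‑grained steps. The cost is that a purely coarse controller receives one signal every 2 hours. That is the loss of control precision the multi‑scale design tries to avoid.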

Method: Trajectory Auto‑Regressive Model (TAR)

TAR consists of three key techniques:

Coarse‑to‑fine trajectory generation: First plan an overall trajectory at a coarse time scale, then progressively refine it to fine‑grained actions.

Multi‑scale VQ‑VAE latent compression: A vector‑quantized variational auto‑encoder learns a unified latent space that automatically compresses heterogeneous features (cumulative conversions, budget snapshots, sliding‑window CTR) across scales.

State‑action fusion architecture: Historical actions are embedded directly into the state representation, eliminating the need for a separate inverse dynamics model and capturing long‑range dependencies.

Conditional Generative Framework

TAR follows the AIGB (AI‑Generated Bidding) paradigm, recasting sequential decision making as a conditional generation problem optimized by maximum likelihood estimation (MLE). Given a trajectory‑level attribute (e.g., total conversion value or contract exposure) and historical observations, the model learns the conditional probability of generating the full state sequence.
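One way to write this objective is shown below. The notation is ours, since the article gives none: s_{1:T} is the state sequence, y the trajectory‑level attribute, h the historical observations, and D the offline trajectory dataset. MLE then fits the parameters θ by

```latex
\theta^{*} = \arg\max_{\theta}\;
  \mathbb{E}_{(s_{1:T},\,y,\,h)\sim\mathcal{D}}
  \left[\log p_{\theta}\!\left(s_{1:T} \mid y,\, h\right)\right]
```

At inference time one would presumably set y to the desired attribute (for example a target conversion value) and sample a trajectory from p_θ(· | y, h).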

Multi‑Scale Trajectory Generation

The model defines K time scales from the coarsest to the original fine resolution. The joint distribution is factorized into an autoregressive product across scales, where each finer scale conditions on all previously generated coarser scales. This mirrors a planning process: a coarse “blueprint” of the overall budget consumption curve is first generated, then medium‑scale corrections (hour‑level fluctuations) are added, and finally fine‑grained actions fill in the details.
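The scale‑wise factorization can be written as follows, again in our own notation. Here r_k is the token map for scale k, with k = 1 the coarsest and k = K the finest, and c bundles the conditioning inputs (attribute and history).

```latex
p_{\theta}(r_1, \dots, r_K \mid c)
  = \prod_{k=1}^{K} p_{\theta}\!\left(r_k \mid r_1, \dots, r_{k-1},\, c\right),
\qquad
\log p_{\theta}(r_1, \dots, r_K \mid c)
  = \sum_{k=1}^{K} \log p_{\theta}\!\left(r_k \mid r_{<k},\, c\right)
```

Under this factorization the MLE loss splits into one term per scale. If all tokens of a scale are predicted in a single forward pass, as the inference‑efficiency results suggest, generation takes K sequential steps rather than one step per fine‑grained time index.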

Multi‑Scale VQ‑VAE Compression

Encoding: a Transformer encoder maps the raw trajectory to a continuous latent sequence. Multi‑scale residual quantization then processes each scale in turn, from coarsest to finest. It (1) linearly interpolates the current residual to that scale's length, (2) looks up the nearest codebook entry for each position, (3) up‑samples the selected code vectors back to the original length, and (4) refines them with a self‑attention layer. The result is added to the cumulative quantized representation, and the residual is updated.
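The four quantization steps above can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' code. The latent sequence `z` stands in for the Transformer encoder output, and the scale lengths and codebook size are made up. The self‑attention refinement in step (4) is replaced by a hypothetical `refine` callable that defaults to the identity.

```python
import numpy as np

def interpolate(x: np.ndarray, length: int) -> np.ndarray:
    """Linearly resample a (T, D) sequence to (length, D) along the time axis."""
    src = np.linspace(0.0, 1.0, x.shape[0])
    dst = np.linspace(0.0, 1.0, length)
    return np.stack([np.interp(dst, src, x[:, d]) for d in range(x.shape[1])], axis=1)

def nearest_code(v: np.ndarray, codebook: np.ndarray) -> np.ndarray:
    """Index of the closest codebook row (squared L2 distance) for each row of v."""
    d2 = ((v[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    return d2.argmin(axis=1)

def multiscale_residual_quantize(z, codebook, scale_lengths, refine=lambda q: q):
    """Quantize a (T, D) latent sequence z over scales ordered coarsest to finest."""
    T = z.shape[0]
    z_hat = np.zeros_like(z)      # cumulative quantized representation
    residual = z.copy()           # part of z not yet explained
    indices = []
    for length in scale_lengths:
        r_k = interpolate(residual, length)     # (1) resample residual to this scale
        idx = nearest_code(r_k, codebook)       # (2) nearest codebook entries
        q_k = interpolate(codebook[idx], T)     # (3) up-sample codes back to length T
        q_k = refine(q_k)                       # (4) placeholder for self-attention refinement
        z_hat = z_hat + q_k
        residual = z - z_hat
        indices.append(idx)
    return indices, z_hat

# Usage with made-up sizes: 48 steps, 8-dim latents, 16 codes, K = 4 scales.
rng = np.random.default_rng(0)
codebook = rng.normal(size=(16, 8))
codebook[0] = 0.0                 # a zero code lets later scales "add nothing"
z = rng.normal(size=(48, 8))
indices, z_hat = multiscale_residual_quantize(z, codebook, scale_lengths=(6, 12, 24, 48))
print([idx.shape for idx in indices])   # [(6,), (12,), (24,), (48,)]
print(float(np.mean((z - z_hat) ** 2)))
```

A real implementation would learn the codebook and the refinement layers jointly with the encoder and decoder. Here both are fixed so that the control flow is easy to follow. The token indices produced per scale are what the scale‑level Transformer later learns to predict.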

Decoding: a Transformer decoder reconstructs the trajectory from the final accumulated representation. The training loss combines reconstruction error with the standard VQ‑VAE loss.
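For reference, the standard VQ‑VAE objective mentioned above is shown below. The symbols are assumptions for illustration: x is the input trajectory, x̂ the decoder output, z the encoder output, ẑ the final accumulated quantized representation, sg[·] the stop‑gradient operator, and β the commitment weight. The article does not state how TAR weights these terms.

```latex
\mathcal{L} =
  \underbrace{\lVert x - \hat{x} \rVert_2^2}_{\text{reconstruction}}
  + \underbrace{\lVert \operatorname{sg}[z] - \hat{z} \rVert_2^2}_{\text{codebook}}
  + \beta \, \underbrace{\lVert z - \operatorname{sg}[\hat{z}] \rVert_2^2}_{\text{commitment}}
```

In the standard formulation the nearest‑neighbour lookup is not differentiable. The reconstruction gradient therefore reaches the encoder through the straight‑through estimator.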

State‑Action Fusion with Position Offset

Instead of decoupling state generation and action inference (as in DiffBid or Diffuser), TAR augments each state with the previous action, forming an augmented state vector. The model directly generates this augmented trajectory, from which actions are extracted during inference. This design ensures end‑to‑end learning, multi‑step perception, and cross‑scale action guidance.
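A minimal sketch of the augmentation is shown below. It assumes states and actions are stored as (T, d_s) and (T, d_a) arrays, and that the missing action before the first step is zero‑padded; both are our assumptions, and the function names are hypothetical. Row t of the augmented trajectory holds [s_t, a_{t-1}], so the action chosen at step t is read from row t+1. This one‑step shift is presumably the "position offset" in the section heading.

```python
import numpy as np

def augment_states(states: np.ndarray, actions: np.ndarray) -> np.ndarray:
    """Build rows [s_t, a_{t-1}]; a_{-1} is zero-padded (an assumption)."""
    prev_actions = np.zeros_like(actions)
    prev_actions[1:] = actions[:-1]              # shift actions forward by one step
    return np.concatenate([states, prev_actions], axis=1)

def extract_actions(aug: np.ndarray, action_dim: int) -> np.ndarray:
    """Recover a_0 .. a_{T-2} from the action slots of rows 1 .. T-1."""
    return aug[1:, -action_dim:]

# Usage: 5 steps, 3-dim state (e.g. remaining budget, time, competition), 1-dim action (e.g. λ).
rng = np.random.default_rng(0)
states = rng.random((5, 3))
actions = rng.random((5, 1))
aug = augment_states(states, actions)
print(aug.shape)                                              # (5, 4)
print(np.array_equal(extract_actions(aug, 1), actions[:-1]))  # True
```

At inference time, generating the next augmented row yields the action for the current step directly. No separate inverse dynamics model is needed to map pairs of consecutive states to actions.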

Scale‑Level Causal Transformer

The backbone is a scale‑level causal Transformer. For scale k, the inputs are a learnable parameter (the coarse‑scale sketch) plus interpolated representations of all preceding scales. A triangular attention mask over scales enforces causality, so scale k attends only to earlier scales and to itself. Conditional information and attributes are injected via adaptive layer normalization (AdaLN) and cross‑attention. The output logits predict codebook indices. The model is trained with cross‑entropy and decoded with top‑p (nucleus) sampling at inference.
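Two pieces of this paragraph are easy to show in isolation: the scale‑level causal mask and top‑p (nucleus) sampling over codebook logits. The NumPy sketch below is illustrative only. The token counts per scale and the logits are made up, and the AdaLN and cross‑attention conditioning are omitted.

```python
import numpy as np

def scale_causal_mask(scale_lengths):
    """(N, N) boolean mask: True where a query token may attend to a key token.
    A token of scale k sees every token of scales 0..k, so the mask is
    lower-triangular at the scale level (block-triangular at the token level)."""
    scale_id = np.concatenate([np.full(n, k) for k, n in enumerate(scale_lengths)])
    return scale_id[:, None] >= scale_id[None, :]

def top_p_sample(logits, p, rng):
    """Sample one index from the smallest set of tokens whose total probability >= p."""
    probs = np.exp(logits - logits.max())          # numerically stable softmax
    probs /= probs.sum()
    order = np.argsort(probs)[::-1]                # most likely first
    cum = np.cumsum(probs[order])
    n_keep = min(int(np.searchsorted(cum, p)) + 1, len(order))
    keep = order[:n_keep]
    return int(rng.choice(keep, p=probs[keep] / probs[keep].sum()))

mask = scale_causal_mask([1, 2, 3])                # 3 scales with 1, 2, 3 tokens
print(mask.astype(int))
rng = np.random.default_rng(0)
logits = np.array([2.0, 1.5, 0.1, -3.0])           # hypothetical logits over a 4-entry codebook
print(top_p_sample(logits, p=0.9, rng=rng))
```

Restricting sampling to the nucleus keeps generated codebook indices out of the low‑probability tail while still allowing some diversity. The threshold p is a tunable hyperparameter whose value for TAR is not given in this summary.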

Experiments

Offline Simulation: Sparse Reward Scenario

Evaluated on AuctionNet‑Sparse (NeurIPS AIGB competition dataset, 7 campaigns, 48 steps/day, ~500k training trajectories). TAR outperformed all baselines across four budget levels, achieving 2.7%–13.5% improvement. Gains were largest at low budgets (8.4%–13.5%) where sparsity and granularity mismatch are most severe.

Offline Simulation: Delayed Feedback Scenario

A custom DelaySim environment was built from real brand‑promotion logs (14 campaigns, 288 steps/day, ~200k trajectories). TAR improved penalty×avg_pCTR by 10.1%–17.3% across budgets, surpassing DiffBid, which struggled due to its inverse‑dynamics model’s limited temporal horizon.

Ablation Study

Removing coarse‑to‑fine generation (CTF) caused a 41% drop on DelaySim, confirming the importance of multi‑scale information hierarchy.

Replacing VQ‑VAE with linear interpolation reduced performance by 42% on AuctionNet‑Sparse, highlighting the need for adaptive latent compression.

Replacing the state‑action integration (SAI) with an inverse dynamics model degraded results on DelaySim by over 30%, demonstrating the necessity of multi‑step perception.

Inference Efficiency

Per‑step latency: DT (Decision Transformer) 2.6 ms, TAR 16.7 ms, DiffBid 234.4 ms. TAR is ~14× faster than DiffBid. It is roughly 6× slower than DT but remains well under 100 ms. The gap to DiffBid comes from TAR's discrete‑token autoregressive generation (K = 4 scales, one forward pass per scale), which replaces the iterative denoising of diffusion models.
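The speed comparison can be checked directly from the reported per‑step latencies. The snippet below also compares each latency with the <100 ms real‑time figure given later in the article. Treating that figure as a per‑step budget is our reading, not something the article states.

```python
latency_ms = {"DT": 2.6, "TAR": 16.7, "DiffBid": 234.4}   # reported per-step latencies
budget_ms = 100.0                                        # real-time figure cited in the article

speedup_vs_diffbid = latency_ms["DiffBid"] / latency_ms["TAR"]
slowdown_vs_dt = latency_ms["TAR"] / latency_ms["DT"]
print(f"TAR is {speedup_vs_diffbid:.1f}x faster than DiffBid")   # TAR is 14.0x faster than DiffBid
print(f"TAR is {slowdown_vs_dt:.1f}x slower than DT")            # TAR is 6.4x slower than DT
print({name: ms < budget_ms for name, ms in latency_ms.items()})
# {'DT': True, 'TAR': True, 'DiffBid': False}
```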

Online A/B Test

Deployed for 8 days against a PPO‑based RL baseline. In CTR‑optimizing campaigns, TAR increased CTR by 10%; in CVR‑optimizing campaigns, CVR rose by 5%.

Technical Insights and Future Directions

The work demonstrates that granularity mismatch is a fundamental challenge for any industrial optimization problem where decision frequency far exceeds feedback frequency (e.g., inventory scheduling, dynamic pricing, network traffic control). Multi‑scale generative modeling aligns planning resolution with feedback dynamics, offering a unified framework for both bidding and budget pacing.

From an engineering standpoint, TAR achieves state‑of‑the‑art performance while keeping inference latency suitable for real‑time bidding (<100 ms). The discrete‑token generation paradigm appears better suited to online decision systems than diffusion‑based iterative methods.

Future work includes guided generation for dynamic trajectory attributes, confidence estimation heads for adaptive decisions, DPO fine‑tuning for preference alignment, and scaling up model size. The authors also plan to open‑source the DelaySim dataset and simulation environment to foster further research.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.
