How Multi‑Layer Multi‑Frequency Streaming Training Boosts Real‑Time CTR/CVR Prediction
This article details a Multi‑Layer Multi‑Frequency streaming training approach that enables minute‑level real‑time updates of massive CTR/CVR models. Weights are partitioned into freezing embeddings, changing embeddings, and changing weights, which are updated at different frequencies. The approach yields significant offline and online AUC gains, especially during high‑traffic events such as Double 11.
Introduction
Click‑through rate (CTR) and conversion rate (CVR) prediction are critical in e‑commerce applications such as search, recommendation, and online advertising. Their ground‑truth distributions shift dramatically over time due to seasonality, promotions, and large‑scale events. There are two main adaptation strategies: (1) incorporating real‑time features (e.g., live user/item attributes, recent click/exposure sequences) and (2) incremental training with multiple updates per day. However, even with streaming data, model deployment latency (typically N hours, with N ≥ 2) limits responsiveness, so deployed models lag behind real‑time changes.
To address this, an online stacking model was added on top of a deep CTR/CVR model, using lightweight, less‑sparse features for rapid updates. Building on this, a Multi‑Layer Multi‑Frequency update method was proposed to directly update the large deep model in real time by splitting weights into three logical groups.
1. Background
Major e‑commerce platforms serve billions of items to hundreds of millions of users. Accurate CTR/CVR estimation is essential for user experience and business outcomes, yet feature and ground‑truth distributions vary widely across days and during major sales events. Figures illustrate the substantial daily and Double 11‑period fluctuations in ground‑truth CTR/CVR.
These variations demand models that can quickly adapt to evolving user interests and item popularity.
2. Streaming Training and Real‑Time Update
Incremental learning, i.e. continuously feeding the model new data, has been shown to steadily improve model performance. Common update granularities include:
Daily full‑model checkpoint restore and batch training (day‑level latency).
Hourly updates: streaming training on real‑time samples, but deployment still relies on full‑model checkpoints (hour‑level latency).
Ideal solution: streaming training combined with immediate model updates (minute‑level latency).
2.1 Multi‑Layer Weights
In large‑scale search systems, updating the entire model in real time is infeasible. Model weights consist of embeddings for highly sparse ID features and MLP layers. For example, 800 M item IDs with 64‑dimensional embeddings occupy ~190 GB in float32; such embedding tables are too large to be updated in serving instantly.
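The ~190 GB figure follows from simple arithmetic if each embedding value is a 4‑byte float32 (an assumption; the article does not state the storage format). This counts only the raw vectors; optimizer slots or hash‑table overhead would add to it.

```python
def embedding_table_bytes(num_ids: int, dim: int, bytes_per_value: int = 4) -> int:
    """Raw size of an embedding table: one dim-length vector per ID."""
    return num_ids * dim * bytes_per_value

# 800 M item IDs x 64 dimensions, assuming float32 (4 bytes per value)
size_bytes = embedding_table_bytes(800_000_000, 64)
size_gib = size_bytes / 2**30
print(f"{size_bytes:,} bytes = {size_gib:.1f} GiB")  # 204,800,000,000 bytes = 190.7 GiB
```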
We therefore categorize weights into:
Freezing embeddings: highly sparse IDs (user_id, query_id, item_id) whose embeddings change slowly (stable for up to 7 days, changing rapidly only during promotions).
Changing embeddings: less sparse embeddings such as user profile, brand, and statistical feature embeddings.
Changing weights: all MLP parameters.
The architecture is illustrated below.
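A minimal sketch of the three‑way split, assuming (hypothetically) that variables are named `emb/<feature>` for embedding tables and `mlp/...` for dense layers. The `classify` and `realtime_update_set` helpers are illustrative, not the article's implementation:

```python
from enum import Enum

class WeightGroup(Enum):
    FREEZING_EMBEDDING = "freezing_embedding"  # highly sparse IDs: full checkpoint swap only
    CHANGING_EMBEDDING = "changing_embedding"  # less sparse embeddings: real-time updates
    CHANGING_WEIGHT = "changing_weight"        # MLP parameters: real-time updates

# Highly sparse ID features named in the article.
HIGH_SPARSE_FEATURES = {"user_id", "query_id", "item_id"}

def classify(var_name: str) -> WeightGroup:
    """Map a variable to its group (hypothetical 'emb/<feature>' naming scheme)."""
    parts = var_name.split("/")
    if parts[0] == "emb" and len(parts) > 1:
        if parts[1] in HIGH_SPARSE_FEATURES:
            return WeightGroup.FREEZING_EMBEDDING
        return WeightGroup.CHANGING_EMBEDDING
    return WeightGroup.CHANGING_WEIGHT  # everything else is treated as MLP weights

def realtime_update_set(var_names):
    """Variables pushed to serving every minute: all but freezing embeddings."""
    return [n for n in var_names if classify(n) is not WeightGroup.FREEZING_EMBEDDING]

variables = ["emb/item_id", "emb/user_id", "emb/brand", "emb/user_profile",
             "mlp/dense_0/kernel", "mlp/dense_0/bias"]
for name in variables:
    print(f"{name:22s} -> {classify(name).value}")
print(realtime_update_set(variables))
```

Keeping the split as a pure function of variable names means the trainer and the sender can agree on the update set without sharing extra state.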
2.2 Multi‑Frequency Update
The Multi‑Frequency strategy updates freezing embeddings via full‑model checkpoint swaps, while changing embeddings and changing weights are updated in real time. Two operational modes are defined:
Daily mode: one full‑model update per day, with continuous real‑time updates throughout the day.
Promotion mode: alternating dual‑model updates during high‑traffic events.
In promotion mode, two parallel training tasks (Model A and Model B) are launched. Model A performs full‑model streaming training, while Model B keeps the highly sparse embeddings fixed and streams updates to the remaining parameters. The workflow proceeds as follows:
Both models restore the same checkpoint at start time T.
At T + 1 minute, Model B begins swift updates of its mutable parameters to the serving system.
Model B continues updating every minute.
After N hours, Model A has accumulated N hours of training and produces a full checkpoint; Model B’s updates are paused.
Model A’s full checkpoint is deployed to serving (≈30 minutes), after which Model B restores that checkpoint and resumes streaming.
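The promotion‑mode cycle can be simulated at minute granularity. This is a hypothetical sketch: `PromotionCycle`, the event strings, and the fixed 30‑minute deployment time are assumptions for illustration, and real training and serving would run concurrently rather than as an event log.

```python
from dataclasses import dataclass, field

@dataclass
class PromotionCycle:
    """Hypothetical simulation of one promotion-mode cycle (times in minutes)."""
    n_hours: int                # N: hours Model A trains before a full checkpoint
    swap_minutes: int = 30      # assumed full-checkpoint deployment time
    events: list = field(default_factory=list)

    def run(self, start: int = 0) -> int:
        # Step 1: both models restore the same checkpoint at time T.
        self.events.append((start, "A,B restore checkpoint"))
        ckpt_time = start + self.n_hours * 60
        # Steps 2-3: from T + 1 minute, Model B pushes its mutable parameters every minute.
        for t in range(start + 1, ckpt_time):
            self.events.append((t, "B sends changing embeddings + MLP"))
        # Step 4: Model A emits a full checkpoint; B's updates pause.
        self.events.append((ckpt_time, "A full checkpoint; B paused"))
        # Step 5: A's checkpoint is deployed, then B restores it and resumes.
        resume = ckpt_time + self.swap_minutes
        self.events.append((resume, "A checkpoint live; B restores it and resumes"))
        return resume  # start time T of the next cycle

cycle = PromotionCycle(n_hours=2)
next_t = cycle.run()
print(next_t, len(cycle.events))  # 150 122
```

The point of the alternation is visible in the log: serving never goes more than a minute without a fresh update of the changing weights, except during the checkpoint swap.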
2.3 Practical Experience
2.3.1 Streaming CTR/CVR Co‑Train
Streaming co‑training consumes both the CTR and CVR sample streams. Unlike offline co‑training with fixed sample ratios, real‑time training requires dynamic sampling: a task trains only if enough of its samples are available; otherwise the other task runs instead, which prevents training from stalling while one stream waits for data.
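A minimal sketch of the dynamic sampling rule, assuming each task has its own buffer of incoming samples and a fixed batch size. `DynamicCoTrainScheduler` is a hypothetical name, and alternating the preferred task is one possible way (not necessarily the article's) to keep both tasks progressing when both streams have data.

```python
from collections import deque

class DynamicCoTrainScheduler:
    """Hypothetical sketch: pick which task (CTR or CVR) to train next."""

    def __init__(self, batch_size: int):
        self.batch_size = batch_size
        self.buffers = {"ctr": deque(), "cvr": deque()}
        self.preferred = "ctr"  # alternate preference so neither task starves

    def add_samples(self, task: str, samples) -> None:
        self.buffers[task].extend(samples)

    def next_batch(self):
        """Return (task, batch), or None if neither stream has a full batch yet."""
        other = "cvr" if self.preferred == "ctr" else "ctr"
        for task in (self.preferred, other):
            buf = self.buffers[task]
            if len(buf) >= self.batch_size:
                batch = [buf.popleft() for _ in range(self.batch_size)]
                # Prefer the other task next time.
                self.preferred = "cvr" if task == "ctr" else "ctr"
                return task, batch
        return None  # caller waits for more data instead of blocking on one task

sched = DynamicCoTrainScheduler(batch_size=2)
sched.add_samples("ctr", [1, 2, 3, 4])
sched.add_samples("cvr", [10])   # CVR samples arrive more slowly
print(sched.next_batch())        # ('ctr', [1, 2])
print(sched.next_batch())        # ('ctr', [3, 4]): CVR lacks a full batch, so CTR runs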
2.3.2 No Time Feature
Including time features (day‑of‑week, hour‑of‑day) in real‑time training can cause over‑fitting because the training data often originates from a single time slot, reducing generalization to other times.
2.3.3 SyncSendSwiftHook
A synchronization hook keeps the weights sent by the chief worker consistent with the state of the training workers. The chief sets a flag sending=True before dispatch; workers check this flag before each step and pause until the chief clears it, so parameters are not modified while they are being read for sending.
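The handshake can be sketched with a condition variable. This is an illustration under assumptions, not the SyncSendSwiftHook source: `SyncSendGate` is a hypothetical name, and the in‑flight step counter (which makes the chief also wait for steps that began before the flag was raised) goes beyond what the article describes.

```python
import threading
import time

class SyncSendGate:
    """Hypothetical chief/worker handshake around a 'sending' flag."""

    def __init__(self):
        self._cond = threading.Condition()
        self.sending = False
        self._active_steps = 0

    # ---- worker side ----
    def enter_step(self):
        with self._cond:
            while self.sending:            # pause while the chief is sending
                self._cond.wait()
            self._active_steps += 1

    def exit_step(self):
        with self._cond:
            self._active_steps -= 1
            self._cond.notify_all()

    # ---- chief side ----
    def begin_send(self):
        with self._cond:
            self.sending = True            # sending=True before dispatch
            while self._active_steps:      # let in-flight steps finish
                self._cond.wait()

    def end_send(self):
        with self._cond:
            self.sending = False           # clear the flag; workers resume
            self._cond.notify_all()

params = {"a": 0, "b": 0}                  # a consistent state always has a == b
gate = SyncSendGate()
stop = threading.Event()

def worker():
    while not stop.is_set():
        gate.enter_step()
        try:
            params["a"] += 1               # a "training step" touching two
            params["b"] += 1               # parameters; mid-step, a != b
        finally:
            gate.exit_step()

t = threading.Thread(target=worker)
t.start()
snapshots = []
for _ in range(50):                        # chief: dispatch 50 times
    gate.begin_send()
    snapshots.append((params["a"], params["b"]))
    gate.end_send()
    time.sleep(0.001)
stop.set()
t.join()
print(all(a == b for a, b in snapshots))   # True: no snapshot saw a half-finished step
```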
2.3.4 Sending Mean and Variance
Batch‑norm statistics (moving mean and variance) must be sent along with the other real‑time updates. They are not trainable variables, so an update set built only from trainable variables would omit them, yet serving depends on them for correct predictions.
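The pitfall is easy to reproduce with a toy batch‑norm layer. In the hypothetical sketch below, `send_payload` includes the moving statistics; a serving copy that receives only the trainable `gamma`/`beta` keeps its initial statistics and produces badly shifted outputs. Gradient updates to `gamma`/`beta` are omitted for brevity.

```python
class TinyBatchNorm:
    """Batch norm over a single scalar feature (hypothetical, for illustration)."""

    def __init__(self, momentum: float = 0.9, eps: float = 1e-3):
        self.gamma, self.beta = 1.0, 0.0              # trainable variables
        self.moving_mean, self.moving_var = 0.0, 1.0  # non-trainable statistics
        self.momentum, self.eps = momentum, eps

    def train_step(self, batch):
        """Normalize with batch statistics and update the moving averages."""
        mean = sum(batch) / len(batch)
        var = sum((x - mean) ** 2 for x in batch) / len(batch)
        m = self.momentum
        self.moving_mean = m * self.moving_mean + (1 - m) * mean
        self.moving_var = m * self.moving_var + (1 - m) * var
        return [self.gamma * (x - mean) / (var + self.eps) ** 0.5 + self.beta for x in batch]

    def serve(self, x: float) -> float:
        """Inference uses the moving statistics, not batch statistics."""
        return self.gamma * (x - self.moving_mean) / (self.moving_var + self.eps) ** 0.5 + self.beta

    def trainable_vars(self) -> dict:
        return {"gamma": self.gamma, "beta": self.beta}

    def send_payload(self) -> dict:
        """What the streaming trainer pushes: trainable weights plus mean and variance."""
        return {**self.trainable_vars(),
                "moving_mean": self.moving_mean, "moving_var": self.moving_var}

    def load(self, payload: dict) -> None:
        for name, value in payload.items():
            setattr(self, name, value)

trainer = TinyBatchNorm()
for _ in range(50):
    trainer.train_step([9.0, 10.0, 11.0])  # streaming data centered at 10

full = TinyBatchNorm()
full.load(trainer.send_payload())          # receives mean and variance too
partial = TinyBatchNorm()
partial.load(trainer.trainable_vars())     # receives gamma/beta only

print(round(full.serve(10.0), 3), round(partial.serve(10.0), 3))
```

With the statistics included, an input at the data mean normalizes to roughly 0; without them, the serving copy still normalizes against its initial mean of 0 and returns roughly 10.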
3. Experiments and Analysis
3.1 Comparison Methods
Base: full model with all weights frozen.
ALL‑EMB+MLP: all parameters updated.
MLP: only MLP layers updated.
Part‑EMB+MLP: partial embeddings plus MLP updated.
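One way to read the four configurations is as subsets of the three weight groups from Section 2.1. The mapping below is an interpretation (in particular, that "partial embeddings" means the changing embeddings), not something the article spells out:

```python
GROUPS = ("freezing_embeddings", "changing_embeddings", "changing_weights")

# Which weight groups each compared method updates (interpretation, see lead-in).
METHODS = {
    "Base": frozenset(),
    "ALL-EMB+MLP": frozenset(GROUPS),
    "MLP": frozenset({"changing_weights"}),
    "Part-EMB+MLP": frozenset({"changing_embeddings", "changing_weights"}),
}

# Print a method x group matrix: "yes" if the method updates that group.
print(f"{'method':14s}" + "".join(f"{g:>22s}" for g in GROUPS))
for method, updated in METHODS.items():
    print(f"{method:14s}" + "".join(f"{'yes' if g in updated else '-':>22s}" for g in GROUPS))
```

Read this way, the methods form a nested sequence, which is why comparing them isolates the contribution of each weight group.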
3.2 Offline Daily Gains
Models were trained on day T data (Base) and then incrementally updated with day T+1 data; performance was evaluated on day T+2. ALL‑EMB+MLP improves next‑day CTR AUC by 0.16% and CVR AUC by 0.18%. Updating only the MLP recovers the full gain on total AUC but only 50% of it on PV‑AUC, while Part‑EMB+MLP matches the gains of ALL‑EMB+MLP.
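The article does not define PV‑AUC. A common definition, assumed here, computes AUC separately within each page view (the impressions of one request) and averages the results weighted by page‑view size, skipping page views that lack both a positive and a negative. The sketch also shows how a "fraction of the gain captured" figure, such as the 50% above, could be computed. All function names are illustrative, and the pairwise AUC is O(n²), which is fine only for small examples.

```python
from collections import defaultdict

def auc(labels, scores):
    """Probability a random positive outranks a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        return None  # undefined without both classes
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def pv_auc(pv_ids, labels, scores):
    """Assumed definition: per-page-view AUC, weighted by page-view size."""
    groups = defaultdict(lambda: ([], []))
    for pv, y, s in zip(pv_ids, labels, scores):
        groups[pv][0].append(y)
        groups[pv][1].append(s)
    total, weight = 0.0, 0
    for ys, ss in groups.values():
        a = auc(ys, ss)
        if a is not None:          # skip page views with a single class
            total += a * len(ys)
            weight += len(ys)
    return total / weight if weight else None

def gain_captured(base, method, reference):
    """Fraction of the reference method's AUC gain over Base that `method` recovers."""
    return (method - base) / (reference - base)

# Toy data: two page views, each with one click and one non-click.
pv = ["pv1", "pv1", "pv2", "pv2"]
y = [1, 0, 1, 0]
s = [0.9, 0.1, 0.4, 0.5]
print(auc(y, s), pv_auc(pv, y, s))  # 0.75 0.5
```

The toy data shows why the two metrics can disagree: a model can rank well globally (AUC 0.75) while ordering items within individual page views poorly, which is what PV‑AUC measures.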
3.3 Real‑Time Gains on Double 11
During the Double 11 shopping festival, a 30‑minute online AUC window was used to measure real‑time improvements. Key observations:
8 am: real‑time large model launched (partial embeddings + hidden layers) → CTR AUC +0.6%, CVR AUC +0.8%.
1 pm: full model update (all embeddings + hidden layers) → an additional CTR AUC +0.9% and CVR AUC +2%.
5:30 pm: real‑time updates stopped → CTR AUC retains a +0.5% gain while the CVR gain becomes negligible, indicating that CVR is more sensitive to real‑time updates.
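A windowed online AUC like the one used above can be sketched by bucketing logged predictions into fixed, non‑overlapping 30‑minute windows and computing AUC per window. This is a sketch under assumptions: timestamps are seconds, windows are aligned to multiples of the window length, and the per‑window gain is the absolute difference between two models' AUCs on the same traffic (the article does not say whether its percentages are absolute or relative). Here AUC uses the rank‑based Mann‑Whitney formula with average ranks for ties.

```python
from collections import defaultdict

def auc(labels, scores):
    """Rank-based (Mann-Whitney U) AUC with average ranks for ties."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        return None  # undefined without both classes
    rank_sum = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

def windowed_auc(timestamps, labels, scores, window_s=30 * 60):
    """AUC per non-overlapping window (default 30 min), keyed by window start."""
    buckets = defaultdict(lambda: ([], []))
    for t, y, s in zip(timestamps, labels, scores):
        start = int(t // window_s) * window_s
        buckets[start][0].append(y)
        buckets[start][1].append(s)
    return {start: auc(ys, ss) for start, (ys, ss) in sorted(buckets.items())}

# Toy comparison of a base model and a real-time-updated model on the same traffic.
ts = [0, 100, 1800, 1900]
y = [1, 0, 1, 0]
base = windowed_auc(ts, y, [0.9, 0.1, 0.4, 0.5])
rt = windowed_auc(ts, y, [0.9, 0.1, 0.6, 0.5])
print({w: rt[w] - base[w] for w in base})  # {0: 0.0, 1800: 1.0}
```

Windows where one class is absent return `None` and would need to be skipped before subtracting; the toy data avoids that case.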
4. Future Outlook
Planned directions include making streaming training with real‑time updates a routine practice; replacing offline batch cycles with online model tuning, which may remove the need for a separate online stacking model; and strengthening end‑to‑end monitoring and reliability to meet the standards of production‑grade online systems.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.
Alibaba Cloud Developer
Alibaba's official tech channel, featuring all of its technology innovations.
