How Generative AI is Transforming Recommendation: A Deep Dive into DeWu’s Recall System
This article analyzes DeWu's generative recall system, detailing its background, technical design of the Generative and Rerank models, inference workflow, experimental gains in core consumption and diversity metrics, and future engineering directions such as framework migration, LLM integration, and multimodal generation.
Background
Traditional recommendation pipelines suffer from information cocoons, interest convergence, and content homogenization: as the feedback loop reinforces a few dominant interests, users' sense of freshness and overall satisfaction declines.
Rapid advances in generative AI present an opportunity to shift from discriminative matching to predictive generation, especially for platforms like DeWu where users demand diverse and novel content.
Limitations of Traditional Recall
Insufficient temporal modeling: Long‑term and short‑term user interests are not captured effectively.
Restricted interest diversity: Matching historical behavior tends to converge on a few high‑frequency interests.
Matching paradigm ceiling: Discriminative models cannot predict future, latent interests.
Weak interest fusion: Separate interest vectors lack end‑to‑end collaborative modeling.
Advantages of Generative Recall
Next‑Token Prediction paradigm: Predicts the next item a user may click, enabling end‑to‑end interest fusion.
Guided recall mechanism: Provides controllable, structured recall conditions aligned with business goals.
Temporal dependency modeling: Transformer architecture naturally captures sequential dependencies.
Interest prediction capability: Goes beyond known interests to forecast potential user directions.
End‑to‑end optimization: Generates recall results directly from user behavior sequences, reducing information loss.
Scaling‑law behavior: Larger models and data improve expressive power and online recommendation performance.
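The Next-Token Prediction paradigm above can be made concrete with a minimal sketch of how a user's behavior sequence is turned into autoregressive training pairs. The function name, item IDs, and window size are illustrative assumptions, not DeWu's actual implementation:

```python
# Hypothetical sketch: converting a user's click history into
# next-token training pairs, the core generative-recall formulation.
# Every prefix of the sequence is trained to predict the item that
# actually followed it, fusing all past interests end-to-end.

def make_next_token_pairs(click_sequence, max_len=100):
    """Truncate to the most recent `max_len` actions, then emit
    (prefix, next_item) pairs for autoregressive training."""
    seq = click_sequence[-max_len:]
    return [(seq[:i], seq[i]) for i in range(1, len(seq))]

pairs = make_next_token_pairs([101, 205, 313, 101, 417])
```

Because each training target is the item a user clicked next, the model learns to forecast latent interest directions rather than merely match historical behavior.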
Technical Solution
Generative Model Design
The generative model is built on a Transformer decoder for next‑token generation. Key features include:
Main sequence features: User image/video click sequences and first/second/third‑level category sequences, truncated to the most recent 100 actions.
First‑position User Token strategy: Jointly trained with a twin‑tower model to produce a user_token, isolated via gradient blocking to keep generation and twin‑tower objectives independent.
Model parameters: Configured to the maximum size supported by the DeepRec framework (n_layers=3, n_heads=4, dim=64) with positional embeddings for stronger temporal modeling.
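A minimal sketch of the input layout described above, under stated assumptions: the co-trained twin-tower user_token occupies position 0, followed by the most recent click tokens. The gradient blocking (a stop-gradient on the user_token embedding inside the model) is noted in a comment but not implemented here; `MAX_ACTIONS` and the token names are assumptions for illustration:

```python
# Illustrative input assembly for the generative model.
# Assumption: `user_token` is produced by the jointly trained
# twin-tower model; during generative training its embedding would
# be detached (gradient-blocked) so the two objectives stay independent.

MAX_ACTIONS = 100  # the article truncates to the most recent 100 actions

def build_input_tokens(user_token, click_tokens):
    """Place the user token at position 0, then the most recent
    MAX_ACTIONS click tokens (oldest to newest)."""
    return [user_token] + click_tokens[-MAX_ACTIONS:]

tokens = build_input_tokens("USER_TOKEN", list(range(150)))
```

With positional embeddings added on top of this layout, the decoder sees the user summary first and the temporal order of recent behavior after it.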
Rerank Model Design
The rerank model is co‑trained with the generative model through multi‑task learning, sharing low‑level feature representations.
Joint training mechanism: Simultaneously trains the item tower and user tower of the rerank model with the generative model.
Gradient balance: Carefully weighted loss terms ensure collaborative optimization of generation and ranking tasks.
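The gradient-balance idea reduces to a weighted sum of the two task losses over the shared low-level representations. The article does not disclose the actual weights, so `w_gen` and `w_rerank` below are hypothetical placeholders:

```python
# Hedged sketch of the multi-task objective: one combined loss drives
# both the generative next-token task and the rerank task, so shared
# low-level features are optimized collaboratively.
# The weights are assumptions; the article only says loss terms are
# carefully weighted to balance the two tasks.

def joint_loss(gen_loss, rerank_loss, w_gen=1.0, w_rerank=0.3):
    """Weighted sum of per-task losses for joint backpropagation."""
    return w_gen * gen_loss + w_rerank * rerank_loss
```

Tuning these weights controls how strongly rerank gradients shape the shared feature space relative to the generation objective.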
Inference Process: From Category Generation to Precise Recall
Online inference follows four steps: generate → vectorize → retrieve → rerank.
Category generation: The decoder produces the top‑K first‑level categories (K=4, selected via an offline recall@100 search). These categories become hard‑condition vectors for downstream interest modeling.
Multi‑interest vector construction: Each generated category feeds a conditional twin‑tower user_tower to obtain K image‑based and K video‑based interest vectors, decoupling interests per category.
ANN retrieval & rerank: Each interest vector performs ANN search, retrieving candidate items that are then scored by the rerank model. A serpentine merge strategy fuses multiple interest channels into the final recall list.
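The serpentine merge in the final step can be sketched as a round-robin interleave over the per-interest candidate lists, deduplicating as it goes, so every generated category contributes items near the head of the final recall list. This is one plausible reading of the strategy; the function name and dedup behavior are assumptions:

```python
# Hypothetical serpentine-merge sketch: fuse K interest channels
# (each an ANN-retrieved, rerank-scored candidate list) by taking
# one item from each channel per round, skipping duplicates.

def serpentine_merge(channels, limit=None):
    """Round-robin over per-interest candidate lists so that each
    interest channel is represented at the top of the merged list."""
    out, seen = [], set()
    for rank in range(max(len(ch) for ch in channels)):
        for ch in channels:
            if rank < len(ch) and ch[rank] not in seen:
                seen.add(ch[rank])
                out.append(ch[rank])
    return out[:limit] if limit is not None else out

merged = serpentine_merge([[1, 2, 3], [4, 2, 5]])
```

Interleaving rather than concatenating channels preserves diversity: no single dominant interest can monopolize the head of the recall list.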
Experimental Results
AB tests on DeWu’s community showed statistically significant improvements across core consumption and diversity metrics.
Core consumption gains: +0.41% average effective VV per recommendation, +0.37% increase in average DAU session length, +0.45% longer average recommendation time, +0.39% higher exposure‑VV per user.
Diversity improvements: +0.18% average number of clicked first‑level categories, +0.23% average clicked third‑level categories, +0.19% average exposed third‑level categories.
Future Engineering Optimizations
Framework migration: Move from DeepRec to DeepSea‑Torch to support larger parameter scales and sparse features.
Architecture upgrade: Explore One‑Rec framework to unify generative and discriminative recall paradigms.
Inference acceleration: Research model compression and quantization to reduce latency.
Cost optimization: Refine training strategies and resource scheduling to lower cost per effect.
Model Capability Upgrades
Increasing model size and sparse feature support will enable richer generative architectures. Planned directions include extending the context window beyond 100 actions and experimenting with sparse or linear attention mechanisms.
Potential LLM Integration
Combining large language models with generative recall can inject world knowledge, helping to surface latent user interests that are not explicitly expressed.
Multimodal & Cross‑Domain Generation
Leveraging multimodal signals (images, videos) can produce richer interest representations, and cross‑domain generation can translate content interests into e‑commerce product recommendations, enhancing business synergy.
Conclusion & Outlook
The first‑phase practice demonstrates that a "generate‑predict + guided‑recall" pipeline can simultaneously boost consumption depth and interest breadth at controllable cost, validating the feasibility of generative recall in industrial recommendation scenarios. Future work will refine token granularity, incorporate item embeddings, and continue scaling the architecture.
DeWu Technology
A platform for sharing and discussing technical knowledge.