How Vivo’s BlueHeart AI Assistant Optimizes Post‑Conversation Recommendations with LLMs
In this interview, Vivo AI engineer Liang Tianan explains how the BlueHeart Small V assistant combines large language models, a multi-stage recall/ranking/re-ranking pipeline, and reward-model fine-tuning (SFT/DPO) to generate high-quality, diverse post-dialogue recommendation items while balancing latency, cost, and evaluation challenges.
Vivo launched the BlueHeart Small V AI assistant in 2023 and upgraded it with DeepSeek-R1 in June 2025, advancing its AI capabilities and delivering more precise and diverse personalized content recommendations.
At the DA Digital Intelligence Technology Conference in Shenzhen (July 25-26), Liang Tianan, an algorithm engineer in Vivo's AI department, was invited to share optimization practices for BlueHeart Small V's recommendation scenarios, focusing on improving user experience and dialogue turns and on integrating large-model-based content generation and evaluation.
DataFun: Compared with traditional feed or e-commerce recommendation, what is the core technical challenge of post-question recommendation?

Liang: The main challenge is generating high-quality recommendation items in real time: the system must create items from the dialogue context rather than select them from a pre-existing pool. Evaluation also differs. Traditional recommendation measures clicks and conversions, while post-dialogue recommendation has to assess relevance, usefulness, and diversity, which are far more subjective.
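To make the contrast concrete, here is a minimal sketch of generation-style recall in Python. The prompt wording and the `llm_complete` function are illustrative stand-ins, not Vivo's actual API:

```python
import json

def llm_complete(prompt: str) -> str:
    # Placeholder for a real LLM serving call; swap in an actual endpoint.
    raise NotImplementedError

GENERATION_PROMPT = """\
You are a recommendation writer for a mobile assistant.
Given the dialogue below, write {n} short follow-up suggestions the user
is likely to tap next. Return a JSON list of strings and nothing else.

Dialogue:
{dialogue}
"""

def generate_candidates(dialogue: str, n: int = 5) -> list[str]:
    """Create recommendation items from dialogue context on the fly.

    Unlike feed or e-commerce recommendation, there is no pre-existing
    item pool to select from: every candidate is written at request time.
    """
    raw = llm_complete(GENERATION_PROMPT.format(n=n, dialogue=dialogue))
    try:
        items = json.loads(raw)
    except json.JSONDecodeError:
        items = []  # malformed generations are dropped rather than shown
    return [s.strip() for s in items if isinstance(s, str)][:n]
```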
DataFun: Could you briefly describe BlueHeart Small V's original recall and ranking modules and their bottlenecks?

Liang: Initially a single large model generated recommendations directly. Its output was unstable and prone to semantic bias, which limited diversity, and without an internal ranking mechanism there was no way to prioritize candidates. To improve quality and diversity, we introduced a multi-stage pipeline with recall, ranking, and re-ranking modules.
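A skeletal version of such a pipeline might look like the following. The per-path cap in the re-ranking stage is an assumed heuristic for illustration, not a detail from the interview:

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    text: str
    source: str         # which recall path produced the item
    score: float = 0.0  # filled in by the ranking stage

def recall(dialogue, paths):
    """Stage 1: run every recall path and pool the candidates."""
    pool = []
    for name, path in paths.items():
        pool += [Candidate(text=t, source=name) for t in path(dialogue)]
    return pool

def rank(dialogue, pool, scorer):
    """Stage 2: score each candidate against the dialogue, best first."""
    for c in pool:
        c.score = scorer(dialogue, c.text)
    return sorted(pool, key=lambda c: c.score, reverse=True)

def rerank(ranked, k):
    """Stage 3: dedupe and cap each recall path so the slate stays varied."""
    seen, per_source, out = set(), {}, []
    for c in ranked:
        if c.text in seen or per_source.get(c.source, 0) >= 2:
            continue
        seen.add(c.text)
        per_source[c.source] = per_source.get(c.source, 0) + 1
        out.append(c)
        if len(out) == k:
            break
    return out

def recommend(dialogue, paths, scorer, k=3):
    return rerank(rank(dialogue, recall(dialogue, paths), scorer), k)
```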
DataFun: How are LLMs used in the recommendation pipeline?

Liang: LLMs are integrated into recall, ranking, and offline evaluation. In recall, multiple parallel LLM paths generate diverse candidate items from the dialogue semantics. In ranking, an LLM is fine-tuned to predict CTR scores, replacing traditional models that struggle with contextual understanding. In offline evaluation, a reward model (itself an LLM) scores generated items for relevance, diversity, effectiveness, and safety.
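The offline-evaluation step can be pictured as an LLM-as-judge loop. The rubric below mirrors the four dimensions Liang names; the prompt wording and the `llm` callable are assumptions for illustration:

```python
import json
from concurrent.futures import ThreadPoolExecutor

JUDGE_PROMPT = """\
Rate the recommendation below for the given dialogue on four axes,
each an integer from 1 (worst) to 5 (best). Return JSON only, e.g.
{{"relevance": 4, "diversity": 3, "effectiveness": 5, "safety": 5}}

Dialogue:
{dialogue}

Recommendation:
{item}
"""

def judge(dialogue: str, item: str, llm) -> dict:
    """Score one generated item with a reward-model LLM."""
    raw = llm(JUDGE_PROMPT.format(dialogue=dialogue, item=item))
    return json.loads(raw)

def evaluate_offline(dialogue: str, items: list[str], llm, workers: int = 8):
    """Fan judging calls out in parallel: offline evaluation tolerates
    latency but covers many items, so throughput is what matters."""
    with ThreadPoolExecutor(max_workers=workers) as ex:
        return list(ex.map(lambda it: judge(dialogue, it, llm), items))
```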
DataFun: How do you balance recommendation quality with resource consumption on mobile devices?

Liang: We rely on INT8 model quantization, prefix caching, and KV-cache reuse (including a CPU-based KV-cache) to reduce latency and memory usage while maintaining quality.
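Of these levers, the prefix cache is the easiest to picture: dialogue prompts share a long fixed prefix (system instructions, persona), so the attention key/value states for that prefix can be computed once and reused across requests. A hand-rolled sketch, with all names illustrative:

```python
class PrefixKVCache:
    """Cache KV states for prompt prefixes shared across requests.

    Keeping the cache in ordinary host memory mirrors the CPU-based
    KV-cache idea: trade a host-to-device copy for recomputation.
    """

    def __init__(self, compute_kv):
        self._compute_kv = compute_kv  # fn: token ids -> KV tensors
        self._cache = {}               # tuple of prefix ids -> KV tensors

    def get_kv(self, tokens: list[int], prefix_len: int):
        """Return KV states for the shared prefix, computed at most once.

        Only the common prefix is amortized; the per-request suffix
        (the actual dialogue) still runs through the model normally.
        """
        key = tuple(tokens[:prefix_len])
        if key not in self._cache:
            self._cache[key] = self._compute_kv(list(key))
        return self._cache[key]
```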
DataFun: What are the biggest technical obstacles to deep LLM integration in dialogue-based recommendation?

Liang: Two stand out. First, the evaluation system is still incomplete: current offline metrics lean heavily on reward-model scores, which may not align with online performance. Second, balancing diversity and relevance: recommendations must be varied yet contextually appropriate.
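One standard way to operationalize that diversity-relevance trade-off is maximal marginal relevance (MMR). The interview does not say Vivo uses MMR specifically, so treat this as a generic sketch:

```python
def mmr_select(candidates, relevance, similarity, k=3, lam=0.7):
    """Maximal Marginal Relevance: greedily pick items that score high
    against the dialogue yet overlap little with what is already chosen.

    lam = 1.0 is pure relevance; lam = 0.0 is pure diversity.
    """
    selected, remaining = [], list(candidates)
    while remaining and len(selected) < k:
        def mmr_score(c):
            redundancy = max((similarity(c, s) for s in selected), default=0.0)
            return lam * relevance(c) - (1 - lam) * redundancy
        best = max(remaining, key=mmr_score)
        remaining.remove(best)
        selected.append(best)
    return selected
```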
DataFun: What is the future potential of multimodal models on mobile?

Liang: Multimodal LLMs can align visual, sensor, and textual signals to enable proactive, context-aware services, such as intelligent travel assistance and "one-tap screen query" for restaurant recommendations.
Guest Introduction

Liang Tianan is an algorithm engineer in Vivo's AI department. He holds a master's degree from Huazhong University of Science and Technology and focuses on recommendation algorithm optimization for the BlueHeart Small V assistant.