
Large Models Boost Douyin User Experience: Expert Insights

In an interview at the DA Digital Intelligence Conference, ByteDance AI specialist Cai Conghuai explains how large language models, combined with techniques like SFT, DPO, and RAG, are reshaping Douyin's user‑experience signal detection, root‑cause analysis, and evaluation, while outlining future AI‑agent breakthroughs.


Douyin, with hundreds of millions of daily active users, faces huge challenges in delivering optimal user experience. At the DA Digital Intelligence Technology Conference in Shenzhen (July 25‑26), ByteDance algorithm expert Cai Conghuai discussed how large models empower Douyin’s experience intelligence.

01 Technical Solution and Implementation Details

DataFun: Douyin user experience faces many challenges; can you explain the limitations of traditional algorithms in scenarios like video recommendation and comment interaction, and what new solutions large models provide?

Cai Conghuai: When perceiving experience signals from video and comment channels, we encounter complex multimodal features such as video titles, frames, user profiles, and comment content. Traditional algorithms require massive data to learn deep semantic features, resulting in low ROI. Large models possess strong semantic understanding and multimodal processing abilities, achieving good performance even with zero‑ or few‑shot data, thus expanding video and comment experience signal channels at low cost.
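The zero-/few-shot approach described above can be sketched as a prompt-construction step: a handful of labeled examples is assembled ahead of the item to classify, and a generative model completes the final label. The label set, field names, and examples below are illustrative, not Douyin's actual schema.

```python
# Few-shot prompt for classifying an experience signal from mixed
# video/comment features. Labels and examples are hypothetical.
FEW_SHOT_EXAMPLES = [
    {"title": "My cat won't stop sneezing",
     "top_comment": "the audio cuts out halfway",
     "label": "playback_issue"},
    {"title": "Street food tour",
     "top_comment": "love this, more please!",
     "label": "no_issue"},
]

LABELS = ["playback_issue", "content_quality", "ui_confusion", "no_issue"]

def build_prompt(title: str, top_comment: str) -> str:
    """Assemble a few-shot classification prompt for a generative model."""
    lines = [f"Classify the user-experience signal into one of: {', '.join(LABELS)}.", ""]
    for ex in FEW_SHOT_EXAMPLES:
        lines.append(f"Title: {ex['title']}\nComment: {ex['top_comment']}\nLabel: {ex['label']}\n")
    # The model is expected to complete the trailing "Label:" field.
    lines.append(f"Title: {title}\nComment: {top_comment}\nLabel:")
    return "\n".join(lines)
```

Because the label definitions live in the prompt rather than in learned weights, new signal categories can be added without collecting large labeled datasets, which is the low-cost expansion the interview refers to.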

DataFun: From Tencent to ByteDance, which core methodologies have you transferred to the new “large‑model‑driven experience intelligence” direction, and what mindsets or technical habits need to be broken?

Cai Conghuai: Problem definition, data analysis, model selection, training optimization, result evaluation, and iterative improvement are universal. In problem definition, we must break the habit of converting many business problems into purely discriminative tasks; generative approaches can also solve them effectively in the era of large models.

DataFun: In the “experience signal recognition” stage, how can large models detect user‑experience problems earlier? Are there specific technical designs for multimodal signal fusion, such as RAG?

Cai Conghuai: Strong signals like offline feedback, online customer-service tickets, and reports are often lagging. By analyzing user behavior, such as semantic analysis of comments on submitted videos, we can use multimodal information to identify problematic sticker text before users report issues. For billion-scale submissions, we use a hierarchical solution combining traditional and large models, with RAG improving large-model accuracy.
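A minimal sketch of the hierarchical idea: a cheap first-pass filter triages the full billion-scale stream, and only flagged items reach the expensive large model. The keyword list, threshold, and function names here are hypothetical stand-ins for whatever lightweight model the first tier actually uses.

```python
# Two-tier cascade: cheap filter first, large model only on flagged items.
RISK_KEYWORDS = {"report", "scam", "broken", "offensive"}

def cheap_score(text: str) -> float:
    """Lightweight first-pass score: fraction of risk keywords present.
    In practice this tier would be a small traditional classifier."""
    words = set(text.lower().split())
    return len(words & RISK_KEYWORDS) / len(RISK_KEYWORDS)

def cascade(items, llm_classify, threshold=0.25):
    """Run the costly large-model classifier only on flagged items."""
    flagged = [t for t in items if cheap_score(t) >= threshold]
    return {t: llm_classify(t) for t in flagged}
```

The design choice is purely economic: the first tier trades recall tuning for throughput, so the large model's per-call cost is paid only on the small fraction of items that survive the filter.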

DataFun: How do you quantify “quality score” and “semantic viewpoint” in experience signal understanding? Are large models complementary or a replacement for traditional scoring models, and what labeling and training strategies are needed (e.g., DPO)?

Cai Conghuai: We define core business metrics and conduct manual sampling to evaluate semantic viewpoint accuracy, duplication, and missing rates; quality scores are measured similarly. In moderate‑scale scenarios, large models can replace traditional models, but for massive scale we adopt a layered approach. Training uses full‑parameter fine‑tuning and reinforcement fine‑tuning, with data labels covering categories and preference alignment.
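The three sampled metrics mentioned above (accuracy, duplication rate, missing rate of extracted viewpoints) can be computed from a manually labeled gold set. This is a minimal sketch assuming viewpoints are compared by exact match; real evaluation would likely use semantic matching.

```python
def viewpoint_metrics(predicted: list[str], gold: list[str]):
    """Accuracy, duplication rate, and missing rate for extracted viewpoints
    against a manually labeled gold set (exact-match comparison)."""
    pred_set, gold_set = set(predicted), set(gold)
    correct = pred_set & gold_set
    accuracy = len(correct) / len(pred_set) if predicted else 0.0
    # Duplication: share of predicted items that repeat an earlier one.
    duplication = 1 - len(pred_set) / len(predicted) if predicted else 0.0
    # Missing: share of gold viewpoints the model never produced.
    missing = len(gold_set - pred_set) / len(gold_set) if gold else 0.0
    return accuracy, duplication, missing
```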

02 Evaluation and Challenges

DataFun: In root‑cause analysis, how do large models balance diagnostic accuracy and explainability? Do you incorporate knowledge graphs or causal reasoning, and can you share a successful case?

Cai Conghuai: We combine business data—user profiles, behavior logs, A/B experiment details, client release info—to balance accuracy and explainability. Root‑cause analysis often reduces to matching problems between abnormal feedback spikes and experiments or releases. For example, in early 2025 many users complained about the “Douyin spark tag” feature; large‑model diagnosis quickly identified the responsible experiment as the root cause.
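The "matching problem" framing can be illustrated as ranking candidate experiments and releases by how much their active window overlaps the anomaly window. This is an assumed simplification (time ranges as integer day offsets, overlap length as the score); the production system presumably matches on far richer features.

```python
def overlap_days(a, b):
    """Length of overlap between two (start, end) day ranges; 0 if disjoint."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]))

def rank_candidate_causes(spike, events):
    """Sort experiments/releases by overlap with the feedback-spike window.
    `spike` is a (start, end) pair; each event has name/start/end fields."""
    scored = [(e["name"], overlap_days(spike, (e["start"], e["end"])))
              for e in events]
    return sorted(scored, key=lambda x: x[1], reverse=True)
```

A temporal match like this also keeps the diagnosis explainable: the evidence ("this experiment was live exactly when complaints spiked") is legible to a human reviewer.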

DataFun: How do you adapt SFT, DPO, and RAG to Douyin’s business characteristics such as model lightweighting, real‑time requirements, and data security?

Cai Conghuai: We fine‑tune a 7B base model using SFT, then apply distillation and quantization to reduce resource consumption. DPO aligns the model with business preferences for summarization and semantic viewpoint tasks. RAG retrieves official knowledge bases and business rules to reduce reliance on massive parametric knowledge, ensuring compliance and efficiency.
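The DPO preference-alignment step mentioned above optimizes a simple pairwise loss over (chosen, rejected) response pairs. The sketch below computes the standard DPO loss for one pair from sequence-level log-probabilities; it shows the objective, not ByteDance's training code.

```python
import math

def dpo_loss(policy_chosen, policy_rejected, ref_chosen, ref_rejected, beta=0.1):
    """Direct Preference Optimization loss for one preference pair.

    Inputs are sequence-level log-probabilities under the trainable policy
    and the frozen reference (SFT) model; beta controls how far the policy
    may drift from the reference.
    """
    margin = beta * ((policy_chosen - ref_chosen)
                     - (policy_rejected - ref_rejected))
    # Negative log-sigmoid of the margin: loss shrinks as the policy
    # prefers the chosen response more strongly than the reference does.
    return -math.log(1.0 / (1.0 + math.exp(-margin)))
```

At a margin of zero (policy identical to reference) the loss is ln 2; pushing probability toward the chosen response drives it lower, which is what aligns the model with the labeled business preferences.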

DataFun: Large‑model outputs can be subjective; how do you design a scientific evaluation system? Do you use user research, A/B testing, or cross‑validation?

Cai Conghuai: Currently we rely on expert evaluation, breaking subjective issues into multiple dimensions to avoid vague judgments. Future work will incorporate user surveys, A/B tests, and annotated knowledge bases for a more rigorous assessment.
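Decomposing a subjective judgment into weighted dimensions might look like the sketch below; the dimension names, weights, and 1-to-5 scale are illustrative assumptions, not the team's actual rubric.

```python
def aggregate_rubric(scores: dict, weights: dict) -> float:
    """Weighted mean over per-dimension expert scores (assumed 1-5 scale).
    Each expert rates narrow dimensions instead of giving one vague grade."""
    total_w = sum(weights[d] for d in scores)
    return sum(scores[d] * weights[d] for d in scores) / total_w
```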

DataFun: How do you address hallucination, long‑tail coverage, and other challenges in deployment? Any data‑augmentation or feedback loops?

Cai Conghuai: We filter feedback data with a quality‑score model and add manual checks to ensure data quality. During inference, RAG retrieves high‑quality knowledge from a curated repository to constrain generation. For long‑tail issues, we build dedicated recall and recognition pipelines for extreme, high‑risk feedback categories.
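The RAG constraint described above can be sketched as two steps: retrieve the best-matching snippets from the curated repository, then prepend them to the prompt so generation stays within sanctioned facts. Word-overlap scoring stands in here for whatever retriever the real system uses; the knowledge text is invented for illustration.

```python
def retrieve(query: str, knowledge_base: list[str], k: int = 2) -> list[str]:
    """Rank curated knowledge snippets by word overlap with the query.
    A production retriever would use embeddings, not bag-of-words overlap."""
    q = set(query.lower().split())
    def score(doc: str) -> int:
        return len(q & set(doc.lower().split()))
    return sorted(knowledge_base, key=score, reverse=True)[:k]

def build_grounded_prompt(query: str, knowledge_base: list[str]) -> str:
    """Prepend retrieved snippets so the model answers from curated facts
    rather than its parametric memory, reducing hallucination risk."""
    context = "\n".join(retrieve(query, knowledge_base))
    return f"Use only the facts below.\n{context}\n\nQuestion: {query}\nAnswer:"
```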

03 Value and Industry Insights

DataFun: Which quantifiable metrics have improved thanks to this solution (e.g., retention, complaint rate), and is the approach applicable to other content platforms?

Cai Conghuai: Core metrics include reduced negative feedback volume, higher user satisfaction, a higher problem‑resolution rate, and improved service efficiency. These challenges are common across content platforms, so the technical solutions can serve as a reference.

DataFun: What are the core breakthrough directions for large models in user experience over the next three years, both technically (e.g., multimodal generalization) and business‑wise (e.g., personalization vs. privacy)?

Cai Conghuai: The key breakthrough lies in AI‑Agent evolution—from simple chat to intelligent chat, issue recognition, analysis, and root‑cause diagnosis. Technically, work is needed on agent architecture, data collection, model fine‑tuning, and compression. Business‑wise, establishing privacy management, standardizing knowledge bases, and fostering cross‑business collaboration are essential.

Guest Introduction: Cai Conghuai, ByteDance algorithm expert, holds a master's degree from Harbin Institute of Technology and previously worked at Tencent, with extensive experience in AI algorithms for content understanding and experience intelligence.

Tags: user experience, AI, large language models, RAG, multimodal, SFT, DPO
Written by DataFunTalk

Dedicated to sharing and discussing big data and AI technology applications, aiming to empower a million data scientists. Regularly hosts live tech talks and curates articles on big data, recommendation/search algorithms, advertising algorithms, NLP, intelligent risk control, autonomous driving, and machine learning/deep learning.
