DaTaobao Tech
Jan 30, 2026 · Artificial Intelligence
Human‑like LLM Replies for Live Digital Hosts: ASR‑Based Style Transfer and Reward Modeling
This article proposes an ASR‑driven pipeline that builds high‑quality pairs of AI‑style and human‑like replies, trains a rewrite model and a reward model, and applies GRPO reinforcement learning so that digital‑human live‑streaming hosts give natural, helpful, less AI‑sounding responses. The approach achieves 92% accuracy and 97% helpfulness while improving the user experience.
ASR data · LLM · Qwen
20 min read
