
How AI‑Powered Virtual Live Streaming Cuts Costs and Boosts E‑Commerce Engagement

This article examines the technical challenges of traditional e‑commerce live streaming, presents a 24‑hour AI‑driven virtual live‑broadcast system built by Yanxuan and NetEase Fuxi, and details its architecture, virtual‑human generation, automated content creation, intelligent interaction, and future expansion plans.

NetEase Yanxuan Technology Product Team

Background

Live streaming has become a crucial sales channel for e‑commerce platforms, but real‑person broadcasts suffer from high labor costs, limited broadcast duration, and scalability issues as the number of stores grows.

Problems with Real‑Person Live Streaming

Cost: Personnel, equipment, and venue expenses increase linearly with the number of live rooms.

Content Limitations: Physical space and product constraints lead to repetitive formats, and human errors can damage brand trust.

Solution Overview

Yanxuan partnered with NetEase Fuxi to develop a fully automated virtual live‑streaming platform that runs 24/7 across multiple channels (Yanxuan APP, Taobao, JD, etc.). The system combines a virtual avatar, AI‑driven dialogue, and real‑time video rendering to deliver continuous product presentations.

System Architecture

The platform is organized into four layers:

Base configuration: defines room layout, script, schedule, and channel.

Content control: decides which content to broadcast and performs task scheduling, data crawling, and intelligent Q&A; some outputs are sent directly to the comment layer.

Rendering layer: aggregates all visual elements, applies WebGL rendering, front‑end layout, 3D modeling, speech synthesis, and motion generation to produce the video stream.

Push layer: streams the encoded video to target platforms via RTC push or virtual‑camera pipelines.
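The four layers can be pictured as a simple pipeline from configuration to push. The sketch below is a toy illustration under assumed names (`RoomConfig`, `control_layer`, and so on are not the production interfaces), just to show how each layer consumes the previous one's output:

```python
from dataclasses import dataclass

# Illustrative sketch of the four-layer pipeline; all names are hypothetical.

@dataclass
class RoomConfig:                      # base configuration layer
    layout: str
    script_id: str
    schedule: str
    channel: str

def control_layer(cfg: RoomConfig) -> list:
    # decides which content to broadcast; stubbed as three script segments
    return [f"{cfg.script_id}-seg{i}" for i in range(3)]

def render_layer(segments: list) -> list:
    # the real layer does WebGL rendering, TTS, and motion generation;
    # here each segment just becomes a placeholder frame
    return [f"frame({s})" for s in segments]

def push_layer(frames: list, channel: str) -> str:
    # the real layer pushes encoded video via RTC or a virtual camera
    return f"pushed {len(frames)} frames to {channel}"

cfg = RoomConfig(layout="standard", script_id="tea-set",
                 schedule="24x7", channel="Yanxuan APP")
print(push_layer(render_layer(control_layer(cfg)), cfg.channel))
# -> pushed 3 frames to Yanxuan APP
```

The key design point is that each layer only depends on the output of the one above it, so any layer (say, rendering) can be swapped out without touching scheduling or push.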

Virtual Human Technology

The virtual avatar is integrated through a Unity client or Web SDK. Input text triggers speech synthesis, facial expression, and body motion generation. Four sub‑tasks are addressed: high‑fidelity voice synthesis, expression and lip‑sync generation, semantic‑driven body motion, and temporal alignment of audio‑visual streams.
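Of these sub-tasks, temporal alignment is the easiest to illustrate: each synthesized audio segment must be matched with an integer number of video frames without letting rounding error accumulate. A minimal sketch, assuming a 25 fps stream and millisecond segment durations (both are illustrative assumptions, not figures from the source):

```python
FPS = 25  # assumed frame rate for illustration

def frames_for_segments(durations_ms):
    # Allocate an integer frame count per audio segment by rounding the
    # cumulative timeline, so drift never exceeds half a frame overall.
    frames, emitted, t = [], 0, 0
    for d in durations_ms:
        t += d
        total = round(t * FPS / 1000)   # frames that should exist by time t
        frames.append(total - emitted)  # frames owed to this segment
        emitted = total
    return frames

print(frames_for_segments([400, 400, 200]))  # -> [10, 10, 5] (25 frames / 1 s)
```

Rounding the cumulative time, rather than each segment independently, is what keeps the audio and video streams from slowly sliding apart over a 24-hour broadcast.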

Automatic Content Generation

To avoid manual script writing for millions of products, the system automates three content types:

Product Title Shortening: an improved transformer‑based summarization model extracts concise titles (≈8 Chinese characters) using dependency parsing and NER rules.
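The rule side of this task can be sketched without the model: drop marketing filler and keep informative tokens until the character budget is exhausted. The filler list and tokens below are invented for illustration; the real system derives what to keep from dependency parsing and NER rather than a fixed stopword set:

```python
# Hypothetical rule-based fallback for title shortening.
FILLER = {"hot", "new", "sale", "2021"}  # illustrative marketing filler

def shorten(tokens, budget=8):
    # keep tokens in order, skipping filler, until the character budget is hit
    kept, length = [], 0
    for tok in tokens:
        if tok.lower() in FILLER:
            continue
        if length + len(tok) > budget:
            break
        kept.append(tok)
        length += len(tok)
    return "".join(kept)

print(shorten(["Hot", "Yanxuan", "Tea", "2021"]))  # -> Yanxuan
```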

Product Copywriting: a pointer‑generator transformer mixes extracted key points with template text to produce 200‑300‑word scripts; for long‑tail items, rule‑based templates are applied.
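For long-tail items the template path is straightforward: slot extracted attributes into fixed sentence frames. A minimal sketch, with a template and field names invented for illustration:

```python
# Hypothetical template fallback for long-tail products.
TEMPLATE = "{name} features {point1} and {point2}, great for {scene}."

def fill(product):
    # the real system would select a template per category and pull
    # attributes from the product knowledge graph
    return TEMPLATE.format(**product)

print(fill({"name": "Ceramic mug", "point1": "a matte glaze",
            "point2": "a 350 ml capacity", "scene": "the office"}))
# -> Ceramic mug features a matte glaze and a 350 ml capacity, great for the office.
```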

Game Materials: sketches from the Quick‑Draw dataset are segmented into connected components, rendered frame‑by‑frame, and assembled into short videos; Text‑to‑Image models (e.g., CogView, DALL·E) are also explored for batch generation.
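The connected-component step can be sketched with union-find: strokes whose bounding boxes intersect are merged into one component. This is a simplified stand-in (real segmentation would test actual stroke proximity, not just bounding boxes), using the Quick-Draw convention of a stroke as parallel x and y coordinate lists:

```python
# Hypothetical sketch: group strokes into connected components by
# merging strokes whose bounding boxes intersect (union-find).
def bbox(stroke):
    xs, ys = stroke
    return min(xs), min(ys), max(xs), max(ys)

def overlaps(a, b):
    ax0, ay0, ax1, ay1 = a
    bx0, by0, bx1, by1 = b
    return ax0 <= bx1 and bx0 <= ax1 and ay0 <= by1 and by0 <= ay1

def components(strokes):
    boxes = [bbox(s) for s in strokes]
    parent = list(range(len(strokes)))
    def find(i):                          # path-halving union-find
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i
    for i in range(len(strokes)):
        for j in range(i + 1, len(strokes)):
            if overlaps(boxes[i], boxes[j]):
                parent[find(i)] = find(j)
    groups = {}
    for i in range(len(strokes)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())

# two overlapping strokes plus one far-away stroke -> two components
strokes = [([0, 5], [0, 5]), ([4, 9], [4, 9]), ([50, 60], [50, 60])]
print(components(strokes))  # -> [[0, 1], [2]]
```

Each resulting component can then be rendered stroke-by-stroke into frames and concatenated into a short clip.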

Intelligent Interaction

The platform reuses Yanxuan’s existing intelligent‑customer‑service stack. Two knowledge bases are maintained:

FAQ‑style business knowledge (standard‑question → similar‑question → answer).

Product‑centric knowledge graph containing attributes, selling points, and promotional information.

When a user posts a comment, the system first performs intent classification, then retrieves relevant knowledge using a dual‑tower matching model (offline vector indexing + online recall & ranking). Low‑quality or policy‑violating responses are filtered by a pretrained fluency model and keyword rules.
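The recall step of the dual-tower model reduces to nearest-neighbor search over precomputed vectors: one tower embeds knowledge entries offline into an index, the other embeds the incoming comment online. A toy sketch with hand-made 3-dimensional embeddings (real systems use learned encoders and an approximate-nearest-neighbor index):

```python
import math

# Hypothetical dual-tower recall: offline index of entry vectors,
# online cosine-similarity ranking of a query vector against them.
def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def recall(query_vec, index, k=1):
    # rank all indexed entries by similarity and return the top k names
    scored = sorted(index.items(),
                    key=lambda kv: cosine(query_vec, kv[1]),
                    reverse=True)
    return [name for name, _ in scored[:k]]

index = {  # offline-built vector index (toy embeddings)
    "shipping-faq": [0.9, 0.1, 0.0],
    "mug-capacity": [0.1, 0.9, 0.1],
}
print(recall([0.0, 1.0, 0.0], index))  # -> ['mug-capacity']
```

In production the exhaustive sort would be replaced by an approximate index so recall stays fast as the knowledge base grows.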

Live Control and Deployment

Each broadcast occupies a dedicated instance because video rendering requires a GUI‑enabled OS (e.g., Windows). A task scheduler assigns exclusive instances to live jobs, ensuring no overlap; completed jobs free the instance for the next task.
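The exclusive-instance rule amounts to a simple pool: a job either gets a whole instance or waits, and finished jobs return their instance. A minimal sketch, with class and instance names invented for illustration:

```python
# Hypothetical scheduler sketch: each live job holds an exclusive instance.
class InstancePool:
    def __init__(self, names):
        self.free = list(names)
        self.busy = {}                 # job -> instance

    def assign(self, job):
        if not self.free:
            return None                # no instance available; job must wait
        inst = self.free.pop()
        self.busy[job] = inst
        return inst

    def release(self, job):
        # completed job frees its instance for the next task
        self.free.append(self.busy.pop(job))

pool = InstancePool(["win-1"])
print(pool.assign("live-A"))           # -> win-1
print(pool.assign("live-B"))           # -> None (pool exhausted)
pool.release("live-A")
print(pool.assign("live-B"))           # -> win-1
```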

Future Directions

3D Scenes : replace flat 2D backdrops with fully rendered 3D environments, allowing avatars to walk and interact.

Host Matrix : maintain a pool of diverse virtual avatars (different voices, appearances, and personalities) that can be dynamically assigned based on product category and audience.

Human‑Assistant Hybrid : introduce an AI assistant that answers user questions in real time, supplementing human hosts during high‑traffic streams.

Overall, the virtual live‑streaming solution reduces operational costs, provides uninterrupted coverage, and leverages AI to generate and deliver engaging e‑commerce content across multiple platforms.

Tags: e-commerce, System Architecture, AI, Knowledge Base, Content Generation, virtual live streaming
Written by

NetEase Yanxuan Technology Product Team

The NetEase Yanxuan Technology Product Team shares practical tech insights for the e‑commerce ecosystem. This official channel periodically publishes technical articles, team events, recruitment information, and more.
