How Alibaba’s TaoAvatar Brings Real‑Time 3D Digital Humans to Your Phone

TaoAvatar, Alibaba’s new 3D digital‑human platform, delivers lifelike, real‑time avatars on mobile and XR devices by combining 3D Gaussian splatting, on‑device AI dialogue, and the lightweight MNN inference engine. The full source code is now open‑sourced as MNN‑TaoAvatar.

DaTaobao Tech

TaoAvatar is a 3D real‑time digital‑human technology developed by Alibaba’s Taotian Meta team, capable of rendering realistic avatars and supporting AI‑driven dialogue on smartphones and XR devices.

Open‑Source Release

The application MNN‑TaoAvatar has been open‑sourced on the MNN GitHub repository, allowing developers to download, build, and experiment with the full system.

Key Research Paper

Title: TaoAvatar: Real‑Time Lifelike Full‑Body Talking Avatars for Augmented Reality via 3D Gaussian Splatting

arXiv: https://arxiv.org/abs/2503.17032v1

Code: GitHub README

Technical Highlights

TaoAvatar uses advanced 3D Gaussian splatting to generate full‑body avatars from multi‑view video, capturing fine facial expressions, hand gestures, clothing folds, and hair motion with high realism.
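The core idea of splatting‑style rendering can be sketched in a few lines: each Gaussian contributes color weighted by its falloff and opacity, composited front to back along the view direction. The snippet below is a simplified, isotropic illustration (real splatting uses anisotropic covariances projected into screen space), not TaoAvatar's actual renderer:

```python
import math

def gaussian_weight(point, mean, inv_var):
    """Unnormalized isotropic 3D Gaussian falloff (a simplification of
    the anisotropic covariances used in real splatting)."""
    d2 = sum((p - m) ** 2 for p, m in zip(point, mean))
    return math.exp(-0.5 * inv_var * d2)

def composite(splats, point):
    """Front-to-back alpha compositing over depth-sorted splats.
    Each splat is (mean, inv_var, rgb, opacity)."""
    color = [0.0, 0.0, 0.0]
    transmittance = 1.0
    for mean, inv_var, rgb, opacity in splats:  # assumed sorted near-to-far
        alpha = opacity * gaussian_weight(point, mean, inv_var)
        for i in range(3):
            color[i] += transmittance * alpha * rgb[i]
        transmittance *= 1.0 - alpha
        if transmittance < 1e-4:  # early exit once effectively opaque
            break
    return color
```

The early‑exit test once transmittance is near zero is one reason sorted splats render cheaply: occluded Gaussians behind an opaque surface are skipped entirely.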

Compared with cloud‑based solutions, MNN‑TaoAvatar runs entirely on‑device, eliminating dependence on server‑side GPUs and avoiding network round‑trip latency.

Core Advantages

On‑device Real‑time Dialogue : Optimized ASR (RTF 0.18), LLM (prefill 165 tokens/s, decode 41 tokens/s), and TTS (RTF 0.58) models enable sub‑second responses.

On‑device Real‑time Rendering : Efficient pipeline converts speech to facial blend‑shape coefficients (RTF 0.34) and renders 250k‑point clouds at 60 FPS using the NNR renderer.
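RTF (real‑time factor) is processing time divided by audio duration, so values below 1.0 mean faster than real time. As a rough illustration, the hypothetical helper below sums the stage figures quoted above into a pessimistic sequential latency bound; in practice the stages stream and overlap, which is how the sub‑second responses quoted above become possible:

```python
def rtf_latency(audio_seconds, rtf):
    """Processing time for an audio clip: real-time factor x duration."""
    return audio_seconds * rtf

def sequential_latency(user_audio_s, prompt_tokens, reply_tokens, reply_audio_s):
    """Pessimistic end-to-end bound for the ASR -> LLM -> TTS pipeline,
    summing the quoted on-device figures as if stages ran back to back;
    streaming overlaps them, so perceived latency is far lower."""
    asr = rtf_latency(user_audio_s, 0.18)        # ASR: RTF 0.18
    prefill = prompt_tokens / 165.0              # LLM prefill: 165 tokens/s
    decode = reply_tokens / 41.0                 # LLM decode: 41 tokens/s
    tts = rtf_latency(reply_audio_s, 0.58)       # TTS: RTF 0.58
    return asr + prefill + decode + tts
```

For example, a 3 s utterance, a 64‑token prompt, and a 10‑token first sentence yielding ~2 s of reply audio come to roughly 2.3 s if run strictly in sequence; starting each stage on partial output of the previous one is what closes the gap to sub‑second perceived response.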

MNN Engine Modules

MNN‑LLM : Mobile deployment of large language models with model export, quantization (e.g., Qwen2.5‑1.5B reduced to 1.2 GB), KV cache, and LoRA support. Benchmarks on Snapdragon 8 Gen 3 show 8.6×‑20.5× speedup over llama.cpp and fastllm.
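The quoted compression can be sanity‑checked with back‑of‑envelope arithmetic. The helper below is a hypothetical estimator (not an MNN API); note that 1.2 GB for a roughly 1.5B‑parameter model sits between pure 4‑bit (~0.75 GB) and 8‑bit (~1.5 GB), consistent with quantized weights plus scales and some higher‑precision layers:

```python
def quantized_size_gb(n_params, bits_per_weight, overhead_ratio=0.0):
    """Back-of-envelope on-disk size of a weight-quantized model.
    overhead_ratio covers quantization scales, zero-points, and any
    layers kept at higher precision (e.g. embeddings)."""
    return n_params * bits_per_weight / 8 / 1e9 * (1 + overhead_ratio)
```

For instance, `quantized_size_gb(1.5e9, 4)` gives 0.75 GB and `quantized_size_gb(1.5e9, 8)` gives 1.5 GB, bracketing the 1.2 GB figure reported above.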

Sherpa‑MNN : Optimized ASR/TTS framework delivering up to 2× faster inference and 5× smaller binary than onnxruntime.

MNN‑NNR : Neural‑network‑based renderer with "Dirty" scheduling, GPU‑direct data sharing, and radix sort, achieving 60 FPS with only 200 KB runtime size.
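One common way "dirty" scheduling works: each node in the render graph recomputes only when it or an upstream input has changed since the last frame, so per‑frame work is limited to what actually moved. The sketch below is an illustrative pattern, not MNN‑NNR's actual implementation:

```python
class Node:
    """Render-graph node with a dirty flag: recompute only when this
    node or any upstream input has changed since the last evaluation."""
    def __init__(self, compute, inputs=()):
        self.compute = compute
        self.inputs = list(inputs)
        self.dirty = True      # force the first evaluation
        self.value = None
        self.evals = 0         # count of real recomputations

    def mark_dirty(self):
        self.dirty = True

    def evaluate(self):
        input_values = []
        inputs_changed = False
        for node in self.inputs:
            before = node.evals
            input_values.append(node.evaluate())
            inputs_changed |= node.evals != before
        if self.dirty or inputs_changed:
            self.value = self.compute(*input_values)
            self.evals += 1
            self.dirty = False
        return self.value
```

For example, a pose node feeding a skinning node: on frames where the pose is unchanged, the skinning node returns its cached value without recomputing.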

Hardware Requirements

Qualcomm Snapdragon 8 Gen 3 or an equivalent flagship SoC

≥ 8 GB RAM

≥ 5 GB storage for model files

Devices not meeting these specs may experience stutter or limited functionality.

Quick Start Guide

git clone https://github.com/alibaba/MNN.git
cd MNN/apps/Android/Mnn3dAvatar
./gradlew installDebug

After building, the app can be run on a compatible Android phone to experience real‑time avatar interaction.

Resources

TaoAvatar GitHub: https://github.com/alibaba/MNN/blob/master/apps/Android/MnnTaoAvatar/README_CN.md

Paper: https://arxiv.org/abs/2503.17032v1

MNN‑LLM paper: https://arxiv.org/abs/2506.10443

Model collections, ASR/TTS/NNR models, and demo links (see the original article for URLs)

Tags: mobile AI, open source, real-time rendering, 3D digital human, MNN inference
Written by DaTaobao Tech, the official account of DaTaobao Technology.
