How Alibaba’s TaoAvatar Brings Real‑Time 3D Digital Humans to Your Phone

TaoAvatar, Alibaba’s new 3D digital‑human platform, delivers lifelike, real‑time avatars on mobile and XR devices by combining 3D Gaussian splatting, on‑device AI dialogue, and the lightweight MNN inference engine. The full source code is now open‑sourced as MNN‑TaoAvatar.

DaTaobao Tech

TaoAvatar is a 3D real‑time digital‑human technology developed by Alibaba’s Taotian Meta team, capable of rendering realistic avatars and supporting AI‑driven dialogue on smartphones and XR devices.

Open‑Source Release

The application MNN‑TaoAvatar has been open‑sourced on the MNN GitHub repository, allowing developers to download, build, and experiment with the full system.

Key Research Paper

Title: TaoAvatar: Real‑Time Lifelike Full‑Body Talking Avatars for Augmented Reality via 3D Gaussian Splatting

arXiv: https://arxiv.org/abs/2503.17032v1

Code: GitHub README

Technical Highlights

TaoAvatar uses advanced 3D Gaussian splatting to generate full‑body avatars from multi‑view video, capturing fine facial expressions, hand gestures, clothing folds, and hair motion with high realism.
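The core idea of splatting‑style rendering can be sketched in a few lines: each Gaussian contributes color weighted by its falloff and opacity, composited front to back along the view direction. The snippet below is a simplified, isotropic illustration (real splatting uses anisotropic covariances projected into screen space), not TaoAvatar's actual renderer:

```python
import math

def gaussian_weight(point, mean, inv_var):
    """Unnormalized isotropic 3D Gaussian falloff (a simplification of
    the anisotropic covariances used in real splatting)."""
    d2 = sum((p - m) ** 2 for p, m in zip(point, mean))
    return math.exp(-0.5 * inv_var * d2)

def composite(splats, point):
    """Front-to-back alpha compositing over depth-sorted splats.
    Each splat is (mean, inv_var, rgb, opacity)."""
    color = [0.0, 0.0, 0.0]
    transmittance = 1.0
    for mean, inv_var, rgb, opacity in splats:  # assumed sorted near-to-far
        alpha = opacity * gaussian_weight(point, mean, inv_var)
        for i in range(3):
            color[i] += transmittance * alpha * rgb[i]
        transmittance *= 1.0 - alpha
        if transmittance < 1e-4:  # early exit once effectively opaque
            break
    return color
```

The early‑exit test once transmittance is near zero is one reason sorted splats render cheaply: occluded Gaussians behind an opaque surface are skipped entirely.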

Compared with cloud‑based solutions, MNN‑TaoAvatar runs entirely on‑device, eliminating dependence on server‑side GPUs and avoiding network round‑trip latency.

Core Advantages

On‑device Real‑time Dialogue : Optimized ASR (RTF 0.18), LLM (prefill 165 tokens/s, decode 41 tokens/s), and TTS (RTF 0.58) models enable sub‑second responses.

On‑device Real‑time Rendering : Efficient pipeline converts speech to facial blend‑shape coefficients (RTF 0.34) and renders 250k‑point clouds at 60 FPS using the NNR renderer.
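RTF (real‑time factor) is processing time divided by audio duration, so values below 1.0 mean faster than real time. As a rough illustration, the hypothetical helper below sums the stage figures quoted above into a pessimistic sequential latency bound; in practice the stages stream and overlap, which is how the sub‑second responses quoted above become possible:

```python
def rtf_latency(audio_seconds, rtf):
    """Processing time for an audio clip: real-time factor x duration."""
    return audio_seconds * rtf

def sequential_latency(user_audio_s, prompt_tokens, reply_tokens, reply_audio_s):
    """Pessimistic end-to-end bound for the ASR -> LLM -> TTS pipeline,
    summing the quoted on-device figures as if stages ran back to back;
    streaming overlaps them, so perceived latency is far lower."""
    asr = rtf_latency(user_audio_s, 0.18)        # ASR: RTF 0.18
    prefill = prompt_tokens / 165.0              # LLM prefill: 165 tokens/s
    decode = reply_tokens / 41.0                 # LLM decode: 41 tokens/s
    tts = rtf_latency(reply_audio_s, 0.58)       # TTS: RTF 0.58
    return asr + prefill + decode + tts
```

For example, a 3 s utterance, a 64‑token prompt, and a 10‑token first sentence yielding ~2 s of reply audio come to roughly 2.3 s if run strictly in sequence; starting each stage on partial output of the previous one is what closes the gap to sub‑second perceived response.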

MNN Engine Modules

MNN‑LLM : Mobile deployment of large language models with model export, quantization (e.g., Qwen2.5‑1.5B reduced to 1.2 GB), KV cache, and LoRA support. Benchmarks on Snapdragon 8 Gen 3 show 8.6×‑20.5× speedup over llama.cpp and fastllm.
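The quoted compression can be sanity‑checked with back‑of‑envelope arithmetic. The helper below is a hypothetical estimator (not an MNN API); note that 1.2 GB for a roughly 1.5B‑parameter model sits between pure 4‑bit (~0.75 GB) and 8‑bit (~1.5 GB), consistent with quantized weights plus scales and some higher‑precision layers:

```python
def quantized_size_gb(n_params, bits_per_weight, overhead_ratio=0.0):
    """Back-of-envelope on-disk size of a weight-quantized model.
    overhead_ratio covers quantization scales, zero-points, and any
    layers kept at higher precision (e.g. embeddings)."""
    return n_params * bits_per_weight / 8 / 1e9 * (1 + overhead_ratio)
```

For instance, `quantized_size_gb(1.5e9, 4)` gives 0.75 GB and `quantized_size_gb(1.5e9, 8)` gives 1.5 GB, bracketing the 1.2 GB figure reported above.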

Sherpa‑MNN : Optimized ASR/TTS framework delivering up to 2× faster inference and 5× smaller binary than onnxruntime.

MNN‑NNR : Neural‑network‑based renderer with "Dirty" scheduling, GPU‑direct data sharing, and radix sort, achieving 60 FPS with only 200 KB runtime size.
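One common way "dirty" scheduling works: each node in the render graph recomputes only when it or an upstream input has changed since the last frame, so per‑frame work is limited to what actually moved. The sketch below is an illustrative pattern, not MNN‑NNR's actual implementation:

```python
class Node:
    """Render-graph node with a dirty flag: recompute only when this
    node or any upstream input has changed since the last evaluation."""
    def __init__(self, compute, inputs=()):
        self.compute = compute
        self.inputs = list(inputs)
        self.dirty = True      # force the first evaluation
        self.value = None
        self.evals = 0         # count of real recomputations

    def mark_dirty(self):
        self.dirty = True

    def evaluate(self):
        input_values = []
        inputs_changed = False
        for node in self.inputs:
            before = node.evals
            input_values.append(node.evaluate())
            inputs_changed |= node.evals != before
        if self.dirty or inputs_changed:
            self.value = self.compute(*input_values)
            self.evals += 1
            self.dirty = False
        return self.value
```

For example, a pose node feeding a skinning node: on frames where the pose is unchanged, the skinning node returns its cached value without recomputing.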

Hardware Requirements

Qualcomm Snapdragon 8 Gen 3 or an equivalent flagship SoC

≥ 8 GB RAM

≥ 5 GB storage for model files

Devices not meeting these specs may experience stutter or limited functionality.

Quick Start Guide

git clone https://github.com/alibaba/MNN.git
cd MNN/apps/Android/Mnn3dAvatar
./gradlew installDebug

After building, the app can be run on a compatible Android phone to experience real‑time avatar interaction.

Resources

TaoAvatar GitHub: https://github.com/alibaba/MNN/blob/master/apps/Android/MnnTaoAvatar/README_CN.md

Paper: https://arxiv.org/abs/2503.17032v1

MNN‑LLM paper: https://arxiv.org/abs/2506.10443

Model collections, ASR/TTS/NNR models, and demo links (see the original article for URLs)

Tags: mobile AI, open source, real-time rendering, 3D digital human, MNN inference
Written by DaTaobao Tech, the official account of DaTaobao Technology.
