How 3D Modeling Powers Digital Humans: From 3DMM to NeRF
This article explains what digital humans are, reviews the evolution of 3D modeling techniques, from early 2D hand-drawn methods and 3DMM to deep-learning-based implicit models such as NeRF, and discusses current challenges and future research directions.
Digital humans are three-dimensional virtual characters created by combining computer graphics and artificial-intelligence techniques to mimic a real person's appearance, motion, voice, and interactive behavior. In 2021, Ranmai Technology released AYAYI, China's first ultra-realistic digital human, built on Unreal Engine.
The core pipeline of a digital human consists of modeling, rendering, driving (animation), and interaction; this article focuses on the modeling stage and traces its development from early 2D sketching to modern generative AI methods.
Traditional explicit modeling relied on the 3D Morphable Model (3DMM) [1], which represents a face as a linear combination of orthogonal basis shapes derived via principal component analysis (PCA) from a database of 3D scans; fitting the combination coefficients to a photograph enables reconstruction of 3D shape from a single image, laying the foundation for 3D avatars.
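To make the linear-combination idea concrete, here is a minimal sketch in Python; the dimensions, the randomly generated basis, and the function names are illustrative stand-ins for a real PCA basis built from scans:

```python
import numpy as np

# Illustrative dimensions: N mesh vertices, K basis shapes. In a real 3DMM
# the basis comes from PCA over a scan database; random arrays stand in here.
N, K = 5000, 80
mean_shape = np.zeros(3 * N)             # mean face, flattened (x, y, z)
shape_basis = np.random.randn(3 * N, K)  # columns = orthogonal basis shapes

def reconstruct_face(alpha: np.ndarray) -> np.ndarray:
    """3DMM core idea: S = S_mean + B @ alpha, one coefficient per basis."""
    return mean_shape + shape_basis @ alpha

# Fitting a single image then reduces to optimizing alpha (plus camera and
# texture parameters) so the projected model matches the observed 2D face.
vertices = reconstruct_face(np.random.randn(K) * 0.1).reshape(-1, 3)
print(vertices.shape)  # (5000, 3) vertex positions
```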
Later works fused 3DMM with deep learning: one network predicts the 3DMM‑based shape, another predicts texture, and a depth‑completion network fills missing texture regions, after which shape and texture are combined for a full facial reconstruction.
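The exact architectures vary from paper to paper; the schematic below (with placeholder linear layers standing in for the CNN backbones used in practice, and all names invented for illustration) shows how the three predictions fit together:

```python
import torch
import torch.nn as nn

class FaceReconSketch(nn.Module):
    """Schematic pipeline: one branch predicts 3DMM shape coefficients,
    one predicts texture, and a completion network fills occluded regions.
    Linear layers are placeholders for real CNN backbones."""
    def __init__(self, feat_dim=512, num_coeffs=80, tex_dim=3 * 5000):
        super().__init__()
        self.shape_net = nn.Linear(feat_dim, num_coeffs)   # 3DMM coefficients
        self.texture_net = nn.Linear(feat_dim, tex_dim)    # partial texture
        self.completion_net = nn.Linear(tex_dim, tex_dim)  # fill missing areas

    def forward(self, image_features):
        alpha = self.shape_net(image_features)
        partial_texture = self.texture_net(image_features)
        full_texture = self.completion_net(partial_texture)
        return alpha, full_texture  # combined downstream into a textured mesh
```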
Because 3D-annotated training data are scarce, researchers proposed a 2D-assisted self-supervised learning (2DASL) approach [3]. It introduces self-supervised losses that align projected 3D landmarks with the original 2D points, enforce cycle consistency between the 2D and 3D landmark mappings, and add a discriminator-based self-critic that evaluates the quality of the 3D fit.
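The paper's actual formulations are more involved; the following is only a hedged sketch of how the three self-supervised signals might be combined into one training objective (the weights, tensor layouts, and function names are all assumptions, not taken from the paper):

```python
import torch

def self_supervised_loss(pred_3d_landmarks, landmarks_2d, project,
                         cycled_2d_landmarks, critic_score,
                         w_align=1.0, w_cycle=0.5, w_critic=0.1):
    """Schematic combination of the three self-supervised signals.
    `project` maps 3D landmarks to the image plane; weights are illustrative."""
    # 1) Alignment: projected 3D landmarks should match the input 2D points.
    align = torch.mean((project(pred_3d_landmarks) - landmarks_2d) ** 2)
    # 2) Cycle consistency: 2D -> 3D -> 2D should return to the start.
    cycle = torch.mean((cycled_2d_landmarks - landmarks_2d) ** 2)
    # 3) Self-critic: a discriminator scores the plausibility of the 3D fit.
    critic = -torch.mean(critic_score)
    return w_align * align + w_cycle * cycle + w_critic * critic
```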
Recent trends shift toward implicit modeling. Neural Radiance Fields (NeRF) [4] takes multiple images of the same subject from different viewpoints, samples 3D positions and view directions along camera rays, feeds them to an MLP that outputs color and density, and composites the samples by volumetric rendering; a hierarchical coarse-to-fine sampling scheme concentrates samples where density is high, producing photorealistic novel views.
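A minimal sketch of the per-ray volume rendering step, assuming an `mlp(points, dirs)` callable that returns per-sample color and density (the near/far bounds and sample count are illustrative, and the uniform sampling shown here corresponds to the coarse pass):

```python
import torch

def render_ray(mlp, origin, direction, near=2.0, far=6.0, n_samples=64):
    """Composite one pixel by sampling the radiance field along a ray."""
    t = torch.linspace(near, far, n_samples)           # sample depths
    points = origin + t[:, None] * direction           # 3D positions on ray
    dirs = direction.expand(n_samples, 3)              # view direction per sample
    rgb, sigma = mlp(points, dirs)                     # color (N,3), density (N,)
    delta = t[1] - t[0]                                # uniform spacing
    alpha = 1.0 - torch.exp(-sigma * delta)            # opacity per sample
    trans = torch.cumprod(1.0 - alpha + 1e-10, dim=0)  # accumulated transmittance
    trans = torch.cat([torch.ones(1), trans[:-1]])     # light reaching each sample
    weights = alpha * trans                            # compositing weights
    return (weights[:, None] * rgb).sum(dim=0)         # final pixel color
```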
NeRFace [5] combines 3DMM and NeRF: a Face2Face-style tracker estimates per-frame facial pose and expression parameters, which are fed to the MLP as additional conditioning inputs; however, the model must be trained separately for each subject, and training is slow.
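The conditioning itself can be summarized in one line; this sketch assumes the tracked per-frame expression and pose parameters are available as tensors, and the code sizes in the example are illustrative rather than taken from the paper:

```python
import torch

def conditioned_input(position, view_dir, expression_code, pose_code):
    """NeRFace-style conditioning (schematic): tracked per-frame 3DMM
    expression and pose parameters are appended to the usual NeRF inputs,
    so a single MLP can represent a dynamic, expression-dependent face."""
    return torch.cat([position, view_dir, expression_code, pose_code], dim=-1)

# Example: one 3D point, one view direction, and illustrative code sizes.
x = conditioned_input(torch.rand(3), torch.rand(3), torch.rand(76), torch.rand(6))
```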
PixelNeRF [6] further reduces data requirements by conditioning the NeRF MLP on convolutional image features that are pixel-aligned with the input views, allowing realistic 3D reconstruction from a few or even a single view and generalizing better across subjects.
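A sketch of the pixel-aligned conditioning idea, assuming a hypothetical `cnn_encoder` that returns a (C, H, W) feature map and a `project` function that returns image coordinates normalized to [-1, 1] (both are assumptions for illustration, not the paper's API):

```python
import torch

def pixelnerf_features(cnn_encoder, image, points_cam, project):
    """Sample one image feature vector per 3D query point; these features
    condition the NeRF MLP alongside position and view direction, enabling
    reconstruction from one or a few views."""
    feature_map = cnn_encoder(image)    # (C, H, W) feature grid
    uv = project(points_cam)            # (N, 2) coords, normalized to [-1, 1]
    grid = uv.view(1, -1, 1, 2)         # layout expected by grid_sample
    feats = torch.nn.functional.grid_sample(
        feature_map.unsqueeze(0), grid, align_corners=True)
    return feats.view(feature_map.shape[0], -1).T  # (N, C) per-point features
```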
In the outlook, the authors identify four research directions: building universal models that avoid per‑person retraining, enhancing interactive capabilities beyond passive narration, integrating large language models (e.g., via LangChain) for intent‑driven behavior, and adding multimodal perception (vision, audio, sensors) to enrich digital‑human experiences.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.
Network Intelligence Research Center (NIRC)
NIRC is based at the National Key Laboratory of Network and Switching Technology at Beijing University of Posts and Telecommunications. It has built a technology matrix across four AI domains: intelligent cloud networking, natural language processing, computer vision, and machine learning systems. The center is dedicated to solving real-world problems, creating top-tier systems, publishing high-impact papers, and contributing to the rapid advancement of China's network technology.