CVPR 2025: Semi-Body Digital Humans, Video Upscaling, Mobile Super‑Res
In this CVPR 2025 showcase, Ant Group presents three papers: EchoMimicV2, an open‑source framework for semi‑body digital human generation; RivuletMLP, an efficient MLP‑based architecture for compressed video quality enhancement; and a quantized super‑resolution model that achieves near‑real‑time 3× upscaling on mobile NPUs.
CVPR (IEEE/CVF Conference on Computer Vision and Pattern Recognition) is the premier international conference in computer vision. This year 2,878 papers were accepted (22.1% acceptance rate), and Ant Group contributed over 40 papers. This paper showcase highlights three of those works covering digital human generation, video enhancement, and mobile super‑resolution.
Paper 1: EchoMimicV2 – Towards Striking, Simplified, and Semi‑Body Human Animation
EchoMimicV2 is the first open‑source solution for semi‑body digital human generation and has gained 4K stars on GitHub. It targets weaknesses of earlier approaches, such as a focus limited to facial portraits, heavy multimodal models, instability, and inference latency, by introducing an end‑to‑end audio‑driven framework with three main components:
Audio‑Pose Dual Harmony (APDH): a training strategy that jointly coordinates the audio and pose conditions while reducing redundant pose information.
Head Partial Attention: uses head‑only photo data as a “free lunch” data augmentation, integrating it seamlessly to enhance facial expressions without extra modules.
Phase‑specific Denoising Loss (PhD Loss): a multi‑stage loss that strengthens motion representation under incomplete pose conditions and improves the details and low‑level visual quality that audio does not control.
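To make the idea of a multi‑stage loss more concrete, the sketch below switches which loss term is active depending on how far the denoising process has progressed. It is only an illustration under stated assumptions: the equal three‑way phase split, the 0/1 weights, and the names `PHASE_WEIGHTS`, `phase_for_step`, and `phased_loss` are hypothetical and are not taken from the paper or the EchoMimicV2 code.

```python
# Hypothetical sketch of a phase-dependent loss; NOT the EchoMimicV2 implementation.
# Assumption: three equal phases, each emphasizing one loss term.
PHASE_WEIGHTS = {
    "early": {"motion": 1.0, "detail": 0.0, "low_level": 0.0},
    "middle": {"motion": 0.0, "detail": 1.0, "low_level": 0.0},
    "late": {"motion": 0.0, "detail": 0.0, "low_level": 1.0},
}


def phase_for_step(step: int, total_steps: int) -> str:
    """Map a denoising step index (0 = noisiest) to a phase name."""
    if not 0 <= step < total_steps:
        raise ValueError("step must satisfy 0 <= step < total_steps")
    # Integer comparisons avoid floating-point edge cases at the boundaries.
    if 3 * step < total_steps:
        return "early"
    if 3 * step < 2 * total_steps:
        return "middle"
    return "late"


def phased_loss(step: int, total_steps: int, terms: dict) -> float:
    """Combine per-term loss values using the weights of the current phase."""
    weights = PHASE_WEIGHTS[phase_for_step(step, total_steps)]
    return sum(weights[name] * terms[name] for name in weights)


if __name__ == "__main__":
    example_terms = {"motion": 2.0, "detail": 5.0, "low_level": 7.0}
    for step in (0, 15, 29):
        print(step, phase_for_step(step, 30), phased_loss(step, 30, example_terms))
```

In a real training loop the entries of `terms` would be tensors produced by the model, and the weights could blend smoothly rather than switch abruptly; the hard switch here only keeps the example short.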
Paper 2: RivuletMLP – An MLP‑Based Architecture for Efficient Compressed Video Quality Enhancement
Video compression often introduces spatial texture blur, edge distortion, and temporal motion discontinuity. RivuletMLP tackles these artifacts with a multi‑layer perceptron (MLP)‑based network built from three modules:
Dynamic Guided Deformable Alignment (DDA): adaptively explores and aligns multi‑frame features.
Spatio‑Temporal Feature Flow (SFF): establishes non‑local dependencies via an innovative feature rearrangement mechanism.
Beneficial Selection Compensation (BSC): combines deep feature extraction with local region optimization to mitigate inter‑frame motion inconsistency caused by compression.
Experiments demonstrate that RivuletMLP achieves high‑quality reconstruction while maintaining excellent computational efficiency.
Paper 3: Quantized Image Super‑Resolution on Mobile NPUs
This quantized model, which won the CVPR 2025 Mobile AI competition, is optimized for mobile NPUs. It performs 3× image super‑resolution at near‑real‑time speed (≈15 ms) on mobile devices, delivering high‑definition 2K output while remaining fully compatible with mainstream mobile AI accelerators.
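For readers unfamiliar with what "quantized" means here, the following is a minimal sketch of symmetric per‑tensor INT8 quantization, a common generic scheme for running networks on integer accelerators. The announcement does not describe the quantization scheme the model actually uses, so this is an assumption for illustration only; the function names `quantize_int8` and `dequantize` are hypothetical.

```python
# Generic symmetric INT8 quantization sketch; the scheme used by the
# paper's model is not described in the source, so this is illustrative only.
from typing import List, Tuple


def quantize_int8(values: List[float]) -> Tuple[List[int], float]:
    """Quantize floats to integers in [-127, 127] with one shared scale."""
    max_abs = max(abs(v) for v in values)
    # Guard against an all-zero tensor, where any scale works.
    scale = max_abs / 127 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(v / scale))) for v in values]
    return quantized, scale


def dequantize(quantized: List[int], scale: float) -> List[float]:
    """Map the integers back to approximate float values."""
    return [q * scale for q in quantized]


if __name__ == "__main__":
    weights = [-1.0, 0.0, 0.25, 1.0]
    q, scale = quantize_int8(weights)
    restored = dequantize(q, scale)
    print(q)
    print([round(r, 4) for r in restored])
```

Each restored value differs from the original by at most half a quantization step (`scale / 2`), which is the precision traded away in exchange for faster integer arithmetic on the NPU.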
Key Highlights
First open‑source semi‑body digital human generation algorithm (EchoMimicV2) with 4K GitHub stars.
RivuletMLP’s novel dynamic alignment and efficient feature extraction dramatically improve compressed video restoration quality and speed.
Mobile super‑resolution solution achieves 15 ms inference for 2K output, setting a new benchmark for on‑device visual experience.
A live session featuring the authors will discuss design ideas and validation processes.
