CVPR 2025: Semi-Body Digital Humans, Video Upscaling, Mobile Super‑Res

In this CVPR 2025 showcase, Ant Group presents three papers: EchoMimicV2, an open-source framework for semi-body digital human generation; RivuletMLP, an efficient MLP-based architecture for compressed video quality enhancement; and a quantized super-resolution model that achieves real-time 3× upscaling on mobile NPUs.

AntTech

CVPR (IEEE/CVF Conference on Computer Vision and Pattern Recognition) is the premier international conference in computer vision. This year, 2,878 papers were accepted (a 22.1% acceptance rate), more than 40 of them from Ant Group. This showcase highlights three of those works, covering digital human generation, video enhancement, and mobile super-resolution.

Paper 1: EchoMimicV2 – Towards Striking, Simplified, and Semi‑Body Human Animation

EchoMimicV2 is the first open-source solution for semi-body digital human generation and has gained 4K stars on GitHub. It targets the limitations of existing approaches, including a narrow focus on facial portraits, heavy multimodal models, unstable generation, and high inference latency, with an end-to-end audio-driven framework built on three contributions:

Audio-Pose Dual Harmony (APDH): a training strategy that jointly coordinates the audio and pose conditions while reducing redundant pose information.

Head Partial Attention: a “free lunch” form of data augmentation that folds head-only (headshot) photo data into training to enhance facial expressions without adding extra modules.

PhD Loss: a multi-stage loss that strengthens motion representation under incomplete pose conditions and improves the details and low-level visual quality that audio does not control.
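The post describes APDH and PhD Loss only at a high level. As a rough illustration, the sketch below assumes that APDH's pose handling amounts to dropping keypoint groups on a training schedule (the group names, drop order, and schedule are hypothetical), and that PhD Loss splits the diffusion timestep range into three equal phases that emphasize motion, detail, and low-level quality in turn (the phase boundaries, loss names, and weights are also assumptions). It is a sketch of the ideas, not EchoMimicV2's implementation.

```python
"""Hypothetical sketch of two EchoMimicV2-style training ideas.

Assumptions (not from the paper's code): poses are dicts of keypoint
groups, groups are dropped on a fixed schedule, and the denoising
timestep range is split into three equal phases.
"""
from typing import Dict, List, Tuple

Pose = Dict[str, List[Tuple[float, float]]]

# Hypothetical drop order; hand keypoints are never dropped in this sketch.
DROP_ORDER: List[str] = ["lips", "face", "body"]


def sample_pose(pose: Pose, step: int, total_steps: int) -> Pose:
    """Drop more keypoint groups as training progresses (APDH-style sketch).

    Training is split into len(DROP_ORDER) + 1 equal stages; stage k
    removes the first k groups in DROP_ORDER, so the audio condition has
    to explain more of the motion later in training.
    """
    n_stages = len(DROP_ORDER) + 1
    stage = min(step * n_stages // total_steps, n_stages - 1)
    dropped = set(DROP_ORDER[:stage])
    return {name: pts for name, pts in pose.items() if name not in dropped}


def phase_weights(t: int, num_timesteps: int = 1000) -> Dict[str, float]:
    """Weights for the phase-specific loss terms at denoising timestep t.

    A high t means a noisy sample (early in denoising). Integer comparisons
    avoid floating-point edge cases at the phase boundaries.
    """
    if 3 * t >= 2 * num_timesteps:  # early phase: emphasize motion
        return {"motion": 1.0, "detail": 0.0, "low_level": 0.0}
    if 3 * t >= num_timesteps:  # middle phase: emphasize detail
        return {"motion": 0.0, "detail": 1.0, "low_level": 0.0}
    return {"motion": 0.0, "detail": 0.0, "low_level": 1.0}  # late phase


def phd_loss(losses: Dict[str, float], t: int, num_timesteps: int = 1000) -> float:
    """Base denoising loss plus the term selected for t's phase."""
    weights = phase_weights(t, num_timesteps)
    return losses["denoise"] + sum(w * losses[name] for name, w in weights.items())
```

For example, with `total_steps=100`, step 0 keeps every keypoint group while step 99 keeps only the hand keypoints, and a timestep of 900 (out of 1000) adds only the motion term to the base loss.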

Paper 2: RivuletMLP – An MLP‑Based Architecture for Efficient Compressed Video Quality Enhancement

Video compression often introduces spatial texture blur, edge distortion, and temporal motion discontinuity. RivuletMLP tackles these artifacts with a multi-layer perceptron (MLP)-based network built from three components:

Dynamic Guided Deformable Alignment (DDA): adaptively explores and aligns features across multiple frames.

Spatio-Temporal Feature Flow (SFF): establishes non-local dependencies through a feature rearrangement mechanism.

Beneficial Selection Compensation (BSC): combines deep feature extraction with local region optimization to mitigate the inter-frame motion inconsistency caused by compression.
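The component descriptions above stay high level. The 1-D sketch below illustrates two generic mechanisms they allude to, under stated assumptions: deformable alignment as resampling a neighboring frame's features at per-position offsets with linear interpolation (a real network would predict the offsets; here they are given), and non-local mixing via a strided rearrangement that groups distant tokens so a linear token-mixing layer inside each group couples far-apart positions. The function names and the strided grouping are hypothetical stand-ins, not RivuletMLP's actual DDA or SFF modules.

```python
"""Hypothetical 1-D sketches of two ideas behind RivuletMLP.

Assumptions: features are plain Python lists (one channel), alignment
offsets are given rather than predicted by a network, and the
"rearrangement" is a strided grouping. None of this is the paper's code.
"""
from typing import List


def deformable_sample(feature: List[float], offsets: List[float]) -> List[float]:
    """Sample `feature` at positions i + offsets[i] with linear interpolation.

    This resampling is the core of deformable alignment: a neighboring
    frame's features are read at shifted positions so they line up with
    the reference frame. Positions are clamped to the valid range.
    """
    n = len(feature)
    out = []
    for i, off in enumerate(offsets):
        pos = min(max(i + off, 0.0), n - 1.0)
        lo = int(pos)  # floor, since pos >= 0
        hi = min(lo + 1, n - 1)
        frac = pos - lo
        out.append((1.0 - frac) * feature[lo] + frac * feature[hi])
    return out


def strided_groups(tokens: List[float], stride: int) -> List[List[float]]:
    """Group tokens that are `stride` apart: group g holds tokens[g::stride].

    Each group spans the whole sequence, so mixing within a group links
    distant positions (a non-local dependency).
    """
    return [tokens[g::stride] for g in range(stride)]


def ungroup(groups: List[List[float]]) -> List[float]:
    """Inverse of strided_groups: put every token back at its position."""
    stride = len(groups)
    out = [0.0] * sum(len(g) for g in groups)
    for g, group in enumerate(groups):
        for j, value in enumerate(group):
            out[g + j * stride] = value
    return out


def mix_within_groups(tokens: List[float], stride: int,
                      weights: List[List[float]]) -> List[float]:
    """Apply one linear token-mixing layer (an L x L matrix) inside each group."""
    if len(tokens) % stride != 0:
        raise ValueError("sequence length must be divisible by stride")
    mixed = [[sum(w * x for w, x in zip(row, group)) for row in weights]
             for group in strided_groups(tokens, stride)]
    return ungroup(mixed)
```

With `tokens=[0.0, 1.0, 2.0, 3.0, 4.0, 5.0]`, `stride=2`, and an all-ones 3×3 weight matrix, every output position becomes the sum of its group, giving `[6.0, 9.0, 6.0, 9.0, 6.0, 9.0]`: each token now depends on tokens up to four positions away.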

Experiments show that RivuletMLP achieves high-quality reconstruction while remaining computationally efficient.

Paper 3: Quantized Image Super‑Resolution on Mobile NPUs

This quantized model, winner of the CVPR 2025 Mobile AI competition, is optimized for mobile NPUs. It performs 3× image super-resolution in real time (about 15 ms per image) on mobile devices, producing high-definition 2K output while remaining compatible with mainstream mobile AI accelerators.
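The post does not describe the model's internals. As a hedged illustration of two building blocks commonly found in NPU-friendly super-resolution networks (an assumption, not confirmed by the source), the sketch below shows symmetric per-tensor int8 quantization and a 3× depth-to-space (pixel shuffle) step that turns 9 low-resolution channels into one image three times larger in each dimension. The function names and the per-tensor scaling scheme are hypothetical choices for illustration.

```python
"""Hypothetical building blocks for an int8, 3x super-resolution model.

Assumptions: symmetric per-tensor quantization and PyTorch-style
pixel-shuffle channel ordering. Not the competition entry's code.
"""
from typing import List

Map2D = List[List[float]]


def choose_scale(values: List[float]) -> float:
    """Symmetric scale that maps the largest magnitude to 127."""
    max_abs = max(abs(v) for v in values)
    return max_abs / 127.0 if max_abs > 0 else 1.0


def quantize_int8(values: List[float], scale: float) -> List[int]:
    """Round to nearest (Python's round ties to even) and clamp to int8."""
    return [max(-128, min(127, round(v / scale))) for v in values]


def dequantize(q: List[int], scale: float) -> List[float]:
    """Map int8 codes back to real values."""
    return [x * scale for x in q]


def pixel_shuffle(channels: List[Map2D], r: int) -> Map2D:
    """Depth-to-space: r*r maps of size H x W become one (H*r) x (W*r) map.

    Output pixel (y*r + i, x*r + j) comes from channel i*r + j at (y, x),
    the same ordering PyTorch's PixelShuffle uses for one output channel.
    """
    if len(channels) != r * r:
        raise ValueError("need exactly r*r input channels")
    h, w = len(channels[0]), len(channels[0][0])
    out = [[0.0] * (w * r) for _ in range(h * r)]
    for i in range(r):
        for j in range(r):
            src = channels[i * r + j]
            for y in range(h):
                for x in range(w):
                    out[y * r + i][x * r + j] = src[y][x]
    return out
```

Because the scale maps the largest magnitude to 127, no value in the tensor is clamped, and each value's quantize/dequantize round-trip error is at most half a quantization step (`scale / 2`).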

Key Highlights

EchoMimicV2: the first open-source semi-body digital human generation algorithm, with 4K GitHub stars.

RivuletMLP: dynamic alignment and efficient feature extraction improve both the quality and the speed of compressed video restoration.

Mobile super-resolution: 15 ms inference for 2K output, raising the bar for on-device visual experience.

A live session featuring the authors will discuss design ideas and validation processes.
