VMBench: Perception-Aligned Motion Benchmark & LD‑RPS Zero‑Shot Restoration

This article introduces two systems. VMBench is the first perception‑aligned benchmark for video motion generation; it defines a five‑dimensional metric suite and a meta‑guided prompt generation pipeline. LD‑RPS is a zero‑shot unified image restoration framework based on latent diffusion recurrent posterior sampling. The article also summarizes the experiments reported for both.

Amap Tech

Jul 9, 2025

Conference Overview

ICCV (International Conference on Computer Vision) is a top‑tier international conference in computer vision, with a reported acceptance rate of 24%. Five papers from the Amap (Gaode) technology team were accepted.

VMBench: Perception‑Aligned Video Motion Benchmark

VMBench is the first benchmark that aligns video motion quality evaluation with human perception. It builds a five‑dimensional Perception‑Aligned Motion Metrics (PMM) system (CAS, MSS, OIS, PAS, and TCS), covers six natural motion patterns, and provides a large‑scale Meta‑Guided Motion Prompt Generation (MMPG) framework.

Research Background

Existing evaluation methods suffer from two main issues: (1) metrics are detached from human perception, failing to capture smoothness, physical plausibility, and object integrity; (2) prompt libraries are limited, restricting the assessment of diverse dynamic scenes.

Paper Highlights

Perception‑Aligned Motion Metrics (PMM): includes the Commonsense Adherence Score (CAS), Motion Smoothness Score (MSS), Object Integrity Score (OIS), Perceptible Amplitude Score (PAS), and Temporal Coherence Score (TCS).

Meta‑Guided Motion Prompt Generation (MMPG): extracts (subject, place, action) triples from multiple video datasets, refines the resulting prompts with large language models, and validates them through human‑LLM collaboration, yielding 1,050 high‑quality prompts.
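
The first MMPG step, turning (subject, place, action) triples into draft prompts, can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the `MotionMeta` class, the `build_drafts` helper, the sentence template, and the sample values are all assumptions made for this example, and the LLM refinement and human validation stages are left out.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class MotionMeta:
    """One (subject, place, action) triple. Field names are illustrative."""
    subject: str
    place: str
    action: str

    def to_draft_prompt(self) -> str:
        # Hypothetical template; in the article an LLM refines prompts later.
        return f"{self.subject} {self.action} {self.place}"


def build_drafts(subjects, places, actions):
    """Combine the three metadata pools into deduplicated draft prompts."""
    seen = set()
    drafts = []
    for s, p, a in product(subjects, places, actions):
        meta = MotionMeta(s, p, a)
        if meta not in seen:  # skip triples that repeat in the input pools
            seen.add(meta)
            drafts.append(meta.to_draft_prompt())
    return drafts


drafts = build_drafts(
    subjects=["a dog", "a cyclist"],
    places=["on a beach", "in a park"],
    actions=["runs", "turns sharply"],
)
print(len(drafts))   # 2 * 2 * 2 combinations
print(drafts[0])
```

Keeping the triple as structured data rather than a raw string makes it straightforward to filter or rebalance prompts by subject, place, or action before any refinement step.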

Experimental Results

Human Perception Alignment: Spearman correlation analysis against 1,200 expert‑rated videos shows that PMM outperforms both rule‑based and multimodal large language model baselines across all dimensions.

Ablation Studies: Removing any PMM component degrades overall accuracy. Removing CAS causes the largest drop, confirming its central role.

Qualitative Analysis: Evaluating six state‑of‑the‑art video generation models reveals distinct strengths and weaknesses per metric, which can guide future model improvements.
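
The alignment check above relies on Spearman rank correlation between a metric's scores and human ratings. The sketch below shows how that statistic is computed using only the standard library. The function names and the five sample scores are invented for illustration and are not taken from the benchmark.

```python
def average_ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; undefined (division by zero) for constant input."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman correlation is Pearson correlation applied to the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Made-up example: human ratings vs. an automatic metric for five videos.
human = [1, 2, 3, 4, 5]
metric = [0.10, 0.35, 0.30, 0.70, 0.90]
rho = spearman(human, metric)
print(round(rho, 3))
```

Because only the ordering matters, a metric can score highly here even if its raw values are on a different scale from the human ratings.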

LD‑RPS: Zero‑Shot Unified Image Restoration

LD‑RPS introduces a latent diffusion recurrent posterior sampling framework that restores images without any training data. It uses multimodal large language models to generate semantic prompts from the degraded input and employs a Feature‑Pixel Alignment Module (F‑PAM) to align intermediate diffusion states with the degraded image, enabling unsupervised, zero‑shot restoration.

Key Contributions

Zero‑shot multimodal image restoration that uses only the degraded image as the condition.

Unsupervised Feature‑Pixel Alignment Module (F‑PAM) to bridge latent‑space and pixel‑space gaps.

Recurrent posterior sampling strategy for progressive quality improvement.
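
The recurrent idea in the last contribution (run a guided sampling pass, then use its result to initialise the next pass) can be illustrated with a deliberately simplified toy. Everything here is an assumption made for the sketch: the signal is a short list of numbers rather than a latent image, the degradation is a known gain rather than an unknown one handled by F‑PAM, and a plain gradient step stands in for a diffusion denoiser. It is not the LD‑RPS algorithm.

```python
import random


def guided_pass(x, y, gain=0.5, steps=20, lr=0.5):
    """One pass: repeatedly nudge x so that gain * x matches the observation y."""
    for _ in range(steps):
        residual = [gain * xi - yi for xi, yi in zip(x, y)]
        # Gradient of 0.5 * (gain * x - y)^2 with respect to x.
        x = [xi - lr * gain * r for xi, r in zip(x, residual)]
    return x


def recurrent_restore(y, rounds=3, noise=0.1, seed=0):
    """Run several passes; each later pass starts from the re-noised previous result."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in y]  # the first round starts from pure noise
    history = []
    for _ in range(rounds):
        x = guided_pass(x, y)
        history.append(x)
        # Lightly re-noise the estimate to initialise the next round.
        x = [xi + noise * rng.gauss(0.0, 1.0) for xi in x]
    return history


clean = [1.0, -2.0, 0.5]
observed = [0.5 * v for v in clean]  # toy degradation with an assumed known gain
history = recurrent_restore(observed)
final_error = max(abs(a - b) for a, b in zip(history[-1], clean))
print(final_error)
```

In this toy, later rounds start much closer to the solution than the first round does, which is the intuition behind reusing the previous result as an initialisation.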

Experimental Evaluation

Low‑Light Enhancement: LD‑RPS achieves the best results among posterior‑sampling methods and matches top single‑task approaches on the LOL‑v1 and LOL‑v2 datasets.

Image Dehazing: On the RESIDE HSTS subset, LD‑RPS surpasses all zero‑shot methods in PSNR.

Image Denoising: LD‑RPS consistently outperforms baselines across all metrics.

Image Colorization: LD‑RPS produces vivid, high‑contrast colorized images, whereas baseline methods retain gray tones.
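
PSNR, the metric cited for dehazing above, is defined as 10 · log10(MAX² / MSE), where MSE is the mean squared error between the reference and the restored image. A minimal sketch for flat lists of 8‑bit pixel values follows; the `psnr` helper and the sample pixels are illustrative and unrelated to the reported experiments.

```python
import math


def psnr(reference, restored, max_value=255.0):
    """Peak signal-to-noise ratio in dB for two equal-length pixel lists."""
    if len(reference) != len(restored):
        raise ValueError("images must have the same number of pixels")
    mse = sum((r - s) ** 2 for r, s in zip(reference, restored)) / len(reference)
    if mse == 0:
        return math.inf  # identical images have unbounded PSNR
    return 10.0 * math.log10(max_value ** 2 / mse)


# Made-up 4-pixel example.
clean = [52, 60, 61, 70]
restored = [50, 62, 61, 73]
print(round(psnr(clean, restored), 2))
```

Higher values mean the restored image is closer to the reference. PSNR measures pixel‑level fidelity only, so it is usually read alongside perceptual metrics.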

Conclusion and Outlook

VMBench provides a standardized, perception‑aligned evaluation framework for video motion generation, while LD‑RPS demonstrates a powerful zero‑shot approach for unified image restoration. Both contributions advance the field toward more human‑aligned generation and versatile, data‑free restoration techniques.
